Speedy Break Fix

I received a report at SSC of a missing toolbar in the forums. I checked, and sure enough, the toolbar for formatting code and text in responses was missing. I submitted a high-priority ticket, mostly because I hate unformatted code. The site was still usable, but people would be posting stuff that just didn't look good. The developers got to work and produced a fix, but it still wasn't right: the wrong toolbar was showing. This was a generic toolbar, not the one we have had for years (this one). They then produced a second fix that restored the right toolbar.

The developers are in and out of the SQL Server Central codebase, working for a few days on open issues, then busy with other work for a week or two. They work in a DevOps-style environment: changes run through a pull request/peer-review process, then CI, then an automated deployment. They can often make a change and deploy it within hours, or even minutes.

Is this a bad process?

It would be easy to say that the developers were moving too fast in their deployments and not testing things. However, the issue in production wasn't easily reproduced locally, and we found a few places where local dev environments don't quite match live. That's been an ongoing problem for as long as we've been developing software outside of production. The DevOps solution here is to adjust the environment setup in code so that dev more closely matches production. I'd say this is my solution, but we'll likely find other differences in the future and will need to keep adjusting the setup to keep the environments in sync.
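One way to keep that drift visible is to compare the settings each environment is built from. The Python below is a minimal sketch under stated assumptions: it assumes each environment's setup can be expressed as a flat dictionary of settings, and the setting names and values are hypothetical placeholders, not the actual differences found between SSC's dev and live environments.

```python
# Sentinel used when a setting exists in only one environment.
_MISSING = "<missing>"

# Hypothetical settings; in practice these might be loaded from the
# environment-setup code kept in source control.
DEV_SETTINGS = {
    "asset_bundling": "off",
    "cache_enabled": False,
    "static_asset_path": "/assets",
}
PROD_SETTINGS = {
    "asset_bundling": "on",
    "cache_enabled": True,
    "static_asset_path": "/assets",
}


def find_config_drift(dev: dict, prod: dict) -> dict:
    """Return {key: (dev_value, prod_value)} for every setting that differs
    between the two environments or exists in only one of them."""
    drift = {}
    for key in sorted(dev.keys() | prod.keys()):
        dev_value = dev.get(key, _MISSING)
        prod_value = prod.get(key, _MISSING)
        if dev_value != prod_value:
            drift[key] = (dev_value, prod_value)
    return drift


if __name__ == "__main__":
    for key, (dev_value, prod_value) in find_config_drift(DEV_SETTINGS, PROD_SETTINGS).items():
        print(f"{key}: dev={dev_value!r} prod={prod_value!r}")
```

Run as a CI step, a non-empty result could be reported or could fail the build, so a newly discovered difference gets fixed in the setup code rather than rediscovered in production.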

What's the alternative? We could go slower and batch up a bunch of changes, testing them all, moving at a more waterfall-ish cadence. Would we catch this? Perhaps. However, we might not, and if we are batching changes, then bugs live longer in production until we perform the next release.

When a bug does slip through a slower process, it gets fixed in one of two ways. One is to move at an agile/DevOps speed and quickly build a patch. That's often a hacky, not-thought-through flow that many people use to fix a deployment, usually without the process and bounds that a good DevOps process provides.

The other option is to bundle the fix into the next release, which could come next week, or next month. Who knows? Slow processes don't necessarily make the final code better, and they can prevent rapid fixes.

Ultimately, my feedback to the project manager is that we need a test, a process, or something else that checks for these issues before release. While I know something else could break, I prefer small, frequent deployments, with bumpers to limit the regressions we know about.
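As an illustration of the kind of bumper that could catch this particular regression, here is a minimal Python sketch of a smoke-test check that a rendered forum page contains the expected toolbar. It assumes the custom toolbar is rendered as an element with a distinctive `id`; the id `forum-code-toolbar` is a hypothetical placeholder, not the real SSC markup. Checking for the specific toolbar, rather than for any toolbar, would catch both the missing toolbar and the generic one, assuming the two use different ids.

```python
from html.parser import HTMLParser


class _IdFinder(HTMLParser):
    """Collects every id attribute seen while parsing an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def page_has_element(html: str, element_id: str) -> bool:
    """Return True if the HTML contains an element with the given id."""
    finder = _IdFinder()
    finder.feed(html)
    finder.close()
    return element_id in finder.ids


# Hypothetical id for the custom forum toolbar; the real markup is not known here.
EXPECTED_TOOLBAR_ID = "forum-code-toolbar"

if __name__ == "__main__":
    sample = '<form><div id="forum-code-toolbar"></div><textarea></textarea></form>'
    print(page_has_element(sample, EXPECTED_TOOLBAR_ID))
```

In a pipeline, the HTML would come from the reply page of a staging or freshly deployed site (for example, fetched with `urllib.request.urlopen`), and a `False` result would stop the release or trigger a rollback.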