
Why Don't You Upgrade in Place?


Ask a senior-level SQL Server or Windows administrator about doing in-place upgrades and you'll likely get a negative response, ranging from "that's a bad idea" to a puzzled look and a polite scoff. To someone with less experience, or someone taking a higher-level view, building a new server plus a migration seems both more expensive in time and money and more risky than the in-place upgrade. Taken together, that sure seems like the topic for an interesting discussion. I'm betting that you already have a well-formed opinion!

Let's start with why. Why do we need to upgrade? If it's working, why do we need to change it? The main answer is security. Once a product reaches end of life, there are no more guaranteed security patches. Besides the obvious if unquantifiable risk, that makes it a compliance issue for almost every business. Most of the major compliance standards have language that requires software to be supported by the vendor. Yes, it's sometimes possible to get exceptions granted by putting various compensating controls in place, but those are exceptions and by definition should apply to only a small set of our servers.

It's interesting to note that compliance rules, as is often the case, are really designed to support and push businesses into doing the right thing. Going back to the days before PCI, SOX, and other standards, it wasn't uncommon for businesses to only approve upgrades for changes that delivered business value, and it was tough to make the case for a "nice to have" upgrade from one version to the next.

There are other reasons to upgrade that you might see as more or less pragmatic than staying compliant. One is that each new version brings the business new capabilities, both for developers/vendors and for DBAs, who may or may not begin using them. Whether they get used depends on a lot of variables, such as having time to learn about the changes, seeing a use case for them, and having time to implement them. Closely related to the new-capabilities rationale is being able to hire and retain people to work with and manage the systems. Not easily quantifiable, but it's a factor. Finally, our systems don't live in a vacuum. We may be forced to upgrade because the team that manages the OS wants or needs to upgrade.

For those new to the topic, I want to add a little more to the why. Just upgrading to the latest version isn't the end of the story; we also have to apply patches within some reasonable time frame based on local policy. Patches, aka cumulative updates or CUs in SQL Server, are generally assumed to be bug fixes and minor optimizations, not larger changes that would require more extensive testing. While teams may let patches bake for a month or more for anything short of a zero-day exploit, most companies see patching as a low-risk, business-as-usual operation. Cumulative updates replaced the larger and less frequent service packs for a number of reasons. Reducing the number of changes per release and decreasing the time to market were part of it, but so was the reluctance of many (most?) companies to upgrade to a new version until service pack one (SP1) had been released. The latter was based on the assumption, and the experience, that the generally available (GA) release of a new version was likely to have bugs that would only be found as a larger group of users tried it, that SP1 would fix most of those, and that the volume of fixes, or lack thereof, would indicate how stable the new version really was.
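As a quick aside, if you want to see where a given instance stands before planning either path, here's a minimal T-SQL sketch using standard SERVERPROPERTY values; note that ProductUpdateLevel (the CU designation) is only populated on builds recent enough to report it.

-- What version, patch level, and edition is this instance running?
SELECT
    SERVERPROPERTY('ProductVersion')     AS ProductVersion,      -- build number, e.g. 16.0.x for SQL Server 2022
    SERVERPROPERTY('ProductLevel')       AS ProductLevel,        -- RTM or SPn
    SERVERPROPERTY('ProductUpdateLevel') AS ProductUpdateLevel,  -- CU designation; NULL on older builds
    SERVERPROPERTY('Edition')            AS Edition;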

Now that we've covered most of the why, let's get into the how. Upgrades are easy to visualize. Download the latest version, launch the installer, click upgrade, and wait for it to complete. The time to complete varies, but let's say that on average it's under an hour. The alternative is a migration where we build a new server, install SQL, copy the databases over to the new instance, and point the applications at it. The time to complete will vary a lot depending on the maturity of the team and the process being used, but it's always far more than one hour, often running into days. Those descriptions are very simplified, but they highlight the differences and make the key question fairly obvious: what factors would warrant doing a migration that seems to be so much more time-consuming to implement?
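To make the "copy the databases over" step a little more concrete, here's a minimal backup/restore sketch; the database name, share path, logical file names, and drive layout are all placeholders for illustration.

-- On the old server: copy-only full backup so the regular backup chain isn't disturbed.
BACKUP DATABASE [SalesDB]
TO DISK = N'\\fileshare\migration\SalesDB.bak'
WITH COPY_ONLY, COMPRESSION, CHECKSUM;

-- On the new server: restore, relocating the files to the new drive layout.
RESTORE DATABASE [SalesDB]
FROM DISK = N'\\fileshare\migration\SalesDB.bak'
WITH MOVE N'SalesDB'     TO N'D:\Data\SalesDB.mdf',
     MOVE N'SalesDB_log' TO N'L:\Logs\SalesDB_log.ldf',
     CHECKSUM, STATS = 10;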

The short answer is fear. Or, less provocatively, lack of trust in the upgrade process. It's a little more complicated than that in the real world, of course. Doing an upgrade changes a lot of things in the file system and the registry. It also changes the internal structures of the databases. If it all works, life is good. If it fails, in theory you could uninstall the upgrade and revert to the earlier version, then restore all the databases to the pre-upgrade state. If all of that gets you back to the old version safely, that's good. If not, you're in the weeds and the clock is ticking on your outage window. Back when physical servers were the norm it might mean a restore from tape, not something anyone really wanted to do. Today, with virtual machines, it's easier and arguably less risky to revert to the pre-upgrade state via a snapshot, but even that takes time.

A smaller part of that fear is that if you do upgrade after upgrade, the server just accrues debris, seen and unseen: old files, registry settings, whatever. Most of us have seen that formatting a laptop and reinstalling Windows not only frees up some disk space but also seems to make it run faster and more reliably. Upgrades should and mostly do work, but even with a success there's that nagging worry that some DLL didn't get upgraded because a process had it locked, or that something left behind will open the door to a vulnerability.

Contrast that with the "migrate the databases to a brand-new server" approach. You start with the latest operating system available to you, you layer on a clean install of SQL with only the features that you need enabled, you bring over all the server-level configuration (not trivial, not impossible), bring over the databases via backup/restore, and start the smoke testing. If it works, home run! If you really hit issues, your rollback is as simple as repointing the apps to the original server and taking the new one offline. There are risks and additional effort, including but not limited to missing a configuration option, server setting, or file share permission; finding and changing all the applications and processes that point to the old server so they reference the new one; plus the time to back up, copy, and restore every database. There are lots of little things you might miss, but most of those you'll find and fix quickly and add to the list of things to check as you move on to the next server.
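As one small example of that server-level configuration work, the instance settings behind sp_configure live in sys.configurations; a sketch like the one below captures them on the old server so they can be compared against the new one (logins, jobs, linked servers, and the rest still need their own scripts or tooling).

-- Capture instance-level settings so old and new servers can be diffed.
SELECT name,
       value_in_use,
       description
FROM sys.configurations
ORDER BY name;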

The risks of the two approaches are different. The cost in hours ranges from somewhat different to very different, depending on the experience of the team, how good the scripts are, how standardized the old servers are, and more. It's hard to say one is always the right choice, or the wrong one, but it seems like we've largely sidelined the upgrade-in-place approach, and I wonder if that is good for us or the businesses we support. Is it because we want perfect over good? Or is it that we prefer the manageable risks of a migration to the unknown risks of upgrading in place?
