I'm astounded. After picking myself up off the floor in amazement, let's look closer.

    Francis Apel


    Reason to cluster number 1: Hardware Failure

    Probably the biggest reason someone would choose to cluster their SQL Server systems is to support high availability: the avoidance (or at least lessening) of an outage when the hardware goes bad. The funny thing is, although this may not have been true several years ago, with the quality of today’s hardware I rarely see hardware as the main issue in server downtime.

    Most organisations running mainstream kit will likely have purchased a support pack, which typically provides anything up to a one-hour response for hardware failures. For some years now most vendors have offered bronze to gold support packs at a very good price, and these packs can be extended when the period is due to lapse. Clusters can negate hardware failures, but the cost is high and not always justifiable. Other technologies exist to provide hardware protection.

    Francis Apel


    Reason 2: SQL Server Upgrades

    This is probably my favorite reason to consider clustering. As a sole DBA at the place I work, I support over 50 SQL servers (and yes, almost all of them are clustered – even the sandbox, development and QA servers). Upgrading SQL Server is “quick and easy” when you have a cluster, and I can do it on my schedule. Why? Because the only downtime is a failover and the time it takes SQL to update the scripts. This usually happens in a handful of minutes. There is no downtime associated with updating an inactive node, so I can upgrade the inactive node anytime I want. Then I can schedule a failover when it is convenient for the business. After the failover, I upgrade the second node (which is now inactive) when I have time again.

    Compare this to doing a stand-alone upgrade, where SQL Server is down the entire time of the upgrade. Not to mention crossing your fingers that everything goes swimmingly with the upgrade. And if it doesn’t, your downtime just turned into hours, or possibly days. Yikes! Worst case with a cluster is that I have to rebuild the node and re-add it to the cluster, all while SQL keeps running. I have had upgrades fail and had to tweak some things to get them to work, but again, all while SQL keeps running.
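
    The rolling-upgrade sequence described above can be sketched with the FailoverClusters PowerShell module. The instance role name and the node names below are placeholders for whatever your environment actually uses, and the whole thing assumes you are running on a node of an existing Windows failover cluster:

    ```powershell
    Import-Module FailoverClusters

    # See which node currently owns the SQL Server role
    # (so you know which node is safe to patch/upgrade)
    Get-ClusterGroup -Name "SQL Server (MSSQLSERVER)"

    # ...upgrade the inactive node at your leisure, then, in the agreed
    # maintenance window, fail the role over to the upgraded node
    Move-ClusterGroup -Name "SQL Server (MSSQLSERVER)" -Node "NODE2"

    # Confirm the role came online on the new owner
    Get-ClusterGroup -Name "SQL Server (MSSQLSERVER)" |
        Format-List Name, OwnerNode, State
    ```

    The `Move-ClusterGroup` step is the only downtime the users see; the second node is then upgraded the same way once it is inactive.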

    Since SQL Server 2008, CUs and SPs can be uninstalled via Add/Remove Programs, so this is not a valid reason to opt for clustering. You'll always deploy to a dev and/or test system first, so you'll know beforehand whether the patch has an impact.

    Francis Apel


    Reason 3: Windows Upgrades

    I work in a company that has a networking group. That group is responsible for all of the OS-level patches. What happens is they patch the inactive node (again on their time schedule) and fail over to that node when the business is ready. We’ll then wait for one week; if all looks good, we’ll then patch the other, now-inactive node. There have been many times when an OS patch directly affected either SQL Server or one of our SQL Server processes (SSIS, CLR, etc). When that happens, we simply fail back to the unpatched node and check that it was indeed the patch (by verifying that everything works as it did before the patch). We then remove the offending patches. Downtime is about a minute. On a stand-alone install, you could be looking at multiple reboots and testing as you pull off each patch, all while SQL is probably down.

    If the system administrators are doing their job, Windows updates will be controlled and pushed carefully: scrutinised, and the system soaked in a test scenario, before pushing to production. Again, not a valid reason to opt for clustering.

    Francis Apel


    Reason 4: Offsite Node

    Starting with SQL Server 2008, geo-clustering (clustering servers that are not in the same subnet) became possible, and in SQL Server 2012 it was greatly enhanced and actually became usable.

    This can be a valid driver but has an extremely high cost and maintenance overhead.

    Francis Apel


    Reason 5: Move to better hardware

    Another of my favorite reasons to cluster. We purchase hardware that comes with a 3-year warranty. Once the warranty runs out, we replace the server. In our situation, it simply is cheaper. We could debate why replacing your server every 3 years is actually cheaper than holding on to it for longer, but that would be another article. So, once our hardware comes up for replacement, we are tasked with moving off the old hardware and onto the new. In a clustered environment, it is extremely easy to simply add an additional node into the cluster (using the new hardware), then fail over to that node. Once we are sure the new hardware is stable, we add the second new node, then decommission the two old nodes. Easy.

    Again, downtime is about 1 minute. On a stand-alone server, best case, you would need to bring down SQL while you reattach the storage to the new server, or if that’s not possible, restore the databases to the new server. That is probably hours of down time and another session of crossing your fingers and hoping everything works out.
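
    The hardware swap described above can be sketched with the same FailoverClusters module. Node and role names are again placeholders, and this assumes the new server is already joined to the domain and attached to the shared storage:

    ```powershell
    Import-Module FailoverClusters

    # Validate the new hardware against the cluster before trusting it
    Test-Cluster -Node "OLDNODE1", "NEWNODE1"

    # Join the new server to the existing cluster
    Add-ClusterNode -Name "NEWNODE1"

    # Move the SQL Server role onto the new hardware
    Move-ClusterGroup -Name "SQL Server (MSSQLSERVER)" -Node "NEWNODE1"

    # Once the new kit has proven stable, retire the old node
    Remove-ClusterNode -Name "OLDNODE1"
    ```

    The only user-visible downtime in that sequence is the `Move-ClusterGroup` failover itself.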

    With better hardware usually comes an OS upgrade, so, alas, clustering does not really work here.

    Francis Apel


    Reason 6: Add a VM to see if virtual will work

    We’ve been moving to VMs for all of our SQL servers. There are huge benefits: licensing, lower power costs, isolating applications, lower hardware costs, taking advantage of VM high availability, and the list goes on. Moving from physical to virtual is the right thing for us. When we’re moving our physical servers to virtual servers, we do our homework, do our sizing and performance testing, but in the end how will we know definitively whether the virtual environment will work for our specific SQL Server instance? What happens if you move to a VM environment, only to find out that it simply won’t work for your workload? This is where clustering can pay off again. In a cluster environment, you can simply spin up a SQL Server VM node and add it to the cluster, fail over to that node, test, benchmark, etc. If things look good, add another VM node and you’re on your way. If you missed the boat on estimating, simply fail back to the physical node and the only downtime is about a minute (and your hurt pride).

    The overhead required to provide shared storage to a VM is high and can be complicated. I'm not sure about Hyper-V as I don't use it, but in ESX Server there is a limit on the number of paths that may be attached to a host, and multipathing storage to a cluster-node VM can deplete these paths very quickly. I'd rather have the VM as a stand-alone server and rely on vMotion; it does exactly what it says on the tin.

    Francis Apel


    Reason 7: Move to a physical when you’ve grown outside a VM

    On the flip side, what happens when your VM outgrows the VM environment? Maybe you’ll upgrade your VM hosts to accommodate the growth, but maybe that isn’t feasible. With a cluster you would have the option to move to a physical machine to increase the horsepower for that particular SQL Server by simply adding a physical node to the existing SQL Server VM cluster. Again, downtime would be a minute or so.

    Again, no weight here; I have performed this before in both directions. A full OS backup (physical or virtual) and restore to a new server (physical or virtual) will achieve the desired result.

    Don't get me wrong, I'm a fan of clustering (just read some of my guides here on SSC), but think of the admin overheads too: for every instance you cluster, you require a new IP address and computer name. Cluster each instance into a new Windows cluster every time and you need two IP addresses and two computer names. This reduces your available IP ranges significantly.

    Clustering, like any other technology, has its place. It shouldn't be viewed as a default deployment procedure, IMHO.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉