Why every SQL Server installation should be a cluster

  • "You've just countered your own argument there, by deploying nothing but Windows clusters to support purely clustered instances of SQL Server you're forcing support of this onto others."

    Not really, I was stating that a DBA in a given company may support one method of SQL HA, then some IT manager wants the whole SQL Environment clustered. That’s being forced to support it. They may like this idea or they may not. Personally I’m not forcing anyone to support anything.

    "Can't say I've ever had an issue, virtual or physical."

    Good for you. :satisfied:

    "Clustering is a Windows OS technology, nothing to do with SQL Server. In most organisations that have role separation the DBA wouldn't even touch the cluster!!"

    Yes it is, but then again so is Powershell but over time, like clustering it has become intrinsically linked to SQL Server. Saying it is nothing to do with SQL Server is a bit too black and white. I grant you WSFC (or earlier versions) wasn't created for the sole purpose of SQL Server but I would bet my mortgage that if you asked a given number of IT professionals what they use clustering or WSFC with SQL Server would score quite high. Again in most organisations, it isn't that black and white, plenty of grey areas. “Role separation”? Again a grey area in some companies. You say the DBA wouldn't touch the cluster? Really? Often I’ve had to work with the Windows Team on SQL/Clustering issues. However some companies don’t have a windows team, it’s just some guy who installed the back end OS, oh he left yesterday, so whose problem it is? “Well it’s SQL Server so that’s down to the DBA.” :unsure:

    "Yes, they generally hire consultants to come in and provide the necessary knowledge and skills training."

    Yes they keep in me work. 😀

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]
  • No, actually they would probably reply "exchange server".

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • In a cluster upgrade, you'd better have a rock solid rollback plan or you are hosed. I've not seen or heard of too many folks who go through the trouble of testing a cluster upgrade from nuts to soup. "At least the other node works" is not the solution.

    I've actually had a cluster upgrade fail. Microsoft's PSS solution? -- Reinstall. The Microsoft Windows team would point to the Microsoft SQL team for the cause of the problem. No one took responsibility. Also, one thing I never hear is the ability of folks to do a successful Windows restore from backup if something goes wrong with an upgrade (though personally I've done it myself in a VM/cluster test environment). Additionally, the core competency of our network staff where I work is at an all time low--that cannot even provide the basic permissions necessary for the DBA to install the cluster. With the ability to hardwire LUNs to a VM it's going to be VMs all the way--snapshots, disaster recovery, Windows patching, SQL patching--easier than ever.. For basic clusters, clustering in SQL 2000 actually seemed to work better than what we have today. I'd consider clustering if the DBA had complete control over patching the Windows O/S, and the SQL Server, but in today's segmented I/S environment--no way.

  • Mark Solomon (4/25/2014)


    In a cluster upgrade, you'd better have a rock solid rollback plan or you are hosed. I've not seen or heard of too many folks who go through the trouble of testing a cluster upgrade from nuts to soup. "At least the other node works" is not the solution.

    This is the kind of thing I'd like to hear more about. Especially your horror stories about DBAs not having permission to install the cluster. I've never worked anywhere large enough where that became an issue.

    Also part-way through you went from talking about clusters to talking about VMs. I'm not sure I followed the connection, are you implying VMs have supplanted some of the needs for clustering in some way?

  • Thanks Cody. The VM connection for us is this. We've got redundant VM solutions in place that can push a VM around from hosting server to server in the same or different geographic locations. It's highly available though not a cluster and generally the VM push is around 2 minutes. Perhaps you need killer I/O for a few SQL Servers--no problem--just attach a few physical LUNs. VMs also allow you to auto expand the disks--Windows clustered SAN environments--not so much. CPU creep for your SQL Server--just add another core to your VM no problem. This kills two or more birds with one stone where I work. That is it's a requirement to have disaster recovery site that's separate from our main operating site. Did your latest SQL Server SP just write over a hotfix that you needed--no problem--just restore the VM snapshot you created and you're up and running in no time. I mean, we're getting high availability on a standalone version of SQL Server.

    This VM solution not only fixes our SQL issues with clusters (which are many) -- but the DR is also a no brainer. Additionally, we're not compelled to upgrade every time Microsoft has a new SQL Server software release. We can now follow the customer's time line for their data and SQL Server requirements It's very tiring to maintain hardware for aging 2000/2005/2008 clusters--and we have to--some software simply does not work on newer flavors. However, with the VM solution--just upgrade the VM server itself--no SQL upgrade necessary.

  • I'm a little hazy on the huge cost detraction. To an individual the size of the hardware and software licencing may look huge. To a company that setup may be generating vast multiples of its cost. If you stick with a version and machine for 'n' years then the cost is spread over 'n' years. On top of that tax law allows various rules to be applied to lessen the burden.

    Apart from SQL Server a number of technologies have the concept analogous to a cluster. In many cases the advice is to configure as a cluster even if you have a single node cluster (which SQL Server supports) because adding to the cluster is easy. Converting a non-cluster to a cluster is often a complete rebuild involving planned (and brown trousered) downtime.

    So there are dependencies for setting up a cluster? Are they really that onerous?

    Replication can be daunting but offers great flexibility in configuration. I find the publisher/distributor/subscriber model gives far greater possibilities than master/slave setups in other DB platforms. In 2012 the speed improvements were extremely welcome for getting data out of front line operational systems to investigative and decision systems.

    Perhaps products like attunity would be a better option but if you are arguing about the cost of an HA setup then you probably can't afford it.

  • David's new comment made me go back and look. It seems right after I'd last replied two years ago I hadn't worked anywhere large enough to have VMs and clustering. And right after that I got an Enterprise job and that's all I work with!

    The cluster management role split between traditional Windows admins and DBAs turned out to be a very real one. It only worsened in 2012 when the standard became "you have AGs which show in cluster manager but never ever failover through there!" It was a foolish design decision with real world repercussions as other admins just don't know the difference - but also the risk is so minimal that nobody bothers to follow it anyway, so that one day it really will lead to disaster.

    David.Poole (3/25/2016)


    Replication can be daunting but offers great flexibility in configuration.

    Gives us nothing but trouble. Replication can fail in oh so many ways and bring down the source system when logs and disks fill. "Oh but you are monitoring all of that!" Yes, and we have layers upon layers of replication alerts (fill out the table with a time stamp, SCOM users set up monitors to check it, it gets sent to a queue, and it gets ignored or it takes a few hours to trickle down the line so we usually get there in time to see it go down). It has caused so much downtime.

    Want to set it up on Master Data Services? Heaven help you. It causes service packs to fail and model changes to fail. On any other application (when you have hundreds) every upgrade you now have to script it out, remove it, patch, reapply the script usually in the GUI by hand to make sure that nothing radical gets missed in the schema, wait half an hour to populate, and of course it has rewritten everything on the other end so when those users use CDC etc to pick up replicated changes it causes a tsunami down the line.

    Also the complete lack of documentation is astounding. Having to walk through text logs after doing a validation to get success/failure is ridiculous. I have read the Red Gate book and MSDN pages and anyone who claims to really know how replication works is IMHO naive - knowing how it works when it just works isn't knowing anything at all. Being able to really deep dive and check/investigate/troubleshoot the architecture is where it's at and nobody can.

    We would refuse to use it if we only could. It sucks up a lot of time and money for little return. But if you like it more power to you ??

  • I can't pretend to know vast amounts about replication but a group of us spent a lot of time working out how it worked, how to diagnose and rectify faults, how to performance tune etc.

    I acknowledge that the tooling is weak. The bit where a push subscriber fails eventually filling the distributor logs causing a failure which then causes the publisher logs to fill is a trap for the unwary.

    The sheer number of configuration options is difficult to get your head around. There is no "safety" default for it simply because your safety might not be my safety.

    Merge replication I avoided like plague. Transactional replication I'd more or less nailed and when it blew up I was confident of being able to find and fix quickly.

    I learned to dislike mirroring due to 2005 behaviours, probably fixed now. I was baffled by the complexity of Service Broker. It struck me as an expensive way of doing something with SQL Server that should be done in an external queueing system

  • David.Poole (3/25/2016)


    I was baffled by the complexity of Service Broker. It struck me as an expensive way of doing something with SQL Server that should be done in an external queueing system

    Here here. Though, if you've never seen it, Michael Bourgon does PASS presentations on "Overlord" where he uses it to capture basically every bit of DDL etc that goes through their servers and centralise it into a repository; for triaging when things go wrong. It's amazing.

  • Clustering, in the fashion you explain is by far the last thing anyone should be looking at with regards to SQL and making it HA. Shared Storage, plus having to have shared storage for a failover cluster instance is just a pain. No matter how good your IT department is, there is just way too much overhead in doing this type of cluster. My advice is you stay away from Shared Storage and from clustering the actual instance. What you can do is create a windows cluster with a file share quorum, and use HADR "Always On". This means each individual server that is part of the Windows Cluster does not have to have sql installed in "cluster mode", rather it is installed as an individual instance and will only depend on its local storage - again.. get away from SAN.

    I am not saying this fits for each case.. but I run a very large SQL environment and we DO NOT use traditional windows cluster with failover cluster instances of sql. We use always on.

  • oh.. and one last thing.. forget about SQL Failover Clusters in the cloud like AWS.... Always on will work in the cloud with some tweaks and multiple ip subnets.. but not traditional cluster as you cannot share IP's across machines. Not sure about Azure.. but AWS has that limit imposed on the network stack.

  • One thing I did not see mentioned in any of the comments advocating one method or another was regarding data loss. Clustering is still the only method available that provides HA and guarantees no data loss on a failover. Remember that even synchronous AO AG's can fall behind, causing them to revert to asynchronous, and, in turn, lose data.

    It's not a perfect one-size-fits-all solution, but none of them are, which is why there are so many options available.

  • While well written and brief. hardly a persuasive argument, and it completely ignores today's "cloud" environment where servers can be replicated so quickly, clustering my be obsolete.....

  • First: Thanks for the article. I appreciate that you took the time and made the effort.

    BEGIN RANT

    This is a good idea in an ideal world, where servers are free. DBA's would commune with Sysadmins and the security wonks in an environment of love and respect. Let's call it the Bernie Sanders SQLSerververse.

    In any medium to large sized corporate environment, where DBA's are tasked with hundreds of servers, thousands of databases, this may not be a possible option, nor a desired one. These DBA's have minimal time to handle the demands of end-users, application upgrades, while still having time to cover the documentation explosion that things like ITIL have made mandatory. This is with 95% automation already in place.

    In order to get AG's in place at our company, we need a sysadmin to install the cluster services. No other way around it. They insisted on keeping that duty, while also empahsizing that they wanted nothing to do with the monitoring, care and feeding of the cluster. More politics. Oh well, AG's just use the service, right?

    The security team decided that the permissions needed by the AG Wizard are too wide in scope, so our AG's are only possible to create by conference call, involving the sysadmin, the SQL DBA, and the security gnome, who will keep updating permissions until we achieve the creation of the AG, and then will reduce them immediately.

    The work, hardware, and overall stress of making each machine a cluster just to allow for the few benefits listed is not justified in a shop with a 'Do More With Less' mentality, and that defines every shop I have worked in over the past 15 years. I understand that you can make a 1 server cluster, so there is no need for additional hardware or licensing.

    While AG's are nice, I would only consider them to be a terrific solution if it was possible for an enterprise-level SQL DBA to implement them on their own, without having to involve a committee of people to get it set-up. Mirroring was something I could do for a customer on my own. It was a beautiful thing. Still needed additional hardware, but once I had the hardware, I was a one-stop shop for setting up the mirror. Ditto on replication, and log-shipping.

    Everything we do is on VM's. I have never seen a software or databases 'outgrow' a VM. I suppose it is possible, but we have databases that are a couple of terabytes, and they are doing great. For DR and HA the benefits of vMotion and SiteRecoveryManager really help. They don't do it all, and that is why we do have any AG's in the first place, but it is still a pain point to put them up.

    OK. Done Venting.

    COMMIT RANT

  • Joaquin-435851 (3/25/2016)


    What you can do is create a windows cluster with a file share quorum, and use HADR "Always On

    If you design and deploy your WSFC appropriately you shouldn't need to employ a disk or file share witness.

    Joaquin-435851 (3/25/2016)


    rather it is installed as an individual instance and will only depend on its local storage - again.. get away from SAN.

    Even for a stand alone instance in an AO group the storage could still be SAN presented. Some organisations have a policy of all storage being centralised

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

Viewing 15 posts - 61 through 75 (of 80 total)

You must be logged in to reply to this topic. Login to reply