Why every SQL Server installation should be a cluster

  • William Soranno (4/15/2014)


    Can Win 2008R2 and Win 2012R2 be clustered together?

    No. In the same WSFC, all nodes must be the same major version.

  • john.deprato (4/15/2014)


    You could use "active/active" clustering to get use out of the other hardware, however, if you have to fail-over during a maintenance period you have to run both instances on the same node which may cause a severe performance degradation. If you were to perform the maintenance during a time when you knew the hardware could support both instances then it could be possible. All of these scenarios are definitely conjecture because every environment is different in terms of what will and will not work. So it's always going to be a just depends scenario.

    Active/active is improper terminology.

    http://www.sqlha.com/2012/01/09/once-more-with-feeling-stop-using-activepassive-and-activeactive/

  • robert_verell (4/15/2014)


    Nice post about clustering and I agree on the nice points of it, but I disagree about clustering everything. Even in production. I like clustering as an HA solution, but as always, it depends.

    AlwaysOn and Peer to Peer Replication have the same upsides for the most part and have fewer contention points. People who love clustering and set it up everywhere have never had an entire disk array or multiple luns on a cluster go offline during business hours. At that point, failing over isn't bringing the data back online. With AO and P2PR (and really log shipping too) you at least have a separate copy of the data ready to go. Plus you can load balance across it and get some use out of those servers where they are replicated to.

    Let's also look at infrastructure costs:

    -if you have an AP cluster, you have resources that you have paid for that are sitting idle. You have paid for an entire duplicated server that will basically never do anything. You've doubled the cost of your server as well as the cost for upgrading it if you need to expand memory, add HBAs if you're hooked up to a SAN, etc (assuming you want performance the same on failover).

    -licensing with SQL 2012 is a pain point here too for physical clusters. Since you've installed SQL Server on the second (and n+ nodes) you have to license that box, which once again will sit idle and do nothing. For VM clusters this isn't as big of a deal since all of the cores will be licensed on the host anyway.

    No such thing as active/passive or AlwaysOn as a feature. Wrong terminology that people are hanging onto.

    http://www.sqlha.com/2013/04/29/alwayson-is-the-new-activepassive-and-activeactive/

  • As a Cluster MVP and someone many think of when it comes to clustering SQL Server, I have to disagree. Not every configuration should be clustered.

    While clustered configurations of SQL Server (and I hate using it in a generic sense - there is the WSFC which is the WIndows layer, FCI which is the clustered instance of SQL Server, and AGs which are availability groups and require a WSFC) can provide many benefits, there are a lot of things why you would not consider them.

    1. You can't handle the administration. To truly deploy an FCI or AG, you have to really understand WSFCs and concepts like quorum. I still find people using disk only quorum for heaven's sake! (just an example)

    2. While it is easier, and I did cluster back in the NT 4/SQL 7 days (Windows was more difficult than it was now, but SQL Server was the dog here, not Windows truth be told; the fact the article says SQL Server 7.0 was usable is laughable - clustering SQL Server wasn't really usable until SQL Server 2000 for many reasons), it's still an inherently more complex architecture. See point #1 - even as a Cluster MVP, no way I lead with clusters or recommend them where people can shoot themselves in the foot.

    3. You can't upgrade OS versions or do things like change domains. If you have those scenarios, WSFCs are not for you.

    4. We've moved beyond traditional shared storage with things like SMB 3.0 (introduced in 2012) and CSV (introduced in 2014) support. This makes deployments much more interesting and possibly scalable from a drive perspective.

    5. The more nodes you have and the more instances you have, the more of a nightmare it is to patch. Do not under estimate the update scenario.

    6. While technically true the only downtime during patching would be during the failover and script upgrade (but if you have multiple instances, you're not isolating but that's a whole other topic), you're not done. As someone who does this a lot, my patching IMO is not complete until I've tested things work after patching as they did before. That means fully testing failover wherever that instance is supposed to be able to run.

    7. Geographically dispersed clusters in SQL Server 2012 shift the problem. Instead of a VLAN, now we have possible DNS issues. Easier from a high level, but introduces other challenges.

    8. If you are going to new hardware every three years or so, you should refresh the OS, which means a new WSFC. Use log shipping or AGs to migrate and be done. Just putting in new nodes to an old WSFC seems pointless in most - but not all - cases.

    9. You can't just add a VM to see if it will work. Hyper-V and VMware both have specific ways you can address disks if you're using shared storage. If you're not using iSCSI or SMB, no way this can happen. Same for going to a physical.

    10. FCI + AG is definitely a complex architecture that can be downright ugly if you don't know what you're doing. Two words: asymmetric storage. And let's not get started on the quorum scenario if you're doing it over a distance ...

    Well intentioned, but definitely flawed, article.

  • I have a feeling Allan feels a little bit better now.

    Seriously though. There's a lot of good information there. We appreciate it. we had a 6 node multi-instance cluster where they would almost ritualistically fall on the same node and fail about once a week when they did updates. 15 databases, 3 instances, 32GB of ram, 6 nodes. Every time I was called in for a down application, it reminded me why I hated that job. I look at every option before clustering... Some times it's the right fit, some times it's just what the shop is used too and you have to beg.

    .

  • Bill (DBAOnTheGo) (4/15/2014)


    I have a feeling Allan feels a little bit better now.

    Seriously though. There's a lot of good information there. We appreciate it. we had a 6 node multi-instance cluster where they would almost ritualistically fall on the same node and fail about once a week when they did updates. 15 databases, 3 instances, 32GB of ram, 6 nodes. Every time I was called in for a down application, it reminded me why I hated that job. I look at every option before clustering... Some times it's the right fit, some times it's just what the shop is used too and you have to beg.

    LOL I come out of lurk mode every now and then on forums. 🙂

    Clustered configs really can cause more harm than good. When you have downtime due to poor updating procedures, not knowing what you're doing, etc., you look bad and undermine your credibility ("I thought the purpose of a cluster was to increase, not decrease, availability."). There's a reason I spend quite a bit of time with my customers planning if we're going to do one. So many little things can kill you even now when things are definitely better than they ever used to be. What you don't know will be a painful lesson learned later.

    Quite honestly, I would ensure that I had a rock solid backup and restore strategy WAY before I even consider looking at clustering. If you can't do backup and restore right, are you going to cluster properly? Probably not.

  • Caruncles (4/15/2014)


    Since I didn't know what "clustering" was (other than an inference in a Clint Eastwood movie) it was an interesting learning event for me. I need a disaster recovery option to keep the database running. Seems like there are a lot of cluster-haters in the response group, but I realize it's not a representative sample. Thanx for the article and keep 'em coming!

    I don't think we all hate clusters. We just don't think every SQL server needs to be clustered. You should look at your needs and budget and determine what will work for you for HA solution. My team is moving away from clustering and moving towards VMWARE HA and SAN replication HA solutions. This solves a regional disaster type problem. In this case SQL clustering/mirroring would be moot.

  • kevin.parks 41073 (4/15/2014)


    Caruncles (4/15/2014)


    Since I didn't know what "clustering" was (other than an inference in a Clint Eastwood movie) it was an interesting learning event for me. I need a disaster recovery option to keep the database running. Seems like there are a lot of cluster-haters in the response group, but I realize it's not a representative sample. Thanx for the article and keep 'em coming!

    I don't think we all hate clusters. We just don't think every SQL server needs to be clustered. You should look at your needs and budget and determine what will work for you for HA solution. My team is moving away from clustering and moving towards VMWARE HA and SAN replication HA solutions. This solves a regional disaster type problem. In this case SQL clustering/mirroring would be moot.

    SAN replication = block level

    VMware (DRS/vMotion/etc.) = VM level

    Neither is transaction-based. It all boils down to requirements and things like RTO/RPO. At the end of the day as the DBA, you're responsible for the data. If you have no control, can you meet your RTOs/RPOs? You're still on the hook. Bottom line: there is still a place for in-guest availability features.

  • SQLHA (4/15/2014)


    As ... someone many think of when it comes to clustering SQL Server.....

    I can attest to this. If I made a list of the best clustering experts, Allan would be the first name on it.


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

  • As someone else who presents a lot on clustering, I echo Allan's sentiments. You're introducing a whole lot of complexity for what may or may not be a good reason.

    Additionally, as of last April 1, if you have a clustered installation of SQL Server, you are required to either license both nodes, or have software assurance on the license. (which will add about 25% to your licensing costs)

  • Bill (DBAOnTheGo) (4/15/2014)


    My expertise level in SQL is somewhere below the "DBA in diapers" (I couldn't think of a cool phrase to express my ineptness), but if the clustered servers all share one data storage, that's a single point of failure. That's not cool is it? However, if the data storage has a backup, it might be a little more cool. I don't know how hard it would be to point the server to the storage backup verses the original storage. Definitely seems like there would be down time involved.

    Most storage has redundancy built in. SANs can have redundant arrays or even redundant sans. you gain redundancy when you are in different levels of RAID depending on your needs.... (yes raid 5 is fine... don't hate me....) you can have multiple pipes to the server so if one cable goes bad, you're still up. There's many ways to have good redundancy... it just all has a good price.

    The less down time you can afford, the more server and hardware you need to buy. There are good solutions to help on price.... but there becomes a point that you just have to spend the cash or declare that the data isn't worth it and scale back your seven 9's theory for a database housing your internal share point.

    You could use "active/active" clustering to get use out of the other hardware, however, if you have to fail-over during a maintenance period you have to run both instances on the same node which may cause a severe performance degradation. If you were to perform the maintenance during a time when you knew the hardware could support both instances then it could be possible. All of these scenarios are definitely conjecture because every environment is different in terms of what will and will not work. So it's always going to be a just depends scenario.

    active/active doesn't truly exist as far as I've found. what you're talking about is using two servers with two active/passive instances of SQL so that both boxes are in use. and yes, one goes down, they may both go down. This is not something I'd want my customers relying on. I've worked with those systems and I hated my job. I would strongly advise against that setup unless you build up each server to handle both loads regardless.

    Mirroring does all this without needing a cluster, without needing shared storage... you can still manually fail-over a mirrored instance and work on the other box. Clustering makes some things easier... one jobs to maintain for one, one set of logins and permissions, one server name to log in too.

    on the active/active thing. Active/active suggest that each side can take writes and reads. The closest you can get with SQL would be Peer to Peer replication, even there you have a delay and a chance for two people updating the same record and many design issues to work through.

    That's why I put the active/active portion in quotations because I wouldn't call it a true active/active because of not being able to read/write for the same instance on two different servers. Oracle's RAC solution and I think mySQL has another similar solution that allows this type of scenario. I had even heard of Microsoft potentially coming out with their own solution for this type of load balancing (that's what I would call something that works in that manner), but I guess it never happened. I only mentioned the SQL "active/active" solution because it does exist and I have worked in an environment that was using it. I still believe if you are going to cluster with more than 2 nodes you should still have at least one passive node to fail-over to. I was just suggesting that if you wanted to get use out of both servers you could run a different instance on each node of the cluster. We didn't seem to have too many issues with it actually, just that we would have some degradation in performance if we had to fail-over two or more instances to a single node. I agree though I wouldn't have set it up that way and I moved the client in the direction of finally getting a passive node added for potential fail-overs. Where I am working currently we use mirroring and I am with you and would prefer mirroring to clustering in terms of HA solutions.

  • I do like mirroring... I'm sure that's apparent. ^.^

    Every form of replication has it's place and before any replication is used... you need to ask yourselves a good set of questions.

    1) Do you even need replication?

    Honestly, how important is this data. How long does it take to restore? How often do you restore? Is it worth the cost of the second server?

    2) Does the failover need to be automatic?

    Certain types of replication don't do automatic failovers. Even mirroring doesn't when you only have two nodes.

    3) What types of replication does our current environment support?

    Do we have any requirements that force us to not use shared accounts? Can our service accounts have the permissions needed?

    4) What types of replication can we currently support without buying more hardware?

    Anything that requires shared storage becomes an issue if you do not already have this in place. A cheap quick solution can cost you a lot more in the end.

    5) Clustering specfic, you have to be an administrator to control a cluster for manual failover. Do you have those rights or can you wake the SA up at 2 am?

    This can cause an extra cost if you get paid for overtime.

    6) Is this a knee jerk reaction?

    Back to question 1, do you even really need replication? This may be one of those, "The sky is falling!" scenarios. You have to know that you can still support what you put in place.

    7) Out of those choices left, which one fits within your SLA (Service Level Agreements). All replication is not created equal.

    I'd welcome anyone to add or modify this list. This is what I normally have going through my head as I'm trying to decide what replication would fit best.

    .

  • Very interesting article and a great discussion generator. Many thanks.

    ...As a sole DBA at the place I work, I support over 50 SQL servers...

    Anyone else notice the irony? :hehe: I hope Francis' organization has a failover for when the DBA goes down.:-)

    ...One of the symptoms of an approaching nervous breakdown is the belief that ones work is terribly important.... Bertrand Russell

  • Anyone else notice the irony? Hehe I hope Francis' organization has a failover for when the DBA goes down.Smile

    Smile

    Now that's funny.

    .

Viewing 15 posts - 31 through 45 (of 80 total)

You must be logged in to reply to this topic. Login to reply