Failover Clustering with a 3 server setup

  • I have a question about SQL Server Failover Clustering that I am not too clear on based on our current setup and research.

    We have 3 identical physical blade servers that I would like to use for failvoer clustering.  My plan is to have all 3 servers as a primary, secondary, and tertiary server, so resources are fairly level loaded across the 3 servers during optimal times.  I say all 3 as primary because I want to host SOME of the SQL instances on each of the servers so none of the servers would ever be sitting idle, they may just get spikes in the event one of the other 2 needed to fail over.

    We have roughly 60 SQL instances that I would like to spread out across the 3 physical servers and we are currently licensed to do this, so licensing is not a concern.  The SQL Servers have different versions ranging from 2008 R2 all the way up to 2016.  What I would like is to have approximately 20 SQL instances per physical server but with each server being a failover host for the others.  We have a SAN on the back end and can set the disks up per instance with different drive letters so there will be no conflicts on failover. Our I/O is not that high on the instances and the SAN handles things pretty well, so we don't notice disk throttling at this time having multiple SQL instances sharing disks.

    To make it MORE fun, we would like to be able to have SSIS and SSRS hosted along with this.  I know SSIS doesn't fail over this way from the documents I read (although they may be out of date), but I am thinking if I have the SSIS services installed on all 3 of the servers and running on all 3 of the servers, it shouldn't cause too much trouble as long as the SQL Instance with the SSIS catalog is only running on 1 of the 3 servers at a time which would be the required case as it is on a drive letter that can flow across to the other 2 servers.  SSRS I think I'd need to build up something in powershell and a scheduled task or something so if the primary sever went down, it could start the SSRS services on one of the other hosts, but that isn't a requirement at this time.

    I have not started any of the steps required for SQL Server Failover Clustering, such as setting up windows failover clustering, as I wanted to ensure that what I was thinking of doing was even possible.

    All of my SQL instances are SQL Server Standard; I have no Enterprise editions.  The windows OS is Windows Server 2012 R2.

    I was not able to find any good documentation on how to set something up like what I am proposing above, but that is why I am reaching out to the experts.

  • This is a lot, but first, 25 instances in an FCI setup, so 60 is out.

    My concern at this scale is that you're going to make troubleshooting and resource management hard. I don't worry about the 3 instances being live and failover, but the instances are concerning. Is there a reason you can't use 3 instances on each and have databases be the point of separation inside the FCI?

     

  • Hello Steve,

    Thanks for the reply.  I was not aware of the 25 instance limit; that is a good thing to know!

    That is a good future state for sure, but current state we have a lot of instances.  Did some checking and I counted wrong.  I was including both test and live in my count.  I don't need the test ones, so that drops the number down to 27 instances ranging across multiple versions.  If we exclude our SQL 2000 and SQL 2005 instances from that mix, we are at 25.

    Consolidation is something on our radar to try to reduce the number of instances down as it would make resource planning on the servers much nicer.  Trying to keep the MAX memory per instance at a reasonable level while leaving some room for SSIS and SSRS on these servers has been interesting to say the least.

    We do currently have a failover tool, but are investigating what other options we have which is why failover clustering came into the mix.

    The nice thing about the instances is they are mostly OLTP, but none are exceptionally heavy use.  We rarely see CPU spikes except when our ETL processes run to our OLAP instance.  And memory only spikes when that ETL process runs too.  We have it tuned pretty good at the moment.

    The main reason for having so many instances for our small team (3 DBA's) is the requirements we got at the time the instance was created.  For example, IT has several tools that all need a database backend.  SOME of the tools claim to need sysadmin permissions, so those get their own instances.  I'm not willing to risk having the tool drop a database it does not recognize when updates happen for example.  Then we have a single instance for our ERP tool as we want to keep that one as clean of things we need to create as possible to keep the auditors happy.  They like to go through that instance and ask us a lot of questions each year about who has access and what they have access to.  Plus a few different instances for our internal processing (company internal data such as test results, not DBA specific).

    Most, if not all, of the 3rd party tools requested some level of permissions that I didn't trust having on the same system as anything critical.  Plus some of the 3rd party tools are only compatible with a specific SQL version which is what led to some of our version sprawl.  With the 2000 and 2005 instances - those are the LATEST versions that the tools support, mind you the tools are no longer in support.

    My expectation is that in the end, we will be left with about 10-15 SQL instances total across both test and live just to make sure all of our tools can work.  Getting it down to 3 would be ideal, but just taking a more realistic approach of even getting it down by half to 30 would be a huge improvement.

    Now, with our instances, we have 2 out of the 25 that are bigger resource hogs - our ERP one and our primary test result one, so those 2 we like to keep on different physical boxes as much as possible.  So lets call those A and B.  If our servers are 1, 2 and 3, we would host A on 1, B on 2, and the secondary for both of these we would like to be 3.  So if 2 fails, B starts up on 3.  If 2 AND 3 fail, B would start up on 1.

     

    At the moment, I am mostly curious if this is a possibility.  If so, I think our next steps will be to either do some trial and error with our test systems or to reach out to a consultant.  I just didn't want to use a lot of resources only to find out what I am trying to do is going to fail.

  • You might want to consider this company: https://dh2i.com/

    As for SSRS - you want to use Enterprise Edition and licensed across all nodes and use the scale-out functionality.  You would have a single database engine instance - and scale our reporting services to each of the nodes.  Reporting services would not be clustered at the server level...you can use NLB clustering to load balance across the nodes though.

    For SSIS - I think you have the right idea.  Clustering the instance containing the integration services catalog should be enough...

    As long as you have enough resources available to support all of the instances across 2 nodes, and you make sure the memory allocations are not locking pages and have a correct max memory setting you should be okay.  I would definitely look at limiting what nodes are available to each instance though...and not necessarily allow all instances the ability to fail over to all 3 nodes.

    One more thing - if you have multiple SSRS instances you definitely will need to consolidate those to a single instance.  SSRS 2017 will only allow a single instance of the reporting services to be installed on a server.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Thanks for the tips!

    We currently have a single SSRS instance, but were toying around with the idea of a second to handle a subdivision of the company.  One instance per subdivision, but it sounds like that won't work.

    As for failing everything, we can probably have a few things that don't fail over, but was something we were hoping to avoid.  The idea with the failover for us is to reduce downtime due to unexpected server hiccups.

    As for DH2i, that is the tool we are currently using and are liking it, we just like to evaluate other options from time to time to make sure we are using the right tool for the job.  It is very easy to use a tool for years and get comfortable with it and then have trouble thinking about changing.  I am thinking about with HP Polyserve and getting the surprise notice that they were shutting that down which is what led us to DH2i.  I just want to evaluate other options so we don't have a mad panic like we did with Polyserve.

    As for getting enterprise edition for SSRS, that'll be a tough sell.  Something I would definitely like, but not something the company will likely go for.  We keep testing out other reporting solutions (informer, SAS, powerbi, tableau) and then scrapping those ideas in favor of SSRS for one reason or another; usually that the company isn't doing analytical reporting and is wanting "live" data in their reports.

     

    But thanks for the advice Steve and Jeffrey!  It sounds like my approach to this and my end goal is something possible, just might be a pain in the rump.

  • I built out a PolyServe cluster at a former site and really liked the solution, especially the storage solution.  When we licensed the servers for that environment we licensed it for Enterprise Edition across all 4 nodes - which allowed us to 'downgrade' instances to Standard Edition or lower Editions if needed.  However - since we were licensed for Enterprise we just installed Enterprise for any instance in the cluster.

    For SSRS - here is how I would approach that.  I would license several reporting services instances with Enterprise licenses at 2 2-core licenses.  That cost is fairly reasonable for the advantages you get with Enterprise SSRS - namely, scale-out and data-driven subscriptions.  You can still use the cluster for the database engine running Standard Edition - and instead of installing reporting services in the cluster you install those services on a web-farm.  You can then use your load balancer of choice (Netscaler, F5, etc...) to load balance across the web farm to process the report requests.

    One other thought - you could look at purchasing the PowerBI server which is really just SSRS with PowerBI.  Again - the licensing should only be relevant for the actually reporting services and the instance where the databases are stored can be Standard Edition.  I am not positive about that...but I see nothing in the documentation that would require the database engine to be Enterprise Edition.

    What you are looking to do can be done using a standard windows cluster...the only place it gets tricky is exactly the same issues you have with DH2i.  Specifically making sure your instances can fail over to a host that has enough resources to support the additional workload and what hosts are available to that instance.  DH2i makes that a bit easier because you can visualize the configuration better...but the same issues apply.

    With all of that said - the advantages of DH2i are many and I would definitely go that direction.  Much easier storage management - easier balancing of instances - and much, much, much easier deployment of new instances.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I like the idea of moving SSRS to the web server.  I will run that past our IT team.

    PolyServe was a fun tool when configured correctly for sure.  We ran into a few issues with some admins thinking they knew how to configure it properly with the fencing and when a failover happened, it took the entire cluster down instead of failing over.  That was a fun one to troubleshoot.  But, if memory serves, we had basically told it that if any node goes down, then take the NIC offline.  Not the NIC from the server that went down, but the primary NIC on each of the servers in the cluster.  So one node went down and all 3 of the servers disappeared from the network and the sever admins needed to go into the server room and manually connect to the servers to turn the NICs back on.

    DH2i made that a much nicer experience in that if a server goes down, the other 2 would talk to each other and happily pull the instances online.  DH2i is definitely not out of our options for failover by any means, we just want to make sure we are not using it simply because we are familiar with it when other tools are out there.  I mean, they allow you to have mismatched hardware for your failover and mismatched OS's (both windows server AND Linux), and you can fail over any service; it doesn't need to just be SQL related services.  We are happy with DH2i for sure, we just want to investigate other options.

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply