SQL SERVER FCI vs AG's

  • We currently have a two node,Multi Instance SQL 2014 FCI that we will be upgrading in the near future to2016 as well as adding an additional Instance and additional node.  I’ve been tasked with examining AG’s to seeif it makes better business sense than the FCI. We’ve had a few issues with the FCI and some are asking if AG’s would beable to solve the issues…

    We have a SAN for sharedstorage so that is not an issue. 

    We have about 125 databases between two instances. All told the databases use about 10 TB of data between the two instances.

    The issue we’ve experienced in the past is that with thenumber of databases and amount of memory in use (~400 GB per instance), ittakes the FCI a while to checkpoint all of the databases, release the memoryback to the OS and go through crash recovery on the other node to come backonline for manual failovers (patching, other maintenance).  This can take as long as 5-10 minutes.  We have a 24x7 web presence that we need tokeep online.  We’re upgrading to machineswith 3TB of memory and there is concern that failover time could dramaticallyincrease.

    In the past we’ve stayed away from AGs because they did not providea full recovery solution and our current fci based solution was workingwell.  With the newer versions there aresome features that are looking attractive, including readable secondaries, AGDTC’s (we use linked servers to pass data between the instances) and automaticpage recovery among others.

    I’m leaning towards the existing FCI for local redundancy,with the possibility of one or more AGs to a remote and local Datacenter for DR and reporting/read-onlyoffloading respectively.  Currently we use file basedreplication to a warm standby site which I working well.

    Can anyone comment on similar experiences and what is workingfor your large OLTP environments?  Also,I’d like to hear about gotchas and other issues you may have seen with thistype of environment.

    Thanks in advance,

    -Luke.

    To help us help you read this[/url]For better help with performance problems please read this[/url]

  • Luke L - Wednesday, February 8, 2017 7:08 AM

    We currently have a two node,Multi Instance SQL 2014 FCI that we will be upgrading in the near future to2016 as well as adding an additional Instance and additional node.  I’ve been tasked with examining AG’s to seeif it makes better business sense than the FCI. We’ve had a few issues with the FCI and some are asking if AG’s would beable to solve the issues…

    We have a SAN for sharedstorage so that is not an issue. 

    We have about 125 databases between two instances. All told the databases use about 10 TB of data between the two instances.

    The issue we’ve experienced in the past is that with thenumber of databases and amount of memory in use (~400 GB per instance), ittakes the FCI a while to checkpoint all of the databases, release the memoryback to the OS and go through crash recovery on the other node to come backonline for manual failovers (patching, other maintenance).  This can take as long as 5-10 minutes.  We have a 24x7 web presence that we need tokeep online.  We’re upgrading to machineswith 3TB of memory and there is concern that failover time could dramaticallyincrease.

    In the past we’ve stayed away from AGs because they did not providea full recovery solution and our current fci based solution was workingwell.  With the newer versions there aresome features that are looking attractive, including readable secondaries, AGDTC’s (we use linked servers to pass data between the instances) and automaticpage recovery among others.

    I’m leaning towards the existing FCI for local redundancy,with the possibility of one or more AGs to a remote and local Datacenter for DR and reporting/read-onlyoffloading respectively.  Currently we use file basedreplication to a warm standby site which I working well.

    Can anyone comment on similar experiences and what is workingfor your large OLTP environments?  Also,I’d like to hear about gotchas and other issues you may have seen with thistype of environment.

    Thanks in advance,

    -Luke.

    AGs are per set of databases, FCI is per instance, big difference.
    Even with AGs, failover can be time consuming depending on how busy the databases are (you maybe have a large redo queue at the secondary).
    Also bear in mind, if you introduce an FCI into an AG, the failover between the FCI and other replicas in the group will be restricted to manual failover, no automation.
    They both do very different things, with Ags you have to sync accounts, jobs, etc. Are you ready for this?

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry Whittle - Wednesday, February 8, 2017 8:26 AM

    AGs are per set of databases, FCI is per instance, big difference.
    Even with AGs, failover can be time consuming depending on how busy the databases are (you maybe have a large redo queue at the secondary).
    Also bear in mind, if you introduce an FCI into an AG, the failover between the FCI and other replicas in the group will be restricted to manual failover, no automation.
    They both do very different things, with Ags you have to sync accounts, jobs, etc. Are you ready for this?

    Perry, thanks for the reply.  Yes, I'm aware of the difference between db level(AG) and instance level(FCI), it's one of the main reasons why we've not moved away from FCI's to this point. 

    Thanks for the reminder on the redo queue for busy databases.  Are you saying that the FCI fail over will be manual or only the fail over from the FCI to one of the secondary AG's? (In the scenario that an entire datacenter/SAN/the entire FCI) goes offline?  It was my understanding the FCI would function as normal, but we could use the AG's with the Read Only routing listener to offload reporting, backups etc. for certain databases.  I understand that fail over to the AG from the FCI would be manual and could be somewhat cumbersome if quorum is lost.  But that does bring up an interesting point for one of our use cases.

    As for the jobs and logins, that's less of a concern for me.  Those can all be scripted...  jobs could be updated/refreshed via jobs and would be either disabled or created with logic to check if it was running on the primary AG or not and to behave accordingly.

    Thanks again,
    -Luke.

    To help us help you read this[/url]For better help with performance problems please read this[/url]

  • Luke L Are you saying that the FCI fail over will be manual or only the fail over from the FCI to one of the secondary AG's?

    No, it's not related to the FCi itself

    In an AG, if you have 2 replicas 1 an FCI and 1 a standalone instance, AG failover between these replicas is manual, there is no automation.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Ok that's what I thought... thanks for the clarification.

    To help us help you read this[/url]For better help with performance problems please read this[/url]

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply