Availability Groups either -- Windows 2016 Failover Cluster vs SQL Server Cluster

  • Hello,
    I supported SQL Server Clustering for a number of years, but now I am facing a task to implement Availability Groups either on Windows 2016 Failover Cluster(without SQL Server Cluster) or utilizing SQL Server 2017 Cluster on a Windows  2016 Failover Cluster.
    What are the advantages and/or disadvantages of utilizing the above 2 methods?
    There are a number of links around, but could you please share a GOOD links, if any.

    Thank you

  • SQL Clustering * what it is, what it isn*t, and how to get both - PASS.org

    http://www.pass.org/EventDownload.aspx?suid=6528       

    you may have to google the first piece and download the powerpoin

    you have to build your own spreadsheet comparing the two technologies and '
    everything around them. $$$ and expertise is the two main things to gather.

    good luck

  • Thanks, Any other suggestion(s)?

  • barsuk - Thursday, July 26, 2018 2:19 PM

    Thanks, Any other suggestion(s)?

    The direction you go depends on what your needs are and what components you need to have in place - what edition of SQL Server, do you need a readable copy of databases

    A big difference is that FCI is at the instance level and Availability Groups is at the database level
    This is a decent list for consideration:
    Should you choose a SQL Server Failover Cluster Instance or an Availability Group?

    This is short but addresses some differences as well:
    Choosing a SQL Server Availability Technology

    You can have different mixes of HA/DR approaches which is why a lot of what you do depends upon what those needs are. This link touches on the common misunderstandings:
    Will SQL Server 2012’s Availability Groups Replace Clustered Instances?

    Sue

  • Thank you, we are using SS Enterprise 2016. We need to ensure that in the event of the down time, database(s) will become available on the other node or hot spare automatically.
    Currently we have 2 Node Windows Cluster with Availability Groups in Async mode, so each node is being utilized as a Primary and as a Secondary replica at the same time.
    But from what I see Windows Clustering ( without SQL Server Cluster) doesn't have the notion of Automatic Failover in a Sync mode.
    So that option - Automatic Failover in a Sync mode - is only available when SQL Server Cluster exists on top of the Windows Cluster?
    Is it right?

  • inHouseDBA - Thursday, July 26, 2018 8:26 PM

    Thank you, we are using SS Enterprise 2016. We need to ensure that in the event of the down time, database(s) will become available on the other node or hot spare automatically.
    Currently we have 2 Node Windows Cluster with Availability Groups in Async mode, so each node is being utilized as a Primary and as a Secondary replica at the same time.
    But from what I see Windows Clustering ( without SQL Server Cluster) doesn't have the notion of Automatic Failover in a Sync mode.
    So that option - Automatic Failover in a Sync mode - is only available when SQL Server Cluster exists on top of the Windows Cluster?
    Is it right?

    For automatic Failover of AlwaysOn Availability Groups there must be a Quorum Disk and Windows Clustering Setup. As documented here. Asides that you can change the mode of the existing AlwaysOn Availability Group if prerequisites are met, you shouldn't install SQL Server as a Cluster first. There are scenarios where you can combine the power of SQL Clustering and AlwaysOn Availability Groups however it is not necessary.

  • Each option has advantages and disadvantages, You need to choose what is best for you.

    Both options cost the same.  If you run in A/P mode the licensing is the same for clustering or AGs, you only need licensing for the active node.  However, if you want to run A/P/P over 3 nodes then you need to licence two of the nodes.  If you want A/A or A/A/A mode again the licensing is the same, you need licensing for all nodes that can be active. 

    Some advantages to AGs are:
    Failover does not interrupt end-user access
    Failover can be automated, either by using synchronous replication or by using SQL 2017 AG health monitoring
    AGs support the latest SQL2017 features of R services and Python

    Some disadvantages of AGs are:
    If you do a database restore this breaks replication for that database.  SQL2017 automatic seeding can help, but regular restores and AGs do not mix well
    Failover is hard to automate if you are using async replication
    Performance can be affected if you are using sync replication
    Any changes to msdb (eg Jobs) have to be implemented separately to all servers
    You need to set up some automation to restart replication in the event of a failover
    You can get false positives that a failover may be needed if your network is under pressure

    Some advantages of clustering are:
    All databases are automatically included in the scope of the failover providing they are on the cluster-aware drives
    Any changes to msdb only need to be done once
    Restoring a database has no downstream affect on other servers

    Some disadvantages of clustering are:
    A failover will always result in an outage while SQL Server gets going on the target node
    SQL Clustering is not compatible with the latest SQL2017 features of R services and Python

    If none of the above disadvantages affect you then I would recommend you use AGs.  As time passes it is likely that more new items will not be supported within SQL Clustering.  Otherwise go with the option that minimises the disadvantages for your situation and maximises the advantages.

    Whatever way you go, if you want to use SQL FineBuild to install and configure SQL Server then it can do what is needed.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • Thanks EdVassie, DinoRS, Sue_H and Nassan.
    What about Sync vs Async? Does anyone have a good comparison of Sync vs Async Replication, as I am thinking about using Async Replication and my boss wants to go for a Sync Replication as a way to ensure automatic failover to occur.

  • Sync v async...  What do your SLAs say. 

    Is minimising failover time more important than minimising query performance.  Is your network up to supporting sync replication, which means no transient outages.  A network outage can mean a failover gets triggered when everything is automatic.  You then need automation in place to get replication back to the old primary restarted.  If another outage occurs before all nodes are synchronised you may get another failover attempted which would be unable to complete and maybe leaving everything down until a human can sort things out.

    My advice would be to start with async and monitor how often you get unexpected status changes.  If you have a couple of weeks without any happening then maybe you can support sync replication.  If they happen once or twice a week then you could find sync replication with automatic failover demands more stability than your network can give.

    The objective in this advice is to reduce risk.  With starting to use AGs you are going into a new way of doing things, and it will take time to gain the experience to run everything smoothly.  Personally I would want to reduce unexpected failovers during this time, and ideally permanently.  Even with planned failovers you will find situations you had not prepared for, until you have enough experience that they become part of your routine.  Unexpected failovers will throw things at you whenever they happen - at peak hours, or just after you got on the train home or at 2am.  Regardless of if you have automated or manual failover, you will need the same alerts to tell you something has happened that may need a failover.  Don't assume your first attempt at setting all this up will work as you need it to.  Make sure the lion has been fed before you put your head into its mouth.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • Thanks again EdVassie,
    We don't have SLA at the moment( should have it around the New Year) .
    Based on review of a number of articles online ( and a number of other DBAs I talked to) most of the people are using Async replication. Is it a correct statement?
    I wonder if Sync replication is not being widely used because of the performance issues associated with it or from automatic failover ( and challenges of failing it back)?
    Or both?
    Can tweaking Timeout can help here - at least prevent failover from the soft errors?

  • inHouseDBA - Tuesday, July 31, 2018 10:12 PM

    Thanks again EdVassie,
    We don't have SLA at the moment( should have it around the New Year) .
    Based on review of a number of articles online ( and a number of other DBAs I talked to) most of the people are using Async replication. Is it a correct statement?
    I wonder if Sync replication is not being widely used because of the performance issues associated with it or from automatic failover ( and challenges of failing it back)?
    Or both?
    Can tweaking Timeout can help here - at least prevent failover from the soft errors?

    Where I work one of our systems uses two data-centers (DC), with two servers on each as part of a HA Group. The secondary that shares the DC with the primary is sync, so we have no data loss in the event of fail-over, and the fail-over can be automated (which works very well). We have only once had to make that secondary async, when an index rebuild on a very large table was causing a backlog of transactions on that secondary, thereby delaying the primary.
    The other DC is async, because there is a very slight lag on occasions.
    In our experience, the network is capable of handling the throughput efficiently enough that we can use sync, with that one exception.

  • With 2012 we initially used sync AGs for HA, and async AGs across our WAN for DR.  We experienced many HA failovers that should not have occurred...in fact most of our automated HA failovers should not have occurred.  Lots of MS incidents.  And contrary to prior comments, while AG failover is a bit faster than FCI failover, our apps experience an outage when we failover with either method; even with sync AG failover it takes time before all DBs recovered and ready on new primary node.  We typically see about 40sec (FCI) vs 25sec (AG) for failover.  The AG listener may be up, and clients able to connect to the new primary faster than that, but the DBs they need to access are not immediately ready.

    We've since moved to approach where we use FCIs for HA, async AGs for DR.  So far we like this better.  We've  yet to see an unnecessary HA failover (knock on wood), actual failover (usually planned) is rare enough and fast enough, and we're using less storage.  For us the added FCI requirement of shared disk is not a hurdle.  Another minor plus:  new DBs added to an FCI  (perhaps by installation of 3rd party app) automatically have HA (not true when using AGs for DR).

    We're not anti-AG, though.  For DR they work great for us, best solution we've ever had.

  • Thank you Steve and Mike,
    Steve,
    What do you mean by "and the fail-over can be automated (which works very well)."
    Can you please elaborate on that..

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply