SQL 2012 AlwaysOn Node/File Share Quorum Majority

  • According to my testing, people are in for a rude awakening when using AlwaysOn Availability Groups with the default settings in certain scenarios. For instance, in my testing of a primary SQL instance and a secondary SQL instance for reporting purposes, the default disk majority results in issues. If you shut down one of the servers, say you just reboot the secondary SQL server, you lose quorum and the primary becomes inaccessible! To remedy this, I believe the following is necessary, but I want someone to confirm. Below are the two most common scenarios I see, so let's walk through them.

    Scenario 1:

    SQL01 Primary

    SQL02 Reporting

    -For this setup, we could change the quorum to use node and fileshare majority. I would put the fileshare on a separate server that should be considered a "production" server. This way, when SQL02 is rebooted, we still have quorum since SQL01 and the fileshare would both be up. If SQL01 is rebooted, the availability group would fail over to SQL02 without issue since the fileshare and SQL02 can both vote.

    Scenario 2:

    SQL01 Primary

    SQL02 Reporting

    SQL03 Disaster Recovery Server in another data center (async copy)

    -For this setup, we would set SQL03 to not have a vote at all. The reason is that if the SQL03 DR server cannot reach the primary data center (SQL01 and SQL02), we would not want this to count as a failure, since it might just be the network link.

    -After taking SQL03's vote away, we would set both SQL01 and SQL02 to have a vote, but also use a fileshare on another "production" server as the tiebreaker in case SQL02 were to reboot, similar to Scenario 1 above.

    -The only difference is that if we were to lose the primary data center (SQL01 and SQL02), then we would want to manually fail over to the SQL03 DR server. Once the primary data center came back up, we would have to be careful about split-brain activity, where SQL01 and SQL02 think they are a cluster once they come back up, but SQL03 also thinks it is a cluster.

    Does this sound right? Any issues with this setup?
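    For reference, here is roughly what I think those two changes look like in PowerShell. This is only a sketch; the cluster name "SQLCLUS" and the witness share "\\APP01\ClusterWitness" are made-up placeholders, and revoking a node's vote needs the cluster node-weight hotfix on Windows Server 2008 R2.

    Import-Module FailoverClusters

    # Scenario 1: two nodes plus a file share witness (Node and File Share Majority)
    Set-ClusterQuorum -Cluster "SQLCLUS" -NodeAndFileShareMajority "\\APP01\ClusterWitness"

    # Scenario 2: keep the witness, but revoke the DR node's vote so a broken WAN link
    # cannot count against quorum at the primary site
    (Get-ClusterNode -Cluster "SQLCLUS" -Name "SQL03").NodeWeight = 0

    # Verify the resulting vote assignments
    Get-ClusterNode -Cluster "SQLCLUS" | Format-Table Name, State, NodeWeight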

  • According to the Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery, the way to determine the voting is as follows:

    Recommended Adjustments to Quorum Voting

    To determine the recommended quorum voting configuration for the cluster, apply these guidelines, in sequential order:

    1. No vote by default. Assume that each node should not vote without explicit justification.

    2. Include all primary nodes. Each node that hosts an AlwaysOn Availability Group primary replica or is the preferred owner of the AlwaysOn Failover Cluster Instance should have a vote.

    3. Include possible automatic failover owners. Each node that could host a primary replica or FCI, as the result of an automatic failover, should have a vote.

    4. Exclude secondary site nodes. In general, do not give votes to nodes that reside at a secondary disaster recovery site. You do not want nodes in the secondary site to contribute to a decision to take the cluster offline when there is nothing wrong with the primary site.

    5. Odd number of votes. If necessary, add a witness file share, a witness node (with or without a SQL Server instance), or a witness disk to the cluster and adjust the quorum mode to prevent possible ties in the quorum vote.

    6. Reassess vote assignments post-failover. You do not want to fail over into a cluster configuration that does not support a healthy quorum.

    For more information on adjusting node votes, see Configure Cluster Quorum NodeWeight Settings (http://msdn.microsoft.com/en-us/library/hh270281(SQL.110).aspx).

    You cannot adjust the vote of a file share witness. Instead, you must select a different quorum mode to include or exclude its vote.

    Note: SQL Server exposes several system dynamic management views (DMVs) that can help you administer settings related to WSFC cluster configuration and node quorum voting. For more information, see Monitor Availability Groups (http://
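    Since the guide points you at the DMVs, here is a quick way to see the quorum model and each member's vote from SQL Server. This is only a sketch; it assumes the SQLPS module that ships with SQL Server 2012 and a hypothetical instance name.

    Import-Module SQLPS -DisableNameChecking

    # Quorum model and state as SQL Server sees it
    Invoke-Sqlcmd -ServerInstance "SQL01" -Query "SELECT cluster_name, quorum_type_desc, quorum_state_desc FROM sys.dm_hadr_cluster;"

    # Each cluster member (node, file share witness, or disk witness) and its vote count
    Invoke-Sqlcmd -ServerInstance "SQL01" -Query "SELECT member_name, member_type_desc, member_state_desc, number_of_quorum_votes FROM sys.dm_hadr_cluster_members;"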

  • JJ-469859 (7/13/2012)


    According to my testing, people are in for a rude awakening when using AlwaysOn Availability Groups with the default settings in certain scenarios. For instance, in my testing of a primary SQL instance and a secondary SQL instance for reporting purposes, the default disk majority results in issues. If you shut down one of the servers, say you just reboot the secondary SQL server, you lose quorum and the primary becomes inaccessible! To remedy this, I believe the following is necessary, but I want someone to confirm. Below are the two most common scenarios I see, so let's walk through them.

    Scenario 1:

    SQL01 Primary

    SQL02 Reporting

    -For this setup, we could change the quorum to use node and fileshare majority. I would put the fileshare on a separate server that should be considered a "production" server. This way, when SQL02 is rebooted, we still have quorum since SQL01 and the fileshare would both be up. If SQL01 is rebooted, the availability group would fail over to SQL02 without issue since the fileshare and SQL02 can both vote.

    If both nodes are on the same site you would typically use a shared disk resource, although a fileshare witness would also suffice. Are they both on the same site?

    JJ-469859 (7/13/2012)


    Scenario 2:

    SQL01 Primary

    SQL02 Reporting

    SQL03 Disaster Recovery Server in another data center (async copy)

    -For this setup, we would set SQL03 to not have a vote at all. The reason is that if the SQL03 DR server cannot reach the primary data center (SQL01 and SQL02), we would not want this to count as a failure, since it might just be the network link.

    -After taking SQL03's vote away, we would set both SQL01 and SQL02 to have a vote, but also use a fileshare on another "production" server as the tiebreaker in case SQL02 were to reboot, similar to Scenario 1 above.

    -The only difference is that if we were to lose the primary data center (SQL01 and SQL02), then we would want to manually fail over to the SQL03 DR server. Once the primary data center came back up, we would have to be careful about split-brain activity, where SQL01 and SQL02 think they are a cluster once they come back up, but SQL03 also thinks it is a cluster.

    Does this sound right? Any issues with this setup?

    I think you may be confusing split brain. Split brain happens when all links to the cluster nodes are simultaneously down leaving subsets of cluster nodes in a position where each subset believes they are the only remaining functional set with quorum.

    In Scenario 2 you do not need to remove the DR vote and substitute a file share witness. If the link between the two sites fails, DR has 1 of the 3 votes and therefore no quorum: game over there. The primary site has 2 of the 3 votes, which is sufficient quorum (more than half the votes), and the services should continue.

    Multi-site clusters demand a huge infrastructure; they cost time and money, but they provide a very high level of redundancy. You get what you pay for here.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry Whittle (7/13/2012)


    If both nodes are on the same site you would typically use a shared disk resource, although a fileshare witness would also suffice. Are they both on the same site?

    Yes, Scenario 1 has both in the same site. Any issues with using a fileshare vs. a shared disk to cast the vote? More importantly, am I correct in my assumption that we have to use either a fileshare or a shared disk to cast the tiebreaker vote? The default setup does not work how we want.

    Perry Whittle (7/13/2012)


    I think you may be confusing split brain. Split brain happens when all links to the cluster nodes are simultaneously down leaving subsets of cluster nodes in a position where each subset believes they are the only remaining functional set with quorum.

    In scenario 2 you do not need to remove the DR vote and substitute with a file share witness. If links separate between the 2 sites DR has 1 node of 3 active and so, no quorum, game over here. Primary site has 2 out of 3 votes this is sufficient quorum (over half the votes) and the services should continue.

    Multi site clusters demand a huge infrastructure, they cost time and money but provide a very high level of redundancy. You get what you pay for here.

    I realize that we technically could do without a fileshare or disk vote in scenario 2, but it might be a slightly better idea to revoke the DR vote so that the DR server does not contribute to any vote in case that link is down. Obviously this would only be an issue if the DR link/server were down and you went to reboot SQL02 for instance.

    For the split brain activity, isn't that what would be happening? For instance, say we are running fine on the primary data center SQL01/SQL02. All of a sudden our primary data center connections blow up and we are forced to manually bring up SQL03 in the DR facility. Now SQL03 is running everything. Then, someone fixes the connections in the primary facility and brings back up SQL01 and SQL02. Now SQL01 and SQL02 think they have a quorum. But so does SQL03 separately from SQL01/SQL02. Then at some later point, someone finally fixes the network link between the primary and DR facility. At this point I would assume we would just reinitialize the SQL01 and SQL02 instances from a current backup of SQL03 since that DR server would have updated data.

  • JJ-469859 (7/14/2012)


    Am I correct in my assumption that we have to use either a fileshare or a shared disk to cast the tiebreaker vote? The default setup does not work how we want.

    It depends on the number of nodes; in the 2-node scenario, yes, you need an extra vote to create quorum.

    The default setup generally doesn't; you have to do some reconfiguration (cluster setup usually informs you of this 😉).

    JJ-469859 (7/14/2012)


    I realize that we technically could do without a fileshare or disk vote in scenario 2, but it might be a slightly better idea to revoke the DR vote so that the DR server does not contribute to any vote in case that link is down. Obviously this would only be an issue if the DR link/server were down and you went to reboot SQL02 for instance.

    Why would you want to start rebooting cluster nodes when your cluster is already compromised? Work on getting the site back should be the priority. If your DR site goes down you're severely exposed!

    JJ-469859 (7/14/2012)


    For the split brain activity, isn't that what would be happening? For instance, say we are running fine on the primary data center SQL01/SQL02. All of a sudden our primary data center connections blow up and we are forced to manually bring up SQL03 in the DR facility. Now SQL03 is running everything.

    What connections are you referring to? You have more than one network connection to your primary site, right?

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry Whittle (7/14/2012)


    Why would you want to start rebooting cluster nodes when your cluster is already compromised? Work on getting the site back should be the priority. If your DR site goes down you're severely exposed!

    Let me put this another way. Say we use the default (no fileshare/disk vote) setup in this three-node cluster with one node in the DR data center. If we have scheduled maintenance on both SQL02 and SQL03, then SQL01 would be inaccessible. Although, we would equally need to think about the fileshare/disk vote being inaccessible if it were under scheduled maintenance in the fileshare/disk-vote setup as well. I see this playing a bigger role if you have multiple destinations in the DR data center, since you don't want the DR servers casting votes when you only care about the primary data center casting votes.

    Perry Whittle (7/14/2012)


    What connections are you referring to? You have more than one network connection to your primary site, right?

    Yes, we have multiple connections to the primary site. I am thinking in terms of the worst-case scenario. If we are failing over to DR, we are in trouble in the primary site. If we do force a failover to DR, then we now have the potential for DR thinking it is a cluster and SQL01/SQL02 also thinking they are a cluster. I need to read up on what happens in this situation. I would guess that you would make sure no users are connected to the application in the primary data center and synchronize all three SQL Servers before opening the primary data center back up to applications.
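    For what it's worth, that forced failover to DR is a deliberate, manual override of quorum rather than something the cluster decides on its own. A rough sketch of the sequence, assuming the primary site is completely unreachable and "MyAG" is a hypothetical availability group name:

    # Bring the cluster up on the lone DR node by forcing quorum (manual decision only)
    Start-ClusterNode -Name "SQL03" -FixQuorum

    # Then, on SQL03, force the availability group online, accepting possible data loss:
    #   ALTER AVAILABILITY GROUP [MyAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;

    # The forced node's cluster configuration is treated as authoritative, so when SQL01 and
    # SQL02 can reach SQL03 again they rejoin it rather than forming their own cluster; until
    # that link is back, keep them shut down, then resume or re-seed their replicas from SQL03.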

  • JJ-469859 (7/16/2012)


    Let me put this another way. Say we use the default (no fileshare/disk vote) setup in this three-node cluster with one node in the DR data center. If we have scheduled maintenance on both SQL02 and SQL03, then SQL01 would be inaccessible. Although, we would equally need to think about the fileshare/disk vote being inaccessible if it were under scheduled maintenance in the fileshare/disk-vote setup as well. I see this playing a bigger role if you have multiple destinations in the DR data center, since you don't want the DR servers casting votes when you only care about the primary data center casting votes.

    That's just it: with a 3-node cluster you should be using the default, which is node majority. You wouldn't ever take SQL02 and SQL03 down at the same time for maintenance, as this would leave you severely exposed with the only failover partner and the DR server unavailable!

    To be clear, possible Quorum types are;

    Windows 2008 - Understanding Quorum Configurations in a Failover Cluster


    Node Majority (recommended for clusters with an odd number of nodes)

    Can sustain failures of half the nodes (rounding up) minus one. For example, a seven node cluster can sustain three node failures.

    Node and Disk Majority (recommended for clusters with an even number of nodes)

    Can sustain failures of half the nodes (rounding up) if the disk witness remains online. For example, a six node cluster in which the disk witness is online could sustain three node failures.

    Can sustain failures of half the nodes (rounding up) minus one if the disk witness goes offline or fails. For example, a six node cluster with a failed disk witness could sustain two (3-1=2) node failures.

    Node and File Share Majority (for clusters with special configurations)

    Works in a similar way to Node and Disk Majority, but instead of a disk witness, this cluster uses a file share witness.

    Note that if you use Node and File Share Majority, at least one of the available cluster nodes must contain a current copy of the cluster configuration before you can start the cluster. Otherwise, you must force the starting of the cluster through a particular node. For more information, see "Additional considerations" in Start or Stop the Cluster Service on a Cluster Node.

    No Majority: Disk Only (not recommended)

    Can sustain failures of all nodes except one (if the disk is online). However, this configuration is not recommended because the disk might be a single point of failure.
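    If it helps, you can see which of these models a cluster is using, and move between them, with the failover clustering cmdlets. A minimal sketch; the cluster name, disk resource name, and share path are assumptions.

    # Show the current quorum model and witness resource (if any)
    Get-ClusterQuorum -Cluster "SQLCLUS"

    # Switch between the models described above
    Set-ClusterQuorum -Cluster "SQLCLUS" -NodeMajority
    Set-ClusterQuorum -Cluster "SQLCLUS" -NodeAndDiskMajority "Cluster Disk 1"
    Set-ClusterQuorum -Cluster "SQLCLUS" -NodeAndFileShareMajority "\\APP01\ClusterWitness"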

    JJ-469859 (7/16/2012)


    Yes, we have multiple connections to the primary site. I am thinking in terms of the worst-case scenario. If we are failing over to DR, we are in trouble in the primary site. If we do force a failover to DR, then we now have the potential for DR thinking it is a cluster and SQL01/SQL02 also thinking they are a cluster. I need to read up on what happens in this situation. I would guess that you would make sure no users are connected to the application in the primary data center and synchronize all three SQL Servers before opening the primary data center back up to applications.

    What are your failover scenarios? The most obvious are

    Scenario 1, power outage at primary.

    SQL01 and SQL02 are dead so there is no issue

    Scenario 2, catastrophic failure at primary site from terrorist threat (bomb, etc) or gas explosion, etc

    SQL01 and SQL02 are dead so there is no issue

    Scenario 3, total loss of network on primary site. You have failed over to DR, so you would shutdown all nodes on primary site until network links are back.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry Whittle (7/16/2012)


    That's just it: with a 3-node cluster you should be using the default, which is node majority. You wouldn't ever take SQL02 and SQL03 down at the same time for maintenance, as this would leave you severely exposed with the only failover partner and the DR server unavailable!

    I think it is just a trade-off. Microsoft's recommendation is not to have DR cast votes that could lead to thinking the primary data center is offline when it is not. Both ways should work. Here is a discussion where they revoke the vote on the DR server, though: http://blogs.msdn.com/b/saponsqlserver/archive/2012/02/20/sql-server-2012-alwayson-part-3-sap-configuration-with-two-secondary-replicas.aspx

    What are your failover scenarios? The most obvious are

    Scenario 1, power outage at primary.

    SQL01 and SQL02 are dead so there is no issue

    Scenario 2, catastrophic failure at primary site from terrorist threat (bomb, etc) or gas explosion, etc

    SQL01 and SQL02 are dead so there is no issue

    Scenario 3, total loss of network on primary site. You have failed over to DR, so you would shutdown all nodes on primary site until network links are back.

    Yes, at this point it is just a matter of testing to make sure all scenarios work how I think they should.

    Below is what I think should be the quorum method used in each scenario. Let me know if these are correct. A fileshare could obviously be substituted with a shared disk vote if desired.

    All primary data center

    2 servers primary data center - use fileshare

    3 servers primary data center - use default node majority

    4 servers primary data center - use fileshare

    5 servers primary data center - use default node majority

    ....

    6, 8, 10, etc. (even number) - use fileshare

    7, 9, 11, etc. (odd number) - use default node majority

    Both primary and DR

    1 server primary, 1 server DR - use fileshare and revoke DR vote? This one I am unsure about.

    2 server primary, 1 server DR - best recommendation is to revoke the DR vote and add a fileshare vote in the primary data center

    2 server primary, 2+ server DR - best recommendation is to revoke all DR votes and add a fileshare vote in the primary data center

    3 server primary, any# servers DR - revoke DR vote(s) and leave default node majority

    4 server primary, any# servers DR - revoke DR vote(s) and use fileshare

    ................

    5, 7, 9, etc. (odd number of primary) - use default node majority, revoke DR votes

    6, 8, 10, etc. (even number of primary) - use fileshare, revoke DR votes
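    A rough way to sanity-check that rule of thumb against a live cluster is to count the voting nodes and flag an even total. A sketch only, run from any node in the cluster:

    Import-Module FailoverClusters

    # Count the nodes that currently carry a vote
    $votes = @(Get-ClusterNode | Where-Object { $_.NodeWeight -eq 1 }).Count

    if ($votes % 2 -eq 0) {
        Write-Output "$votes voting nodes: add a fileshare (or disk) witness as the tiebreaker."
    } else {
        Write-Output "$votes voting nodes: default node majority is fine without a witness."
    }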

  • JJ-469859


    Both primary and DR

    1 server primary, 1 server DR - use fileshare and revoke DR vote? This one I am unsure about.

    If you're going to revoke the DR node's vote in this scenario, that leaves just 1 voting node in the cluster. I can't see that there is a need for a disk or fileshare vote, since the failure of the one node will leave you with no active cluster anyway. Something I didn't make clear earlier is that with multi-site clusters the fileshare itself should ideally be on a separate site to the rest of the nodes.

    I think you're focusing too much on revoking votes on offsite nodes. The hotfix you mention can be applied to stop actions on the DR site, for instance, from interfering with the live site. Node weights and quorum configurations are designed to prevent the cluster-split scenarios, and this all depends on how many nodes and sites you have.

    Here's the link for the hotfix; it basically allows you to override the voting of your physical nodes to achieve the quorum type you want for your multi-site scenario. In certain multi-site configurations you might end up with the same number of nodes on each site, and in this scenario you could end up with split-brain syndrome. By revoking votes from nodes on each site and leaving one site "vote heavy", you ensure that split brain doesn't occur.

    Take this scenario

    Site 1 = Tokyo

    ClusNode1

    ClusNode2

    Site 2 = Sydney

    ClusNode3

    ClusNode4

    Site 3 = London

    ClusNode5

    ClusNode6

    Site 4 = Paris

    ClusNode7

    ClusNode8

    You place a fileshare on Site 1 and this gives you quorum: the whole of Europe disappears and you lose Site 3 and Site 4, but you still have quorum. What happens if you lose Site 1, where the fileshare vote is, and Site 3?

    Same principle, but a different deployment

    Site 1 = Tokyo

    ClusNode1

    ClusNode2

    Site 2 = Sydney

    ClusNode3

    ClusNode4

    Site 3 = London

    ClusNode5

    ClusNode6

    Site 4 = Paris

    ClusNode7

    ClusNode8

    Site 5 = Head office Rome

    Fileshare witness

    What happens if you lose Site 3 and Site 4?

    What if there were no Site 5 and you lost Site 3 and Site 4?

    Lastly, the same principle but a different deployment

    Site 1 = Tokyo

    ClusNode1

    ClusNode2

    Site 2 = Sydney

    ClusNode3

    ClusNode4 [vote revoked]

    Site 3 = London

    ClusNode5

    ClusNode6 [vote revoked]

    Site 4 = Paris

    ClusNode7

    ClusNode8 [vote revoked]

    What happens if you lose Site 3 and Site 4 now?

    There are many ways to set up multi-site clusters, and it all depends on the details I mentioned above:

    How many nodes?

    How many sites?
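    As a concrete example of that last deployment, revoking the extra vote at each site outside Tokyo is one line per node. A sketch only, using the hypothetical node names above and assuming the node-weight hotfix where required:

    # Leave Tokyo "vote heavy": Sydney, London and Paris each keep a single vote
    foreach ($node in "ClusNode4", "ClusNode6", "ClusNode8") {
        (Get-ClusterNode -Name $node).NodeWeight = 0
    }

    # Confirm the weights
    Get-ClusterNode | Format-Table Name, NodeWeight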

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Yes, I agree that it could come in handy to add a fileshare vote in a third data center. I see where you are going with this, and it all makes sense as to what to use. For the majority of the time, a simple one-data-center (primary) or two-data-center (primary and DR) question generally breaks down to:

    --Even number of total nodes should be fileshare

    --Odd total number of nodes can be default node majority.

    We would just make sure the primary has an additional vote, either by node or by fileshare. Then I could fail over to DR manually, but only once I have determined that I want to go that route. The only change to this would be the possibility of revoking the DR votes and going from there if desired. Throwing in more data centers also requires more forethought.
