dynamic quorum

  • We have a two-node Windows cluster with a file share witness. We had to reboot the secondary replica and, for some reason, we lost the connection to the file share at the same time. Suddenly the cluster service shut down and the AG went offline. It recovered immediately. With Windows Server 2012 dynamic quorum set up, how did the cluster get shut down with one node up and running? How was this possible? I am not able to follow. We tested on test servers with just two nodes and haven't experienced any issues with the reboot of one node. Why does this issue appear after we introduced a file share witness? Can someone please explain this behavior?

  • muthyala_51 (8/30/2015)


    We have a two-node Windows cluster with a file share witness. We had to reboot the secondary replica and, for some reason, we lost the connection to the file share at the same time. Suddenly the cluster service shut down and the AG went offline. It recovered immediately. With Windows Server 2012 dynamic quorum set up, how did the cluster get shut down with one node up and running? How was this possible? I am not able to follow. We tested on test servers with just two nodes and haven't experienced any issues with the reboot of one node. Why does this issue appear after we introduced a file share witness? Can someone please explain this behavior?

    What operating system have you deployed to your cluster nodes?

    Is this a multi site cluster?

    Where is your fileshare witness located?

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • What operating system have you deployed to your cluster nodes?

    Is this a multi site cluster?

    Where is your fileshare witness located?

    Windows Server 2012 R2 Standard

    Yes, it's a multi-site cluster with two nodes in one data center and another in a different data center. We disabled the vote for the node in the other data center.

    The file share is located on a domain controller.

  • muthyala_51 (8/31/2015)


    Windows Server 2012 R2 Standard

    Yes, it's a multi-site cluster with two nodes in one data center and another in a different data center. We disabled the vote for the node in the other data center.

    The file share is located on a domain controller.

    Your original post says you have a 2 node cluster but you have a 3 node cluster. Please ensure to post accurate details in future.

    Further questions are

    Is the prod cluster virtual or physical?

    On which site is the fileshare witness located?

    Does the test cluster span cross site or same site?

    Are they virtual?

    You should understand that dynamic quorum is not an end-all solution; you must still configure a suitable quorum configuration for your cluster topology.

    During planned shutdowns, dynamic quorum will recalculate and keep the cluster online. For random failures, dynamic quorum will generally be unable to react quickly enough, and this is why you should still configure quorum appropriately.
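
    To make the vote arithmetic concrete, here is a minimal Python sketch of a plain majority check. It is a simplified model and not the actual WSFC algorithm; the names `has_quorum`, `node1`, `node2` and `fsw` are hypothetical. It assumes three configured votes (two primary-site nodes plus the file share witness) and a DR node whose vote has been disabled, as described in this thread.

```python
def has_quorum(total_votes: int, reachable_votes: int) -> bool:
    """Simple majority rule: strictly more than half of the configured votes."""
    return reachable_votes > total_votes // 2


# Assumed voters: two nodes on the primary site plus the file share witness.
# The DR node is still a cluster member but carries no vote in this model.
voters = {"node1": 1, "node2": 1, "fsw": 1}
total = sum(voters.values())

# One node rebooting while the witness stays reachable: 2 of 3 votes remain.
only_node_down = total - voters["node2"]

# Node and witness lost at the same moment: 1 of 3 votes remains.
node_and_witness_down = total - voters["node2"] - voters["fsw"]

print(has_quorum(total, only_node_down))         # True
print(has_quorum(total, node_and_witness_down))  # False
```

    Under these assumptions, losing a voting node and the witness together leaves one vote out of three, which would be consistent with the cluster service stopping on the surviving node until a second voter is reachable again.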

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Your original post says you have a 2 node cluster but you have a 3 node cluster. Please ensure to post accurate details in future.

    Further questions are

    Is the prod cluster virtual or physical?

    On which site is the fileshare witness located?

    Does the test cluster span cross site or same site?

    Are they virtual?

    You should understand that dynamic quorum is not an end-all solution; you must still configure a suitable quorum configuration for your cluster topology.

    During planned shutdowns, dynamic quorum will recalculate and keep the cluster online. For random failures, dynamic quorum will generally be unable to react quickly enough, and this is why you should still configure quorum appropriately.

    On test clusters we haven't configured a file share witness.

    The reason I said two is that the other data center node doesn't participate in the voting system. We manually disabled its vote, following the MSFT recommendation. The cluster is physical. The file share is on the same site as the two nodes which contribute to quorum. Test servers are virtual and they are on the same site.

    Perry,

    Also, you discussed the same topic before, but I was still not able to follow your logic behind enabling the vote for a node that is in a different data center from the primary replica.

    http://www.sqlservercentral.com/Forums/Topic1329593-2799-1.aspx

    Thanks,

  • muthyala_51 (9/2/2015)


    The reason I said two is that the other data center node doesn't participate in the voting system. We manually disabled its vote, following the MSFT recommendation.

    But it still participates in the cluster!!

    To which msft recommendation are you referring?

    muthyala_51 (9/2/2015)


    The cluster is physical. The file share is on the same site as the two nodes which contribute to quorum.

    Why on earth take a vote from a perfectly healthy cluster node and create a file share witness on the same site as the existing voting nodes?

    This shows a distinct lack of understanding of quorum configurations, and you haven't acknowledged my previous post. Do you understand that dynamic quorum has a limitation?

    muthyala_51 (9/2/2015)


    Test servers are virtual and they are on the same site.

    I wouldn't rely on virtual and physical servers to act in a similar fashion; there are too many other factors involved.

    muthyala_51 (9/2/2015)


    Perry,

    Also, you discussed the same topic before, but I was still not able to follow your logic behind enabling the vote for a node that is in a different data center from the primary replica.

    http://www.sqlservercentral.com/Forums/Topic1329593-2799-1.aspx

    Thanks,

    You've misunderstood the article and quorum configurations, and may need to re-read it.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry,

    I was referring to this document, on disabling the vote system--

    https://msdn.microsoft.com/en-us/library/hh270280.aspx

    Exclude secondary site nodes. In general, do not give votes to WSFC nodes that reside at a secondary disaster recovery site. You do not want nodes in the secondary site to contribute to a decision to take the cluster offline when there is nothing wrong with the primary site.

  • muthyala_51 (9/2/2015)


    Perry,

    I was referring to this document, on disabling the vote system--

    https://msdn.microsoft.com/en-us/library/hh270280.aspx

    Exclude secondary site nodes. In general, do not give votes to WSFC nodes that reside at a secondary disaster recovery site. You do not want nodes in the secondary site to contribute to a decision to take the cluster offline when there is nothing wrong with the primary site.

    Yep, you need to read it in detail and understand quorum; this does not apply to all scenarios. In my opinion it's a badly written KB and a little misleading.

    A question for you

    You have nodea and nodeb on site 1 and node c on site 2, you have configured quorum for Majority Node Set, each node has a vote

    The network connection between the sites is severed, what happens to your cluster roles and the cluster node states on

    • site 1
    • site 2

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • A question for you

    You have nodea and nodeb on site 1 and node c on site 2, you have configured quorum for Majority Node Set, each node has a vote

    The network connection between the sites is severed, what happens to your cluster roles and the cluster node states on

    • site 1
    • site 2

    The cluster will not be impacted even with the severed connection, as we have lost just one vote. Node b will be down in the cluster, which should not have any impact on the cluster.

  • muthyala_51 (9/2/2015)


    Perry,

    I was referring to this document, on disabling the vote system--

    https://msdn.microsoft.com/en-us/library/hh270280.aspx

    I assume you're referring to this section of the above link

    microsoft WSFC quorum


    In order to simplify your quorum configuration and increase up-time, you may want to adjust each node's NodeWeight setting so that the node's vote is not counted towards the quorum.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • muthyala_51 (9/2/2015)


    A question for you

    You have nodea and nodeb on site 1 and node c on site 2, you have configured quorum for Majority Node Set, each node has a vote

    The network connection between the sites is severed, what happens to your cluster roles and the cluster node states on

    • site 1
    • site 2

    The cluster will not be impacted even with the severed connection, as we have lost just one vote. Node b will be down in the cluster, which should not have any impact on the cluster.

    Exactly, so in your configuration why have you removed a vote from a perfectly suited node and then gone to all the hassle of deploying a file share witness on the primary site? Your default configuration for your number and placement of nodes was fine.

    The KB you linked is there to advise in situations where a geographically dispersed cluster ends up with a vote-heavy DR site. With such a configuration, primary site outages would be a real issue. To combat this design, good or bad, you may remove votes from nodes on a DR site to avoid having a vote-heavy DR site.
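
    As a rough illustration of why vote placement matters when the link between sites is severed, the following Python sketch checks which side of a partition holds a strict majority. It is a toy model, not the WSFC implementation; the function name, the site names and the 2+3 "vote-heavy DR" layout are hypothetical examples.

```python
def partition_has_quorum(votes_by_site: dict, site: str) -> bool:
    """A site keeps quorum only if it holds a strict majority of all votes."""
    total = sum(votes_by_site.values())
    return votes_by_site[site] > total // 2


# Three nodes, one vote each: two on site 1, one on site 2.
default_layout = {"site1": 2, "site2": 1}

# Hypothetical vote-heavy DR layout: two voters on primary, three on DR.
vote_heavy_dr = {"primary": 2, "dr": 3}

# The same layout after the DR nodes' votes have been removed.
dr_votes_removed = {"primary": 2, "dr": 0}

layouts = [
    ("default", default_layout),
    ("vote-heavy DR", vote_heavy_dr),
    ("DR votes removed", dr_votes_removed),
]
for name, layout in layouts:
    for site in layout:
        print(name, site, partition_has_quorum(layout, site))
```

    In this model the default 2+1 layout keeps site 1 online when the link drops, while the vote-heavy layout would take the primary site offline unless the DR votes are removed. That is the specific situation the vote-removal advice targets.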

    The guide is not one-size-fits-all; it applies to specific circumstances. You're not the first to misconstrue that article and you won't be the last.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Exactly, so in your configuration why have you removed a vote from a perfectly suited node and then gone to all the hassle of deploying a file share witness on the primary site? Your default configuration for your number and placement of nodes was fine.

    The KB you linked is there to advise in situations where a geographically dispersed cluster ends up with a vote-heavy DR site. With such a configuration, primary site outages would be a real issue. To combat this design, good or bad, you may remove votes from nodes on a DR site to avoid having a vote-heavy DR site.

    The guide is not one-size-fits-all; it applies to specific circumstances. You're not the first to misconstrue that article and you won't be the last.

    Thanks Perry. Another reason why we removed the vote from the quorum is that we often get alerts that the node is not healthy or not available, because the monitor server is not able to communicate with it. If this is the case, suppose we rebooted one of the nodes in the same data center and had the same network issue at that time: we would be in the same loop of the cluster going down and coming back as soon as it communicates with the node in the different data center. Don't you think we have to be ready for this kind of situation, or are you expecting the quorum to stay up on the last node standing if we change it to node majority?

  • muthyala_51 (9/3/2015)


    get alerts that the node is not healthy or not available, because the monitor server is not able to communicate with it.

    Which monitor server?

    muthyala_51 (9/3/2015)


    If this is the case, suppose we rebooted one of the nodes in the same data center and had the same network issue at that time: we would be in the same loop of the cluster going down and coming back as soon as it communicates with the node in the different data center. Don't you think we have to be ready for this kind of situation, or are you expecting the quorum to stay up on the last node standing if we change it to node majority?

    You're still missing what I've been saying

    Firstly, if you have one node in DR with a vote and two in primary, and you reboot a node on the primary site, then assuming no other issues you'll still have quorum. If the network between your sites really is that unstable, shouldn't you be resolving that issue? It won't just affect the WSFC but possibly client connectivity too.

    Also, note:

    For a reboot, dynamic quorum will be able to quickly recalculate and adjust, so quorum will be maintained. Node failures are more of an issue and expose you to cluster outages, as dynamic quorum doesn't have a chance to react.
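
    The difference between a planned reboot and a sudden multi-voter failure can be sketched in Python as below. This is a deliberately simplified toy model of the idea, not the real dynamic quorum algorithm; the function names are hypothetical, and it assumes each clean shutdown removes that voter's vote before the next one leaves.

```python
def majority(total_votes: int) -> int:
    """Smallest vote count that is strictly more than half."""
    return total_votes // 2 + 1


def survives_simultaneous(votes: int, lost: int) -> bool:
    """Voters vanish at once: survivors are judged against the old vote total."""
    return votes - lost >= majority(votes)


def survives_planned(votes: int, departures: int) -> bool:
    """Voters leave one at a time; each clean departure shrinks the vote total,
    so the survivors always hold every remaining vote (toy model)."""
    for _ in range(departures):
        if votes - 1 < 1:
            return False  # nobody would be left to hold quorum
        votes -= 1  # the departed voter's vote is removed from the total
    return True


print(survives_simultaneous(3, 1))  # True
print(survives_simultaneous(3, 2))  # False
print(survives_planned(3, 2))       # True
```

    With three votes, one reboot still leaves two of three, two voters lost at the same instant leaves one of three, and two clean departures in sequence can leave a single voter holding quorum in this model.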

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉
