I have a total of 11 years of IT experience in application development, database development and database administration. I have worked with different versions of SQL Server, from 7.0 to 2008. I started my career as a VB, VC++ and database developer in the banking sector, implementing core banking solutions. Currently I am working as a Database Administrator with wide knowledge of performance tuning, high availability solutions, troubleshooting and server monitoring. This blog is my humble attempt to share my knowledge and what I have learned from my day-to-day work.
In my earlier post, I explained Windows clustering and how SQL Server works in a cluster environment. In this post, let us try to understand the quorum settings of a Windows cluster. When I say quorum, do not interpret it as the quorum disk; quorum has a literal meaning in a cluster environment. In this post I will use the term witness disk to refer to the quorum disk. Let us see what the possible quorum settings are and how each of them affects the Windows cluster.
According to Wikipedia, a quorum is the minimum number of members of a deliberative assembly necessary to conduct the business of that group. In short, quorum is the minimum number of votes required for a majority. As I explained in my earlier post, the nodes participating in a Windows cluster are connected through a private network and communicate over User Datagram Protocol (UDP) port 3343. The quorum configuration of a failover cluster determines the number of failures (node failures) that the cluster can sustain while still remaining online. If further failures occur beyond this threshold, the cluster stops running. Quorum is designed to handle the split-brain scenario. When nodes are unable to communicate with each other, each node may assume that the other nodes are down and that the resource groups owned by those nodes have to be brought online locally. When the same resource is brought online on multiple nodes at the same time, data corruption can occur. This scenario is called split brain.
Let us assume that we have a four-node cluster with one SQL Server instance running on each node. Node1 and Node2 lose communication with Node3 and Node4, while Node1 and Node2 can still communicate with each other, and so can Node3 and Node4. Neither group knows what happened to the other two nodes: are they offline, or is it just a communication failure? In this scenario, Node1 and Node2 will try to bring online the SQL Server instances (resources) owned by Node3 and Node4. In the same way, Node3 and Node4 will try to bring online the SQL Server instances owned by Node1 and Node2, which can lead to disk corruption and many other issues. The Windows cluster quorum setting is designed to prevent this kind of scenario. Through quorum, the cluster forces the cluster service to stop on any subset of nodes that does not hold quorum, ensuring that there is only one true owner of a given resource group.
Quorum (majority) is based on a voting algorithm in which more than half of the voters must be online and able to communicate with each other. The cluster knows how many nodes were used to form it, and therefore how many votes constitute a quorum. If the number of votes in a group of nodes drops below the majority, the cluster service stops on the nodes of that group. The cluster requires more than half of the total votes to achieve quorum; this avoids a tie in the number of votes. In an 8-node cluster, 5 voters must be online and able to communicate with each other to have quorum. Because of this logic, it is recommended to always have an odd number of total voters in the cluster, and the quorum setting defines who those voters are. This does not necessarily mean that an odd number of nodes is needed to form the cluster, since both a witness disk (quorum disk) and a file share can contribute a vote, depending on the quorum settings.
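The majority rule above is simple arithmetic. The following Python sketch illustrates it; the function names are my own and are not part of any Windows or cluster API. It computes how many votes a quorum needs and shows why an odd total of voters avoids a tie.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest vote count that is strictly more than half of total_votes."""
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True if the voters that can still talk to each other form a majority."""
    return votes_present >= votes_needed(total_votes)


# Eight nodes, one vote each: quorum needs 5 votes.
print(votes_needed(8))       # 5
# A 4/4 split: neither half reaches 5, so both halves stop.
print(has_quorum(4, 8))      # False

# Eight nodes plus one witness (disk or file share) = 9 votes, quorum is still 5.
# In a 4/4 node split, the half that also holds the witness has 5 votes.
print(has_quorum(4 + 1, 9))  # True
print(has_quorum(4, 9))      # False
```

The extra witness vote does not raise the number of votes needed (5 in both cases), but it breaks the 4/4 tie in favour of one side.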
Windows Server 2008 failover clustering supports four quorum models:
1. Node Majority
2. Node and Disk Majority
3. Node and File Share Majority
4. No Majority (Disk Only)
Node Majority: This option is recommended for clusters with an odd number of nodes. This configuration can sustain the loss of half of the cluster nodes, rounded down. For example, a five-node cluster can handle the failure of two nodes. Suppose three of the nodes (N1, N2, N3) can communicate with each other, but the other two (N4 and N5) cannot communicate with them. The group of three nodes has quorum (majority), so the cluster remains active and the cluster service is stopped on the other two nodes (N4 and N5). The resource groups (SQL Server instances) hosted on those two nodes go offline and come online on one of the three remaining nodes, based on the possible owner settings.
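The five-node example can be sketched in Python as follows. This is only an illustration of the voting rule under Node Majority (one vote per node, no witness); the function names and the idea of passing partitions as sets of node names are assumptions made for the example, not how the cluster service is implemented.

```python
def tolerable_failures(total_votes: int) -> int:
    """How many voters can be lost while a majority still remains."""
    majority = total_votes // 2 + 1
    return total_votes - majority


def surviving_partition(partitions: list[set[str]], total_votes: int):
    """Return the partition that keeps quorum, or None if no partition has a majority."""
    majority = total_votes // 2 + 1
    for part in partitions:
        if len(part) >= majority:  # Node Majority: each node is one vote
            return part
    return None


print(tolerable_failures(5))  # 2

partitions = [{"N1", "N2", "N3"}, {"N4", "N5"}]
winner = surviving_partition(partitions, total_votes=5)
print(sorted(winner))  # ['N1', 'N2', 'N3']
```

The same formula gives `tolerable_failures(4) == 1`, exactly as for three nodes, so a fourth node adds no extra failure tolerance under Node Majority. That is why this model is recommended for an odd number of nodes.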
Node and Disk Majority: This option is recommended for clusters with an even number of nodes. In this configuration every node gets one vote and the witness disk (quorum disk) gets one vote, which makes the total number of votes odd. The witness disk is a small (approximately 1 GB) clustered disk. This disk is highly available and can fail over between nodes. It is considered part of the cluster core resource group. In a four-node cluster, if there is a partition between two subsets of two nodes, the subset that owns the witness disk has quorum and the cluster remains online. This means that the cluster can lose any two voters, whether they are two nodes or one node and the witness disk.
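To check the "any two voters" claim for the four-node case, here is a small Python sketch that treats the witness disk as a fifth voter. The voter names are hypothetical labels for this example, and the model only counts votes; it ignores how ownership of the disk is actually arbitrated between nodes.

```python
from itertools import combinations

# Four nodes plus the witness disk: five voters, one vote each.
VOTERS = ["N1", "N2", "N3", "N4", "WitnessDisk"]
MAJORITY = len(VOTERS) // 2 + 1  # 3


def partition_has_quorum(partition: set[str]) -> bool:
    """A subset keeps the cluster online if it holds a majority of the voters."""
    return len(partition) >= MAJORITY


# 2/2 node split: only the side that owns the witness disk keeps quorum.
print(partition_has_quorum({"N1", "N2", "WitnessDisk"}))  # True
print(partition_has_quorum({"N3", "N4"}))                 # False

# Losing any two voters (two nodes, or one node and the disk) still leaves quorum.
print(all(
    partition_has_quorum(set(VOTERS) - set(lost))
    for lost in combinations(VOTERS, 2)
))  # True
```

Losing any three voters leaves only two votes, so the cluster stops at that point.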
Node and File Share Majority: This configuration is similar to Node and Disk Majority, but in this case the witness disk is replaced with a file share, also known as the File Share Witness resource (FSW). This quorum configuration is usually used in multi-site clusters (nodes in different physical locations) or where there is no common storage. The File Share Witness is a file share on any server in the same Active Directory that all the cluster nodes have access to. One of the nodes in the cluster places a lock on the file share, which marks that node as the owner of the file share. When this node goes offline or loses connectivity, another node grabs the lock and takes ownership of the file share. On a standalone server the file share is not highly available; however, the file share can also be placed on a clustered file share on an independent cluster, making the FSW clustered and giving it the ability to fail over between nodes. It is important that this file share is not placed on a node of the same cluster, because losing that node would mean losing two votes. Unlike the witness disk, an FSW does not store a copy of the cluster configuration data; it only contains information about which version of the cluster configuration database is the most recent.
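The warning about hosting the share on a cluster node can be made concrete by counting votes. The Python sketch below assumes a four-node cluster plus an FSW (five votes in total). "FileServer1" is a hypothetical server outside the cluster, and the functions are illustrative only; they do not model the file share lock itself.

```python
from itertools import combinations

NODES = ["N1", "N2", "N3", "N4"]
TOTAL_VOTES = len(NODES) + 1       # four nodes plus the file share witness
MAJORITY = TOTAL_VOTES // 2 + 1    # 3


def votes_left(failed_nodes: set[str], fsw_host: str) -> int:
    """Votes still available after the given nodes fail.

    The FSW vote is lost as well if the server hosting the share is one of the failed nodes.
    """
    node_votes = len([n for n in NODES if n not in failed_nodes])
    fsw_vote = 0 if fsw_host in failed_nodes else 1
    return node_votes + fsw_vote


def survives_any_two_node_failures(fsw_host: str) -> bool:
    """True if every possible pair of node failures still leaves a majority."""
    return all(
        votes_left(set(pair), fsw_host) >= MAJORITY
        for pair in combinations(NODES, 2)
    )


# Share hosted outside the cluster (hypothetical "FileServer1").
print(survives_any_two_node_failures("FileServer1"))  # True
# Share hosted on cluster node N1: losing N1 and N2 leaves only 2 of 5 votes.
print(survives_any_two_node_failures("N1"))           # False
```

With the share on N1, the loss of that single node already costs two votes, so the cluster tolerates only one node failure in the worst case instead of two.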
No Majority (Disk Only): This configuration was available in Windows Server 2003 and has been retained for compatibility reasons; it is highly recommended not to use it. In this configuration only the witness disk has a vote, and there are no other voters in the cluster. That means that even if all nodes are online and able to communicate, the entire cluster goes offline when the witness disk fails or is corrupted. This makes the witness disk a single point of failure.
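For contrast with the majority models, here is a minimal Python sketch of the Disk Only rule. It assumes, as Microsoft describes this model, that the cluster can keep running with as few as one node as long as that node can reach the witness disk. The function and parameter names are hypothetical.

```python
def disk_only_online(nodes_with_disk_access: int, witness_disk_ok: bool) -> bool:
    """No Majority (Disk Only): the witness disk holds the only vote.

    The cluster stays up only while the disk is healthy and at least one node
    can reach it; the number of nodes that are up does not otherwise matter.
    """
    return witness_disk_ok and nodes_with_disk_access >= 1


# All four nodes up and talking, but the witness disk has failed: cluster offline.
print(disk_only_online(nodes_with_disk_access=4, witness_disk_ok=False))  # False
# Only one node left, but it can reach a healthy witness disk: cluster online.
print(disk_only_online(nodes_with_disk_access=1, witness_disk_ok=True))   # True
```

The first case is the single point of failure described above: healthy nodes do not help once the only voter is gone.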
I hope you now have a fair idea of the various quorum settings available in a Windows Server 2008 cluster.