WSFC Lost Quorum votes due to network problem

araujochristian, 2018-06-18 (first published: 2018-06-05)

Problem:

Early in the morning I received emails requesting for support as the applications were failing to connect to one of the production servers that are configured with Always On, I asked for an specific error and the customer sent me this:

---------------------------

Error while updating the backup server list: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)

---------------------------

After checking with one of the IT managers he confirmed that one of the database servers was going through some networking issues, however that didn't explain why the SQL server didn't automatically failover to the other node that was up and running, as both servers are part of the same WSFC as required to configure Synchronous availability groups with automatic failover configured.

So after getting into the available replica the troubleshoot began.

The first thing I noticed was:

After also checking the availability group dashboard which also confirmed the error state I went to the Failover cluster manager and noticed the following errors:

Also noticed that the node that was under the issues was showing that the Failover cluster state was stopped. Obviously if the server was not able to be reached through the network the cluster wouldn't be able to show it as healthy or running. But because lf that and the fact that there was no third witness the entire cluster state was failed and due to that the availably group and Listener were down as well, so what was needed? getting the cluster up and running even if one node is missing, but how would you do that? Well you need to start the cluster without the quorum validation, with that you will have the WSFC running and in return the availability group and listener will be up as well.

But wait, I'm a DBA that means that my experience with managing clusters at the windows level is minimal. If that is the case you have 2 options call IT to get the assistance you need or do it yourself, In my case was the second, if you choose the same, in the following URL are the required steps
https://docs.microsoft.com/en-us/sql/sql-server/failover-clusters/windows/force-a-wsfc-cluster-to-start-without-a-quorum?view=sql-server-2017

In my particular case the steps with the windows interface didn't work but the PowerShell ones did.
This got my cluster running, however my Availability group still had as primary the failed server, so I still had to failover the group to the one working, I achieved that by forcing the failover with this command:

ALTER AVAILABILITY GROUP <DAG NAME> FORCE_FAILOVER_ALLOW_DATA_LOSS

I know, it is scary when you see "Allow data loss" however, I had Synchronous commit set, so worst case scenario, I would be loosing the transactions that weren't committed at the time the primary server went offline so in the end that wasn't even written in the Primary database and it was safe to roll them back if needed.

That brought my databases online and the applications could resume working.

After fixing all the network issues and noticing that you are able to connect to your secondary replica execute this command to restart the synchronization, make sure that you update it for each of your databases.
ALTER DATABASE database_name SET HADR RESUME

And bam, you are all set, your cluster and Database availability group is back to the state it was before the failure.

As a lesson learned, add a witness to a file server, so that you don't loose your Quorum and prevent this issue to happen.

Thanks for reading!

Book Review: Big Red - Voyage of a Trident Submarine

by Andy Warren

SQLServerCentral.com

Blogs

I've grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It's a fairly human look at what's involved in sailing on a Trident missile submarine...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-03-10

1,439 reads

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

by Robert Davis

SQLServerCentral.com

Blogs

Question: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? This question was sent to me via email. My reply follows. Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? Databases to be mirrored are currently running on 2005 SQL instances but will be upgraded to 2008 SQL in the near future.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-23

1,567 reads

Inserting Markup into a String with SQL

by Phil Factor

SQLServerCentral.com

T-SQL

In which Phil illustrates an old trick using STUFF to intert a number of substrings from a table into a string, and explains why the technique might speed up your code...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-18

1,631 reads

Networking - Part 4

by Andy Warren

SQLServerCentral.com

Blogs

You may want to read Part 1 , Part 2 , and Part 3 before continuing. This time around I'd like to talk about social networking. We'll start with social networking. Facebook, MySpace, and Twitter are all good examples of using technology to let...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-17

1,530 reads

Speaking at Community Events - More Thoughts

by Andy Warren

SQLServerCentral.com

Blogs

Last week I posted Speaking at Community Events - Time to Raise the Bar?, a first cut at talking about to what degree we should require experience for speakers at events like SQLSaturday as well as when it might be appropriate to add additional focus/limitations on the presentations that are accepted. I've got a few more thoughts on the topic this week, and I look forward to your comments.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-13

360 reads

WSFC Lost Quorum votes due to network problem

Rate

Share

Share

Rate

WSFC Lost Quorum votes due to network problem

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts