Availability Group Failover Stops Working After First Failure.

  • Hello-

    I've set up a two-node failover cluster (non-shared storage) with a file share witness. I'm testing the different failover scenarios to confirm that everything is working properly, and everything works fine until I test failure of the SQL Server service.

    When I stop the SQL Server service on the primary server, the availability group fails over to the secondary server as expected. I then start the service on the (now) secondary server and it comes back online as the secondary. Next I test that the AG will fail back when I stop the service on the new primary. However, when I stop the service, the secondary server shows "Resolving" and the AG never comes back online. When I bring the service back up on the primary server, the secondary shows as secondary again instead of Resolving.

    To rule out a problem with failing over in one particular direction, I do a manual failover to make the original primary server the primary again, so everything is back as it was originally. I then stop the service on the primary server, but the secondary again sits in Resolving and the AG will not become available until I start the service back up on the primary server.

    It seems that when I first configured the quorum, the first failover scenario worked fine and then failover stopped working. I then added the file share witness, and failover worked the first time again, but not after that. For some reason, after the initial failover the AG won't automatically fail over again.

    Any ideas on why this might be happening?

    Thanks in advance.

    Config:

    Servers: Windows Server 2012 Standard

    SQL: SQL Server 2012 Enterprise SP1
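
    For reference, something like this from PowerShell will confirm the quorum configuration and node state before and after each test (the cluster name below is just a placeholder):

      # Requires the FailoverClusters module (installed with the clustering feature)
      Import-Module FailoverClusters

      # Confirm the quorum model and that the file share witness is configured
      Get-ClusterQuorum -Cluster "SQLCLUSTER01"

      # Confirm both nodes are Up before and after each failover test
      Get-ClusterNode -Cluster "SQLCLUSTER01" | Format-Table Name, State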

  • Are the build numbers the same on both sides of the cluster? A mismatch should cause an immediate failure during failover.

    Also, is the (former) primary in synchronous-commit mode? Automatic failover partners cannot be asynchronous.

  • Thanks for the reply, Matt-

    Both servers are running SQL Server build 11.00.3128, with the same service pack level and software updates. I verified this again through the Validate a Configuration Wizard.

    Both nodes are set to Automatic Failover with Synchronous commit.
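
    In case it helps, a check like this against each node will confirm the build and the replica settings (node names below are placeholders; Invoke-Sqlcmd comes with the SqlServer or older SQLPS module):

      Import-Module SqlServer

      # T-SQL that returns the build plus each replica's commit and failover mode
      $query = "SELECT SERVERPROPERTY('ProductVersion') AS build,
                       replica_server_name,
                       availability_mode_desc,  -- expecting SYNCHRONOUS_COMMIT
                       failover_mode_desc       -- expecting AUTOMATIC
                FROM sys.availability_replicas;"

      # Run it against both nodes (server names are placeholders)
      foreach ($node in 'SQLNODE1', 'SQLNODE2') {
          Invoke-Sqlcmd -ServerInstance $node -Query $query | Format-Table
      }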

    Any other ideas?

  • Well, I figured it out. By default, the clustered role will only fail over once within a six-hour period (the "maximum failures in the specified period" setting on the role). I upped the maximum to three and could fail back over to the original primary from the secondary. Not sure why this doesn't seem to apply when the server hard fails, but that's a question for a later time.
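
    For anyone hitting the same thing, the setting can be viewed and bumped up with something like this from PowerShell (the role name "MyAG" is a placeholder for whatever the availability group's clustered role is called):

      Import-Module FailoverClusters

      # Show the current failover limits for the AG's clustered role
      Get-ClusterGroup -Name "MyAG" | Format-List Name, FailoverThreshold, FailoverPeriod

      # Allow up to three failovers within the failure period (default period is six hours)
      (Get-ClusterGroup -Name "MyAG").FailoverThreshold = 3

      # The period itself (in hours) can be adjusted the same way if needed
      # (Get-ClusterGroup -Name "MyAG").FailoverPeriod = 6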

    Thanks!

  • This is the default setting for all cluster resources, just something to be aware of.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Yep, I have that listed on my implementation sheet for when we build the production system.

    Thanks
