CPU Maxed Out on the Primary While Rebooting the Secondary Node in an Always On AG Cluster

  • I have a 2-node Windows cluster (SRV1 and SRV2) with an Azure Storage account as the witness and an Always On availability group configured. SRV1 is the primary and SRV2 is the secondary. During patching, we failed the availability group over from the primary to the secondary (SRV1 to SRV2), and the failover succeeded. We then rebooted the old primary, SRV1 (now the secondary). The reboot completed successfully, but when SRV1 came back up, CPU on SRV2 spiked to 100%, with Clussvc.exe alone consuming around 70-80%. As a result, SRV2 (now the primary) became unresponsive and the databases were inaccessible. After a few minutes, CPU usage on SRV2 returned to normal on its own. I am not sure what is happening. What could cause a CPU spike on the primary node when we reboot the secondary node? It is causing business disruption.

    Has anyone faced this kind of issue? It happens every time we perform patching.

    • This topic was modified 3 weeks, 3 days ago by Deepam Ghosh.
  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • I haven't managed an AG for a while, so I don't know if this will help:

    If I remember correctly, there is a log for the availability group that can be viewed through SSMS, and a log for the Windows cluster in the Windows cluster manager. Is there anything in those logs that might indicate the cause?


  • No, we found nothing indicative in the logs. They only contain generic failover and timeout messages.

