August 19, 2025 at 10:59 am
I have a 2-node Windows cluster (SRV1 and SRV2) with an Azure storage account as the witness and an Always On availability group configured. SRV1 is the primary and SRV2 is the secondary. During patching we failed the availability group over from the primary to the secondary (SRV1 to SRV2), and the failover succeeded. We then rebooted the old primary, SRV1, which was now the secondary. The reboot succeeded, but when SRV1 came back up, CPU on SRV2 spiked to 100%, with Clussvc.exe consuming the most, around 70-80%. As a result SRV2 (now the primary) became unresponsive and the databases were inaccessible. After a few minutes, CPU usage on SRV2 returned to normal on its own. I am not sure what is happening. What could cause a CPU spike on the primary node when we reboot the secondary node? It is causing business disruption.
Has anyone faced this kind of issue? It happens every time we patch.
August 20, 2025 at 11:10 am
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
August 26, 2025 at 12:18 pm
I haven't managed an AG for a while so I don't know if this will help:
If I remember correctly, there is a log for the availability group that can be viewed through SSMS, and a log for the Windows cluster that can be viewed in Failover Cluster Manager. Is there anything in those logs that might indicate the cause?
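If the cluster log is large, it may help to narrow it to the few minutes around the reboot before reading it. Below is a rough sketch in Python, not a tested tool. It assumes the cluster log has been exported to a text file (for example with the Get-ClusterLog PowerShell cmdlet) and that each entry carries a timestamp shaped like 2025/08/19-10:59:00.123 followed by a level such as INFO, WARN or ERR. The function names and the default 5-minute window are my own placeholders.

```python
import re
import sys
from collections import Counter
from datetime import datetime, timedelta

# Assumed shape of the timestamp and level inside each log line,
# e.g. "...::2025/08/19-10:59:00.123 INFO  [RES] ..."
STAMP = re.compile(r"(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2})\.\d+\s+(\w+)")


def lines_in_window(lines, centre, minutes=5):
    """Yield (timestamp, level, text) for lines within +/- `minutes` of `centre`."""
    start = centre - timedelta(minutes=minutes)
    end = centre + timedelta(minutes=minutes)
    for line in lines:
        match = STAMP.search(line)
        if not match:
            continue  # skip headers and lines without a timestamp
        stamp = datetime.strptime(match.group(1), "%Y/%m/%d-%H:%M:%S")
        if start <= stamp <= end:
            yield stamp, match.group(2), line.rstrip("\n")


def summarise(lines, centre, minutes=5):
    """Count how many lines of each level fall inside the window."""
    return Counter(level for _, level, _ in lines_in_window(lines, centre, minutes))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python filter_cluster_log.py <exported log file> "2025-08-19 10:59:00"
    # The time must be in the same time zone as the log timestamps.
    centre = datetime.strptime(sys.argv[2], "%Y-%m-%d %H:%M:%S")
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for _, _, text in lines_in_window(handle, centre):
            print(text)
```

Running it against the log from each node, centred on the time SRV1 came back up, should at least show whether the warnings and errors cluster around the moment the CPU spiked.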
September 1, 2025 at 7:53 am
No, we found nothing indicative in the logs. They contain only generic failover and timeout messages.