CPU getting maxed out while rebooting the Secondary Node in AlwaysON AG Cluster

Question

Deepam Ghosh

Mr or Mrs. 500

Points: 551
August 19, 2025 at 10:59 am

#4637573
I have a 2-node Windows cluster (SRV1 and SRV2) with an Azure Storage account witness and an Always On availability group configured. SRV1 is the primary and SRV2 is the secondary. During patching, we failed the availability group over from the primary to the secondary (SRV1 to SRV2), and the failover completed successfully. We then rebooted the old primary, SRV1 (now the secondary). The reboot also completed successfully, but when SRV1 came back up, CPU on SRV2 spiked to 100%, with clussvc.exe consuming the most, around 70-80%. As a result, SRV2 (now the primary) choked and the databases were inaccessible. After a few minutes, CPU usage on SRV2 returned to normal on its own. I am not sure what is happening. What could cause a CPU spike on the primary node when the secondary node is rebooted? It is causing business disruption.
Has anyone faced this kind of issue? It happens every time we perform patching.
- This topic was modified 11 months ago by Deepam Ghosh.

Site Owners SSC Guru Points: 80223 · Answer 1

Thanks for posting your issue and hopefully someone will answer soon.

This is an automated bump to increase visibility of your question.

as_1234 SSCrazy Points: 2887 · Answer 2

I haven't managed an AG for a while so I don't know if this will help:

If I remember correctly, there is a log for the availability group that can be viewed through SSMS, and a log for the Windows cluster that can be viewed in Failover Cluster Manager. Is there anything in those logs that might indicate the cause?
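
If the logs are long, a rough sketch like the one below might help narrow them down. It assumes the log has been exported to a plain text file with one entry per line. The function name and the keyword list are made up for illustration and are not part of any SQL Server or cluster tooling.

```python
import re

# Hypothetical keyword list; adjust it to whatever the exported logs contain.
KEYWORDS = ("error", "warn", "timeout", "failover")


def scan_log(lines, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines that mention any keyword.

    Matching is case-insensitive and purely textual; no particular log
    format is assumed beyond one entry per line.
    """
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    hits = []
    for number, text in enumerate(lines, start=1):
        if pattern.search(text):
            hits.append((number, text.rstrip("\n")))
    return hits
```

To use it, open the exported file and pass the file object to `scan_log`, then read through the returned lines around the time SRV1 came back up. The file encoding depends on how the log was exported, so that part is left to the caller.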

Deepam Ghosh Mr or Mrs. 500 Points: 551 · Answer 3

No, we found nothing indicative in the logs. They contain only generic messages about failover and timeouts.