Cluster node quarantined

  • Evening all,

    Hoping someone can shed some light. I have an (originally) two-node Windows Server 2019 cluster running SQL Server 2019 Enterprise, serving almost 60 TB of databases.

    We added another pair of nodes to the cluster, with the same specs.

    We've had issues with the cluster, and since adding these two newer nodes we have had nodes kicked out of the cluster and sometimes quarantined.

    I've not been able to maintain some of the AGs. The Wintel team are combing through the cluster, have put a work plan in place, and have removed some of the add-on software (such as SEP).

    This evening I was trying to rebuild one of the AGs (the primary is on one of the newer nodes) and took a backup of the database, which is over 1 TB. When I came back to check on the backup, the node had been quarantined. I then went back through the logs and scheduled tasks and found what looks like the same pattern on another node: at 5 AM it backs up two databases to restore onto a DEV server, and at 5:07 AM that node gets kicked out of the cluster.

    The link is too loose to say positively that this is the problem, but it's too much of a coincidence to ignore. However, searching Google I haven't been able to find anything on this or anything similar.

    I need some examples before I can put this forward as a possibility.

    Can anyone point me to any evidence that SQL backups (compressed) are, or may be, causing nodes to be kicked out of the cluster, and if so, why?

    Thanks

  • Have you checked the cluster logs? Export them via PowerShell and inspect them for errors; this link details the Get-ClusterLog cmdlet:

    https://learn.microsoft.com/en-us/powershell/module/failoverclusters/get-clusterlog?view=windowsserver2022-ps
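    Once the logs are exported, a short script can pull out only the error and warning entries for a window around the backup times (e.g. 05:00 to 05:15). This is a minimal Python sketch, not an official tool. The line layout it parses (`pid.tid::YYYY/MM/DD-HH:MM:SS.mmm LEVEL [COMPONENT] message`) is an assumption based on typical cluster.log output, and the script name and 15-minute default window are hypothetical. Cluster log timestamps are in UTC unless the log was generated with `Get-ClusterLog -UseLocalTime`, so line the window up with the backup job schedule accordingly.

```python
import re
import sys
from datetime import datetime, timedelta

# Assumed cluster.log line layout (typical of Get-ClusterLog output):
#   <pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm LEVEL  [COMPONENT] message
LINE_RE = re.compile(
    r"^[0-9a-fA-F]+\.[0-9a-fA-F]+::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def scan(lines, start, end, levels=("ERR", "WARN")):
    """Yield (timestamp, level, message) for entries in [start, end] at the given levels."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # section headers, blank or wrapped lines
        ts = datetime.strptime(m.group("ts"), "%Y/%m/%d-%H:%M:%S.%f")
        if m.group("level") in levels and start <= ts <= end:
            yield ts, m.group("level"), m.group("msg")


def open_log(path):
    """Open the log as UTF-16 if it starts with a UTF-16 BOM, else UTF-8 (an assumption)."""
    with open(path, "rb") as f:
        bom = f.read(2)
    enc = "utf-16" if bom in (b"\xff\xfe", b"\xfe\xff") else "utf-8-sig"
    return open(path, encoding=enc, errors="replace")


# Hypothetical usage: python scan_cluster_log.py Cluster.log 2019-05-01T05:00 15
if __name__ == "__main__" and len(sys.argv) >= 3:
    centre = datetime.fromisoformat(sys.argv[2])
    window = timedelta(minutes=int(sys.argv[3]) if len(sys.argv) > 3 else 15)
    with open_log(sys.argv[1]) as log:
        for ts, level, msg in scan(log, centre - window, centre + window):
            print(ts.isoformat(sep=" "), level, msg)
```

    Running it against each node's log for both the 5:07 AM eviction and this evening's quarantine would show whether the two events are preceded by the same kind of errors before anything is attributed to the backups.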

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉
