Basic Always On Availability Groups Randomly Stop Synchronizing

    In one of my environments I have 3 pairs of SQL Server 2022 (CU18) Standard Edition servers running Always On Basic Availability Groups (BAGs). At random, one or more of the BAGs stops synchronizing, and the only way we have found to resolve it is to remove the database from the BAG and add it back again.

    The error information we have is shown in the screenshot below (only 2 of the 4 BAGs are shown): one BAG is NOT_HEALTHY, while the other 3 are fine.

    [Screenshot: 2026-01-22_164233]
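To pin down exactly which BAG and database replica is unhealthy, and whether data movement has been suspended (and why), one option is to query sys.dm_hadr_database_replica_states on each replica and flag anything that is not HEALTHY, is NOT SYNCHRONIZING, or is suspended. The sketch below covers only the filtering step. It assumes the query results have already been fetched as dictionaries keyed by the DMV column names, plus a hypothetical "ag_name" key taken from sys.availability_groups; the function name unhealthy_replicas and the sample BAG names are hypothetical as well.

```python
from typing import Any, Iterable, Mapping

# Sketch only. Each row is assumed to be one record from
# sys.dm_hadr_database_replica_states, with the availability group name added
# under the hypothetical key "ag_name" (e.g. via a join to sys.availability_groups).


def unhealthy_replicas(rows: Iterable[Mapping[str, Any]]) -> list[str]:
    """Describe every database replica that is unhealthy, not synchronizing, or suspended."""
    problems = []
    for row in rows:
        health = row["synchronization_health_desc"]  # HEALTHY / PARTIALLY_HEALTHY / NOT_HEALTHY
        state = row["synchronization_state_desc"]    # e.g. SYNCHRONIZED, NOT SYNCHRONIZING
        suspended = bool(row["is_suspended"])
        if health != "HEALTHY" or state == "NOT SYNCHRONIZING" or suspended:
            # suspend_reason_desc can be NULL, which arrives here as None.
            reason = row.get("suspend_reason_desc") or "n/a"
            problems.append(
                f"{row['ag_name']}: health={health}, state={state}, "
                f"suspended={suspended}, suspend_reason={reason}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical rows for illustration; in practice these come from the DMV query.
    sample = [
        {"ag_name": "BAG_A", "synchronization_health_desc": "NOT_HEALTHY",
         "synchronization_state_desc": "NOT SYNCHRONIZING", "is_suspended": 0,
         "suspend_reason_desc": None},
        {"ag_name": "BAG_B", "synchronization_health_desc": "HEALTHY",
         "synchronization_state_desc": "SYNCHRONIZED", "is_suspended": 0,
         "suspend_reason_desc": None},
    ]
    for line in unhealthy_replicas(sample):
        print(line)
```

Capturing this output each time a BAG breaks, on both the primary and the secondary, would show whether the secondary reports a suspend reason, which is the kind of detail the cluster error message does not give.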

    We thought it might be network-related, so for one pair of servers we removed what the sysadmins believed was the problem node and added a new node with a new IP address, etc. This didn't fix it. It has also been suggested that it may be a SAN issue, but then why is only one database out of 4 affected on this cluster, and only one out of several on the other clusters? If the SAN were the problem, I would expect it to affect multiple BAGs. And if it were the SAN, why is it always the secondary that has the problem? On at least some days there are primary replicas running in the secondary environment, and we don't see the issue on those primary nodes.

    I recall that back in the SQL Server 2012 days, when IPv6 was new, we had issues and had to disable IPv6 to get a stable AG. I was under the impression this had since been resolved, and I haven't seen any recent reports of it.

    We haven't found anything meaningful in the Failover Cluster Manager error log when the AG goes belly up, except "The Cluster service failed to bring the clustered role 'BAG_tlive' completely online or offline. One or more resources may be in a failed state. ...", which is not very enlightening. We can't identify which resource "may be" in a failed state!
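To see which resource inside the 'BAG_tlive' role actually failed, the Get-ClusterResource cmdlet (FailoverClusters PowerShell module) reports each resource's Name, State, OwnerGroup and ResourceType, and Get-ClusterLog can generate the detailed cluster.log covering the failure window. The Python sketch below assumes the Get-ClusterResource output has already been exported to CSV with those four columns holding plain-text values (the export step and the function name failed_resources are assumptions, not something from the post) and lists every resource in a given role whose State is not Online.

```python
import csv
import io
from typing import Iterable

# Sketch only. Assumes Get-ClusterResource output was exported to CSV with
# (at least) the columns Name, State, OwnerGroup and ResourceType, each
# holding plain-text values.


def failed_resources(lines: Iterable[str], role: str) -> list[tuple[str, str, str]]:
    """Return (Name, ResourceType, State) for resources in `role` that are not Online."""
    # Windows PowerShell's Export-Csv writes a "#TYPE ..." line first unless
    # -NoTypeInformation is used; skip it so the real header row is read.
    data = (line for line in lines if not line.startswith("#TYPE"))
    result = []
    for row in csv.DictReader(data):
        if row["OwnerGroup"] == role and row["State"] != "Online":
            result.append((row["Name"], row["ResourceType"], row["State"]))
    return result


if __name__ == "__main__":
    # Hypothetical export for illustration; the role and resource names are made up.
    export = io.StringIO(
        '"Name","State","OwnerGroup","ResourceType"\n'
        '"BAG_X","Failed","BAG_X","SQL Server Availability Group"\n'
        '"BAG_X_IP","Online","BAG_X","IP Address"\n'
    )
    for name, rtype, state in failed_resources(export, "BAG_X"):
        print(f"{name} ({rtype}) is {state}")
```

Run against an export taken shortly after a failure, with role set to 'BAG_tlive', this would at least narrow the "may be in a failed state" message down to a specific resource type (availability group, IP address, network name) before digging through cluster.log.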

    At the moment I'm thinking of asking the sysadmins to disable IPv6! Are there any other things I could look at?

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.