Cluster failed over, now we see critical errors but DBs are all "Synchronized"

  • Our cluster failed over from Server1 to Server2 during a patching reboot. So now Server2 is Primary and Server1 is Secondary.

    I'm not 100% familiar with Availability groups and replicas, so am looking for some help to resolve the issue

    In SSMS -> Server -> Databases on both Server1 and Server2, all DBs are showing as (Synchronized), but in the Availability Group dashboard I see Issues about one database in a "Not Synchronising" state.

    Critical: The availability group is not ready for automatic failover

    Warning: The data synchronization state of this availability database is unhealthy

    Warning: Either a database administrator or the system has suspenced data syncrhonization on this availability database

    After googling these messages, these are some things I've looked at and tried:

    I've attempted to "Resume data movement" for the affected database, but the only option is to "Suspend.."

    Disk space is all plentiful

    Nothing for the Availability Group in the Application Event log

    How can I resolve this? I feel like I need to take a backup of the broken database from the current primary then restore it to the secondary but I'm not sure which process to follow for this. Also, could there be an underlying reason why one of my databases is failing to Synchronize correctly?

    Thanks

  • Without knowing a lot more about how the system and that database came to be in that state - the easiest method to clean it up would be to re-add that database to the availability group.

    The simplest method would be to ensure the seeding mode for both replicas is set to automatic.  Once that has been done, remove the database on the now primary from the availability group.  Once it has been removed - then delete the database from the now secondary replica (should be showing a state of restoring).

    Once the database has been removed from the secondary - re-add the database to the availability group on the primary.  SQL Server will then start the backup/restore process in the background and add that database back to the availability group.

    Note: if the database is very large - in the hundreds of GB's or larger, then using automatic seeding will take a long time and will be system intensive.  If that is an issue or concern, then you can use a manual seeding mode - where you backup the database, restore it to the secondary - apply transaction logs up to current point in time - and then join the database.

    For future reference:

    When patching the servers involved in an AG - patch and restart all secondary replicas first.  Once all secondary replicas have been patched and restarted, manually failover to one of the secondary replicas.  Verify all databases have recovered on all secondary replicas - then patch and restart the last node.  Once that node is back up and available - you can manually fail back (not required - you can continue on the new primary node).

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply