AG recovery following disaster

  • Hi All.

    I'm looking for suggestions on the best way to recover from a disaster. Setup is an AG (asynchronous commit) on a two node cluster (one production, one DR). Forced failover with data loss following abrupt unplanned outage at production data centre. Have been in DR with production unavailable for a week. Lots of new data. Transactional log backups every 10mins and nightly full backups on DR (now the primary node).

    Is the best way to recover to backup DR and then restore to Production and re-create the AG ?

    Is it even possible to let the AG "sort itself out" following restoration of network link between production and DR or is the the fact that the transaction log has been backed up (and therefore parts of the log truncated / overwritten) preclude that ? I suppose the question here is how does AG actually work when replicating data back to production from DR? Does is only use the transaction log or are transaction log backups also used ?

    Thanks in advance,

    Baz

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • Generally, the way to recover this would be to bring the AG back online and let it sort itself out. In fact, you should be seeing ever growing log sizes if the AG has been offline for that long. It'll keep transactions that haven't been transmitted. It ought to be able to recover. That's the design. If it doesn't, or you turned it all off because of the logs, then you'll have to go through a process of rebuilding it.

    I'd start with trying to just bring it back online and see what happens.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks Grant.

    And where are these transactions held? Within the transaction log presumably? If so, do the transaction log backups that have been done not have an effect ?

    Assuming the above does not have an impact then the steps would be to .... when the production node comes back online, I would need to resume data movement on both the DR and Production nodes and then once the replication has finished, I could then failback both cluster and AG ?

    Thanks,

    Baz

  • MrBen wrote:

    Thanks Grant.

    And where are these transactions held? Within the transaction log presumably? If so, do the transaction log backups that have been done not have an effect ?

    Assuming the above does not have an impact then the steps would be to .... when the production node comes back online, I would need to resume data movement on both the DR and Production nodes and then once the replication has finished, I could then failback both cluster and AG ?

    Thanks,

    Baz

    If the database still exists in the AG - but data movement is suspended, the performing a transaction log backup cannot mark the transaction log as reusable until those transactions have been committed on the secondary.

    What is the current state of the AG and the state of each database?

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • In theory, it really is as simple as getting the production server back online, wait for the logs to get moved over, and then fail over back to production. The question is as Jeffrey says above, what's the state of the AG.

     

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply