AG Readable Secondary Setup blocked Primary

  • We currently have a single database (approx. 400 GB) set up in a sync AG and had a new requirement to set up a remote readable secondary.
    The remote site only had 10 Mbps of bandwidth and 175 IOPS available. At the end of the log file restore, the AG hung and caused all applications to hang. The business asked for a rollback 5 minutes into the outage. The readable secondary was set up in async mode, so I'm wondering why the lag at the secondary also caused the primary to effectively hang for so long?

    We are addressing the resource limits at the remote site, as they clearly weren't sufficient, but I just wondered what other folks' experience is of setting up a readable secondary at a remote site, i.e. remote from our primary data centre.

    Many thanks!

  • So if I'm understanding correctly, this was while the database was still being restored to the new replica? Did you use direct seeding to do the restore?

    What does it say in the error log?

    (Sync/async mode won't make any difference until the restore has finished.)

  • This was after the database restore and at the end of the log restore. The only thing in the logs is lots and lots of IO waits. I did not use direct seeding.

    Nothing in the error logs to indicate a failure or error, just the IO messages like:
    SQL Server has encountered 5470 occurrence(s) of I/O requests taking longer than 15secs
    Disk queue was at 6.5k at this point.

  • Knight - Monday, August 6, 2018 12:07 PM

    This was after the database restore and at the end of the log restore. The only thing in the logs is lots and lots of IO waits. I did not use direct seeding.

    Nothing in the error logs to indicate a failure or error, just the IO messages like:
    SQL Server has encountered 5470 occurrence(s) of I/O requests taking longer than 15secs
    Disk queue was at 6.5k at this point.

    Definitely some IO problems there. In addition to increasing the resources, there are a lot of other components in the IO path that could be contributing to or causing the issues. You can find a pretty good list of things to check in the following article:
    Diagnostics in SQL Server help detect stalled and stuck I/O operations

    Sue
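
For a sense of scale, here is a rough back-of-the-envelope sketch of what the remote link described above can carry. It assumes "10Mbs" means 10 megabits per second, decimal units (1 GB = 1e9 bytes), and a fully usable link with no compression or protocol overhead; none of that is confirmed in the thread, and the function names are purely illustrative.

```python
# Rough capacity arithmetic for the remote replica described in this thread.
# Assumptions (not confirmed in the thread): "10Mbs" means 10 megabits/s,
# sizes are decimal (1 GB = 1e9 bytes), and the link is fully usable with
# no compression or protocol overhead.

def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours needed to push size_gb over a link_mbps link at 100% utilisation."""
    total_bits = size_gb * 1e9 * 8
    seconds = total_bits / (link_mbps * 1e6)
    return seconds / 3600


def link_mb_per_s(link_mbps: float) -> float:
    """Link ceiling in megabytes/s: the most log per second the link can carry."""
    return link_mbps / 8


if __name__ == "__main__":
    hours = transfer_hours(400, 10)
    print(f"400 GB over 10 Mbps: {hours:.1f} h (~{hours / 24:.1f} days)")
    print(f"Log shipping ceiling: {link_mb_per_s(10):.2f} MB/s")
```

Under those assumptions, copying the full 400 GB would take roughly 89 hours, and the link tops out at about 1.25 MB/s. If the primary generates log faster than that, the remote replica would keep falling further behind, so it may be worth comparing this ceiling against the primary's actual log generation rate before retrying.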
