Rebuilding indexes with Maxdop on AGs

  • I was curious if anyone else has ran into this or has an explanation.

    We cut over to AGs about a year ago and have noticed a weird behavior when rebuilding or adding indexes. Every time we use a MAXDOP greater than 2, the server bogs down causing a lot of queueing and preventing transactions from committing. It comes with an AVAILABILITY_REPLICA wait type.

    We are running 2017 and have a 3 node AG setup; 2 synchronous nodes in our primary DC and one asynchronous node to our DR site. All of our servers generally have at least 8 cores and some of them are in the twenties and it doesn't matter, we still get the gridlock.

    Wondering if this is another surprise AG headache or if there is something I'm missing.

    Thanks

     

  • The problem is how much data you are trying to push across the network to the secondary instances.  The send and redo queues get backed up and that causes the system to slow down as it is trying to apply those changes.

    The synchronous nodes have to apply their changes before those changes are committed/applied to the primary - which is why you are seeing the wait type as availability_replica.  However, if the send queue falls far enough behind - SQL Server should automatically switch over to asynchronous until it catches back up.  I don't think you are hitting that threshold...

    One option would be to schedule your index rebuilds and index adds during a maintenance window.  Prior to starting the process - switch your secondary systems to asynchronous, rebuild the indexes - monitor the send/redo queues - and once caught back up switch back to synchronous.

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • You may want to consider backing off the index rebuilds. Fragmentation isn't typically the cause of a performance problem, but out of date statistics can be. I would encourage you to use a threshold of re-organize at 30-50% fragmentation and rebuild at 75-80% fragmentation. This is easy to do with Ola Hallengren's Maintenance plans.  I had this same problem where during index rebuilds the redo queue was growing rapidly when using the very low, made up thresholds of reorganize indexes between 5 and 30% and rebuild indexes if they are 30% or more fragmented. When I changed the threshold, the redo queue issue went away and query performance has not suffered.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply