SQL 2016 AlwaysOn - Parallel redo not performing.

  • We’re currently testing SQL 2016 AlwaysOn on a large database (8 TB). The servers are large: 64 cores, 1 TB of memory, flash storage, and a 10 Gb connection between the primary and the replica, which reside in the same rack.

    We’re seeing high latency on the replica node during redo. We’re currently working with Microsoft, and they’ve reproduced the latency in their lab environment. We’ve tested with trace flag 3459, which disables parallel redo, and performance actually improved, matching what we saw with AlwaysOn in SQL 2014. We’ve dropped and recreated the AG several times without much success.

    Has anyone else seen similar degradation with SQL 2016 AlwaysOn?

  • I haven't hit this problem myself yet, but I knew it could be an issue: because the log transfer process was parallelised, the redo process could struggle to keep up, even though redo was also parallelised. I think I first heard about it here: http://sqldatapartners.com/2016/12/31/availability-group-improvements/
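
To make the "struggle to keep up" point concrete, here is a toy back-of-the-envelope model, assuming a steady rate of incoming log and a steady redo rate on the secondary. The function names and the example numbers are hypothetical; on a live system the same ratio can be formed from the redo_queue_size (KB) and redo_rate (KB/sec) columns of sys.dm_hadr_database_replica_states.

```python
# Toy model of a redo queue on an AlwaysOn secondary (hypothetical helpers).
# Assumes constant rates: log arrives at log_rate_kb_s and is redone at redo_rate_kb_s.


def redo_backlog_kb(log_rate_kb_s: float, redo_rate_kb_s: float, seconds: float) -> float:
    """Redo backlog (KB) after `seconds` of steady load.

    If redo keeps pace with log arrival the queue stays empty; otherwise
    it grows linearly by the difference between the two rates.
    """
    return max(0.0, (log_rate_kb_s - redo_rate_kb_s) * seconds)


def catch_up_seconds(backlog_kb: float, redo_rate_kb_s: float) -> float:
    """Seconds needed to drain the backlog once no new log arrives."""
    if redo_rate_kb_s <= 0:
        raise ValueError("redo rate must be positive")
    return backlog_kb / redo_rate_kb_s


if __name__ == "__main__":
    backlog = redo_backlog_kb(100_000, 60_000, 600)  # hypothetical 10-minute rebuild
    print(backlog, catch_up_seconds(backlog, 60_000))
```

With a hypothetical 100,000 KB/s of incoming log, 60,000 KB/s of redo throughput and a 10-minute rebuild, the backlog reaches 24,000,000 KB, which takes another 400 seconds to drain after the load stops. Any redo slowdown, such as the parallel redo overhead discussed in this thread, shows up directly as a larger backlog and a longer catch-up.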

  • Updated information: 

    Microsoft was able to reproduce our scenario, confirming that parallel redo is indeed slower for online index rebuilds of short tables. They were also able to determine the root cause. For a short table, the main redo thread does all the work and the dispatched parallel worker threads have virtually nothing left to do. Meanwhile, the parallel architecture requires the main redo thread to perform extra work to govern the parallel processing, specifically queuing, dequeuing and analyzing the log stream in order to dispatch work to the parallel redo workers. The overhead of this extra work on the main redo thread, combined with almost no work for the parallel redo threads, makes parallel redo slower than serial redo for online index rebuilds of short tables.

    Microsoft will try to find a way to improve parallel redo performance for online index rebuilds of short tables, but it will take months before this work is completed and released. In the meantime, Microsoft will officially recommend trace flag 3459 (which switches redo from parallel to serial) when customers experience redo performance degradation with short tables.
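
The root cause described above can be illustrated with a deliberately simplified cost model. This is a sketch under stated assumptions, not SQL Server's actual redo implementation: the RedoWorkload fields, the per-record costs, and the assumption that the main thread and the workers overlap perfectly are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RedoWorkload:
    # All fields are hypothetical knobs for illustration only.
    records: int              # log records to redo
    apply_us: float           # cost to apply one record (microseconds)
    dispatch_us: float        # main-thread queue/dequeue/analyze overhead per record (parallel mode)
    offload_fraction: float   # share of apply work that actually reaches worker threads (0..1)
    workers: int              # number of parallel redo worker threads


def serial_redo_us(w: RedoWorkload) -> float:
    # One thread applies every record and pays no dispatch overhead.
    return w.records * w.apply_us


def parallel_redo_us(w: RedoWorkload) -> float:
    # The main thread pays dispatch overhead on every record and still applies
    # whatever is not offloaded; the workers share the offloaded work.
    # Assuming the two stages overlap, elapsed time is set by the slower one.
    main = w.records * (w.dispatch_us + (1 - w.offload_fraction) * w.apply_us)
    workers = w.records * w.offload_fraction * w.apply_us / w.workers
    return max(main, workers)


if __name__ == "__main__":
    short_table = RedoWorkload(1_000_000, 2.0, 1.0, 0.0, 16)
    large_table = RedoWorkload(1_000_000, 2.0, 1.0, 0.9, 16)
    for name, w in (("short", short_table), ("large", large_table)):
        print(name, serial_redo_us(w), parallel_redo_us(w))
```

With these made-up costs (2 microseconds to apply a record, 1 microsecond of dispatch overhead), the short-table case where nothing is offloaded takes 3,000,000 microseconds in parallel versus 2,000,000 serially, while offloading 90% of the work to 16 workers brings the parallel time down to roughly 1,200,000. The model only shows why dispatch overhead can outweigh the benefit when the workers have little to do; it says nothing about real-world magnitudes.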

  • Hi Fox / Beatrice -- is this only for online index rebuilds?

  • Yes. Our environment is a very large OLTP/decision-support ecosystem that requires all index rebuilds to be online, and all our testing was performed with online index rebuilds.

  • Cheers 'Fox
