SQL 2016 AlwaysOn - Parallel redo not performing.

Question

cheshirefox

Hall of Fame

Points: 3464
March 23, 2017 at 7:55 am

#319818

We're currently testing SQL 2016 AlwaysOn on a large database (8 TB). The server is physically large: 64 cores, 1 TB memory, flash storage, and a 10GB connection between the primary and the replica, which reside in the same rack.
We're experiencing large latency on the replica node during redo. We're currently working with Microsoft, and they've reproduced the latency in their lab environment. We've tested with trace flag 3459, which suppresses parallel redo, and performance actually improved, mimicking the performance of AO in SQL 2014. We've dropped and recreated the AG several times without much success.
Has anyone experienced similar degradation using SQL 2016 AlwaysOn?
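
For anyone wanting to compare numbers, here is a rough sketch of how redo latency can be measured on the replica. It reads sys.dm_hadr_database_replica_states; the catch-up estimate (queue size divided by redo rate) is only an approximation I am assuming is good enough for comparison, not an official metric.

```sql
-- Run on the secondary replica.
-- redo_queue_size and log_send_queue_size are reported in KB, redo_rate in KB/sec.
SELECT
    DB_NAME(drs.database_id)                        AS database_name,
    drs.synchronization_state_desc,
    drs.log_send_queue_size                         AS log_send_queue_kb,
    drs.redo_queue_size                             AS redo_queue_kb,
    drs.redo_rate                                   AS redo_rate_kb_per_sec,
    -- Rough catch-up estimate; NULLIF avoids a divide-by-zero when redo is idle.
    drs.redo_queue_size / NULLIF(drs.redo_rate, 0)  AS est_redo_catchup_sec,
    drs.last_redone_time
FROM sys.dm_hadr_database_replica_states AS drs
WHERE drs.is_local = 1
ORDER BY drs.redo_queue_size DESC;
```

Sampling this once with and once without trace flag 3459, during the same workload, gives a like-for-like view of the redo queue.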

Beatrix Kiddo SSC-Dedicated Points: 32407 · Answer 1

I haven't had this problem myself yet, but I was aware that it could be an issue because as a result of parallelising the log transfer process, the redo process could struggle to keep up (even though the redo was also parallelised). I think I first heard about it here: http://sqldatapartners.com/2016/12/31/availability-group-improvements/

cheshirefox Hall of Fame Points: 3464 · Answer 2

Updated information:

Microsoft was able to reproduce our scenario, proving that Parallel Redo is indeed slower for online index rebuilds of short tables. They were also able to determine the root cause. For a short table, the main Redo Thread does all the work, and the dispatched parallel slave threads have virtually no work left to do. The parallel architecture also requires the main Redo Thread to perform extra work to govern the parallel processing, specifically queuing, de-queuing and analyzing the log stream data in order to dispatch work to the parallel Redo slaves. This overhead on the main Redo Thread, coupled with virtually no work for the parallel Redo slave threads, causes Parallel Redo to be slower than Serial Redo for online index rebuilds of short tables.
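
A possible way to see this imbalance on a replica is to look at the redo-related requests and compare their CPU time. This is only a sketch: the command strings below ('DB STARTUP' for the main redo thread, 'PARALLEL REDO TASK' and 'PARALLEL REDO HELP TASK' for the parallel workers) are what I recall from Microsoft's write-ups on the redo model, so treat them as an assumption and verify them on your build.

```sql
-- Run on the secondary replica while redo is active.
-- Assumption: redo threads surface in sys.dm_exec_requests under these command names.
SELECT
    r.session_id,
    r.command,
    DB_NAME(r.database_id) AS database_name,
    r.status,
    r.wait_type,
    r.cpu_time             AS cpu_time_ms,
    r.total_elapsed_time   AS elapsed_ms
FROM sys.dm_exec_requests AS r
WHERE r.command IN (N'DB STARTUP', N'PARALLEL REDO TASK', N'PARALLEL REDO HELP TASK')
ORDER BY r.database_id, r.command;
```

If the pattern described above applies, the main redo thread would be expected to accumulate CPU time while the parallel workers stay mostly idle.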

Microsoft will try to find a solution for improving Parallel Redo performance for online index rebuilds of short tables, but it will take months before this work is completed and released. In the meantime, Microsoft will officially recommend using Trace Flag 3459 (which switches redo from parallel to serial) when customers experience Redo performance degradation with short tables.
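
For reference, a minimal sketch of turning the trace flag on globally on the secondary replica. Whether the change takes effect immediately or only after a service restart depends on the build, so that part is an assumption to check against the documentation for your version; adding -T3459 as a startup parameter is the persistent alternative.

```sql
-- Enable trace flag 3459 globally (-1) to switch redo from parallel to serial.
DBCC TRACEON (3459, -1);

-- Confirm the flag is active.
DBCC TRACESTATUS (3459);

-- To revert to parallel redo later:
-- DBCC TRACEOFF (3459, -1);
```

A flag set with DBCC TRACEON does not survive a service restart, which is why the startup parameter is worth considering if the workaround needs to stay in place.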

alex.sqldba SSChampion Points: 10254 · Answer 3

Hi Fox / Beatrix -- is this only for online index rebuilds?

cheshirefox Hall of Fame Points: 3464 · Answer 4

Yes, our environment is a very large OLTP/Decision ecosystem that requires all index rebuilds to be 'online'. All our testing was performed with online index rebuilds.

alex.sqldba SSChampion Points: 10254 · Answer 5

April 12, 2017 at 3:38 pm

#1937951

Cheers 'Fox