Transactional Replication Performance Issues After Migration From 2000 to 2008

  • We currently have 140 SQL 2000 publications replicating to a single 2008 server with about 25 articles, 5 of which are quite substantial. We are in the process of migrating to new 2008 servers that, in general, perform significantly better than the old 2000 infrastructure. Previously, all replication ran fine, with a latency of no more than a minute at any time, even under peak load. We have migrated about 60 servers to the new 2008 infrastructure, but the latency has now shot up, which makes no sense given the better performance of the new servers. We have tried a number of things to resolve this but have so far been unsuccessful. The 2008 and 2000 publications are both going to the same DB and using the same replication procs.

    First, the @status parameter in sp_addarticle needed changing from 0 (its value in the original script) to 24. This gave a significant improvement, but the latency is still up to 40 minutes at peak times.

    We have tried changing from push to pull. This made the performance worse.

    We have changed the PollingInterval on the distribution agent from 5 (2008 default) to 10 (2000 default). This made no noticeable difference.

    We have changed the ImmediateSync setting to 0 from 1. This made no noticeable difference.

    We have ensured the indexes etc. are OK on the central MSreplication_subscriptions table. This made no noticeable difference.

    We have tried lock hints on some of the replication procs. This made no noticeable difference.

    Any ideas would be much appreciated

  • For reference all changes except the @status and PollingInterval have been made at 2 or 3 of the 60 new servers to test, not at all 60.

  • Have you established where the latency is?

    Log reader to distribution server?

    Distribution agent to subscriber?

  • Cheers for the response. The publisher is its own distributor, there isn't a separate server for this. I will post some outputs from the Distribution and Log Reader agents shortly

  • Also, is the latency across all subscribers?

  • Some are worse than others, but there is latency across all the new servers. There was a correlation between the number of records in MSrepl_commands and performance. But even when we set ImmediateSync to 0 and ran the Distribution Cleanup job every 30 mins to keep the table size down, this didn't help, which I guess suggests it is an issue with applying the commands at the subscriber rather than getting them off the distributor?

    There is no latency on the 2000 boxes

  • chris.roddis-ferrari (11/14/2013)


    Some are worse than others, but there is latency across all the new servers. There was a correlation between the number of records in MSrepl_commands and performance. But even when we set ImmediateSync to 0 and ran the Distribution Cleanup job every 30 mins to keep the table size down, this didn't help, which I guess suggests it is an issue with applying the commands at the subscriber rather than getting them off the distributor?

    There is no latency on the 2000 boxes

    It could be.

    For clarity, you have 200 publishers (140/60 2000/2008) delivering to a single subscriber using push transactional replication. All of the 60 2008 publishers are experiencing latency at a currently unknown "bottleneck".

    Are the subscriptions going to the same database/objects?

  • Yes all going to the same database/objects. The split is 80 on 2000 and 60 on 2008

    Cheers

  • Have there been any drive (disk) level changes? For example, is a comparatively lower-grade disk being used now?

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Disk is now significantly better

    It was previously 2 Ultra SCSI 420 72GB drives in RAID1-0 and is now 4 SAS 300GB drives in 2 RAID1-0 pairs.

  • Distribution Agent Log

    ************************ STATISTICS SINCE AGENT STARTED ***********************
    11-14-2013 13:20:45

    Total Run Time (ms) : 394605    Total Work Time : 389457
    Total Num Trans     : 5194      Num Trans/Sec   : 13.34
    Total Num Cmds      : 8928      Num Cmds/Sec    : 22.92
    Total Idle Time     : 0

    Writer Thread Stats
      Total Number of Retries    : 0
      Time Spent on Exec         : 25784
      Time Spent on Commits (ms) : 1622      Commits/Sec         : 0.13
      Time to Apply Cmds (ms)    : 389457    Cmds/Sec            : 22.92
      Time Cmd Queue Empty (ms)  : 157       Empty Q Waits > 10ms: 10
      Total Time Request Blk(ms) : 157
      P2P Work Time (ms)         : 0         P2P Cmds Skipped    : 0

    Reader Thread Stats
      Calls to Retrieve Cmds     : 2
      Time to Retrieve Cmds (ms) : 369629    Cmds/Sec            : 24.15
      Time Cmd Queue Full (ms)   : 19843     Full Q Waits > 10ms : 128
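
    As a rough cross-check of the agent statistics above, the per-second rates can be recomputed from the raw totals. The sketch below is illustrative only: the dictionary keys and the helper names (`per_second`, `summarize`) are made up for this example and are not part of any replication tool. It also compares the two queue-wait counters, on the assumption that "Time Cmd Queue Full" is time the reader thread spent waiting on the writer and "Time Cmd Queue Empty" is time the writer spent waiting on the reader.

```python
# Figures copied from the Distribution Agent log above (times in milliseconds).
# The key names are hypothetical labels chosen for this sketch.
stats = {
    "total_work_ms": 389457,
    "total_trans": 5194,
    "total_cmds": 8928,
    "writer_exec_ms": 25784,
    "writer_queue_empty_ms": 157,    # assumed: writer waiting on the reader
    "reader_retrieve_ms": 369629,
    "reader_queue_full_ms": 19843,   # assumed: reader waiting on the writer
}


def per_second(count, elapsed_ms):
    """Convert a count over a millisecond interval into a per-second rate."""
    return count / (elapsed_ms / 1000.0)


def summarize(s):
    """Recompute the headline rates and report which thread waited longer."""
    reader_waited_longer = s["reader_queue_full_ms"] > s["writer_queue_empty_ms"]
    return {
        "trans_per_sec": round(per_second(s["total_trans"], s["total_work_ms"]), 2),
        "cmds_per_sec": round(per_second(s["total_cmds"], s["total_work_ms"]), 2),
        "reader_cmds_per_sec": round(
            per_second(s["total_cmds"], s["reader_retrieve_ms"]), 2
        ),
        "exec_share_pct": round(100.0 * s["writer_exec_ms"] / s["total_work_ms"], 1),
        "thread_waiting_longer": "reader" if reader_waited_longer else "writer",
    }


summary = summarize(stats)
for name, value in summary.items():
    print(f"{name}: {value}")
```

    The recomputed rates match the ones the agent printed. Under the assumption stated above, the reader spent far longer waiting on a full queue than the writer spent waiting on an empty one, which would point at the apply side rather than at reading from the distributor.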

  • It looks like the delivery of commands is what's taking the time. Have you compared the distribution agent profile settings between the servers?

  • Only differences are BCPBatchSize and QueryTimeout which we haven't changed and PollingInterval which changed to 5 by default and we have changed back to 10

  • chris.roddis-ferrari (11/14/2013)


    Only differences are BCPBatchSize and QueryTimeout which we haven't changed and PollingInterval which changed to 5 by default and we have changed back to 10

    None of those would make any difference to command delivery. Have you checked for blocking on the distribution db and the subscription db?

    Have you checked latency using a tracer token?

    These are the parameters which modify delivery rate.

    [-CommitBatchSize commit_batch_size]

    [-CommitBatchThreshold commit_batch_threshold]

    [-MaxDeliveredTransactions number_of_transactions]

    [-PacketSize packet_size]

    [-SubscriptionStreams [1|2|...64]]
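
    When trying out different values for the parameters listed above, it can help to generate the extra agent arguments from one place so each test run is recorded consistently. The following is a minimal sketch only: `build_agent_args` is a hypothetical helper, and the values in the example call are placeholders, not recommendations. The only validation it performs is the 1 to 64 range for SubscriptionStreams given in the list above.

```python
# Delivery-related Distribution Agent parameters named in the post above.
DELIVERY_PARAMS = (
    "CommitBatchSize",
    "CommitBatchThreshold",
    "MaxDeliveredTransactions",
    "PacketSize",
    "SubscriptionStreams",
)


def build_agent_args(settings):
    """Render a dict of parameter values as a command-line argument string.

    Hypothetical helper for illustration; it only formats text and does not
    talk to SQL Server or change any agent profile.
    """
    unknown = set(settings) - set(DELIVERY_PARAMS)
    if unknown:
        raise ValueError(f"unexpected parameter(s): {sorted(unknown)}")

    parts = []
    for name in DELIVERY_PARAMS:  # fixed order keeps the output reproducible
        if name not in settings:
            continue
        value = int(settings[name])
        if name == "SubscriptionStreams" and not 1 <= value <= 64:
            raise ValueError("SubscriptionStreams must be between 1 and 64")
        parts.append(f"-{name} {value}")
    return " ".join(parts)


# Placeholder values chosen purely for the example.
example_args = build_agent_args({"SubscriptionStreams": 4, "CommitBatchSize": 200})
print(example_args)
```

    The resulting string would be appended to the agent's existing command line for a test run, changing one value at a time so any difference in latency can be attributed to it.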

  • The tracer token shows Publisher to Distributor (same box) as a couple of secs, and all the latency is Distributor to Subscriber: 12 minutes in the one I just did. There is no blocking on the distribution db, just an ASYNC_NETWORK_IO wait of about 2 secs on the distribution process. There has always been a level of blocking on the subscriber, even before the upgrades started, but this has never caused performance issues. I am not able to tell (or don't know how to tell) whether this has increased.
