Replication latency - need suggestions

  • We are using transactional replication on a 1 TB publisher database. We have two subscribers, P01 and P02. The distribution database is on one of the subscribers (P01).

    Our source (publisher) is a heavy transactional system, and we are replicating all of the data (tables) to both subscribers.

    We are seeing high replication latency when transaction and user activity increase on the publisher. Surprisingly, P01, which has the distributor locally on the same server, has more latency (35 hours) than P02, which uses the distributor remotely (18 hours).

    What are your suggestions for reducing the latency and bringing both subscribers to within 1 or 2 hours of latency? We are on 64-bit SQL Server and a 64-bit OS.

  • What are your suggestions for reducing the latency and bringing both subscribers to within 1 or 2 hours of latency?

    One suggestion is to defragment the indexes on the publisher as well as on the subscribers (see the sketch below).
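    A minimal sketch of that, assuming a table named dbo.Orders (a placeholder name; you would run this per table or fold it into a maintenance routine):

    ```sql
    -- Hypothetical example: dbo.Orders is an assumed table name.
    -- Check fragmentation first, then reorganize or rebuild.
    -- The common guideline is REORGANIZE at roughly 5-30% fragmentation
    -- and REBUILD above ~30%; treat those as rules of thumb.
    SELECT i.name AS index_name,
           s.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'LIMITED') AS s
    JOIN sys.indexes AS i
      ON i.object_id = s.object_id
     AND i.index_id  = s.index_id;

    ALTER INDEX ALL ON dbo.Orders REORGANIZE;   -- low fragmentation
    -- ALTER INDEX ALL ON dbo.Orders REBUILD;   -- high fragmentation
    ```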

  • There are all sorts of things to investigate here, starting with some of the more likely ones (a tracer-token sketch for separating them follows the list):

    1) Is the latency between the publisher and the distributor (i.e. the Log Reader Agent)?

    2) Is the latency between the distributor and the subscriber (i.e. the Distribution Agent)?

    3) Are both 1) and 2) showing high latency?

    4) I assume the Log Reader Agent and Distribution Agent(s) are running continuously?
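    One direct way to answer 1) and 2) is a tracer token (available from SQL Server 2005), which reports the publisher-to-distributor and distributor-to-subscriber legs separately. A minimal sketch, assuming a publication named MyPublication (a placeholder), run at the publisher in the publication database:

    ```sql
    -- Post a tracer token into the log and capture its id.
    DECLARE @token_id INT;
    EXEC sp_posttracertoken
        @publication     = N'MyPublication',   -- assumed publication name
        @tracer_token_id = @token_id OUTPUT;

    -- Once the token has flowed through, see where the time went:
    -- distributor latency vs. subscriber latency, per subscriber.
    EXEC sp_helptracertokenhistory
        @publication = N'MyPublication',
        @tracer_id   = @token_id;
    ```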

    If 1), then, provided your publisher isn't massively overloaded, check your network performance for packet loss etc. Also check the Log Reader delivery rate, which can be obtained from the MSlogreader_history table in the distribution database (see the query below). As a rough guideline, I'd be concerned by a rate of less than 150 commands per second, but without knowing anything about your environment that's a somewhat arbitrary figure. In my experience, poor Log Reader performance tends to be caused by network issues; I work with a very high-volume replication infrastructure, and Log Reader performance has only ever been impacted by poor network performance.
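    A rough sketch of that check, run on the distributor and assuming the distribution database has the default name (delivery_latency is in milliseconds):

    ```sql
    USE distribution;

    -- Most recent Log Reader Agent history rows: rate and latency trend.
    SELECT TOP (20)
        h.agent_id,
        h.time,
        h.delivery_rate,                        -- commands per second
        h.delivery_latency / 1000 AS latency_seconds,
        h.delivered_commands,
        h.comments
    FROM dbo.MSlogreader_history AS h
    ORDER BY h.time DESC;
    ```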

    If 2), then again the network could be an issue, except that here it's the local subscriber that is struggling. It may well be further behind than the remote subscriber because it is acting as the distributor as well and is being impacted by the extra I/O. You may need to look at configuring a dedicated distribution server into your replication design.

    If you're happy that the subscribing servers aren't overloaded and that the SQL being executed by the distribution agents is performing well, then my next question is how many publications you have. If you are replicating high volumes of data, you may not have enough parallelism in your replication design; parallelism can be increased by adding publications, each with its own distribution agent.

    Also check the delivery rates in the MSdistribution_history table in the distribution database (see the query below). If the delivery rates are good (say >= 150 commands per second) but latency is high, that would tend to suggest you need more distribution agents to get the data delivered more quickly. If delivery rates are low and latency is high, you probably have performance issues at the subscriber, though you would also need to check the size of your distribution database: as data backs up in the distribution database, distribution agents can struggle to read from it, and this will affect the delivery rate metric.
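    The equivalent checks for the Distribution Agents, again assuming the default distribution database name:

    ```sql
    USE distribution;

    -- Most recent Distribution Agent history rows: rate and latency trend.
    SELECT TOP (20)
        h.agent_id,
        h.time,
        h.delivery_rate,                        -- commands per second
        h.delivery_latency / 1000 AS latency_seconds,
        h.delivered_commands,
        h.comments
    FROM dbo.MSdistribution_history AS h
    ORDER BY h.time DESC;

    -- Rough measure of backlog: commands still queued in the
    -- distribution database (this table can be very large).
    SELECT COUNT(*) AS pending_commands
    FROM dbo.MSrepl_commands WITH (NOLOCK);
    ```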

    Hope some of that helps, though there are still other avenues to explore.

  • Are your subscribers in the same physical location as the publisher? If they are, what kind of network connectivity do you have between them (10/100 or gigabit)? If you're local, make sure your NICs are set to full duplex.

    In terms of hardware configuration, how closely do your subscribers match your publisher? If your publisher has 6 spindles but your subscribers only have 2, you could be looking at disk I/O issues.

    Remember that changes are applied to your subscribers row by row: if you run a single UPDATE statement on your publisher that affects 1,000 rows, expect 1,000 UPDATE statements to be run at your subscriber. Since you're replicating every single table, why not look at replicating stored procedure execution rather than every row of data (see the sketch below)?
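    A minimal sketch of publishing a procedure's execution rather than its row changes; the publication and procedure names here are placeholders:

    ```sql
    -- Replicate the EXEC call itself instead of the 1,000 row changes
    -- it produces. 'serializable proc exec' replicates only executions
    -- inside a serializable transaction, which is the safer option.
    EXEC sp_addarticle
        @publication   = N'MyPublication',         -- assumed name
        @article       = N'usp_UpdateOrderStatus', -- assumed name
        @source_object = N'usp_UpdateOrderStatus',
        @type          = N'serializable proc exec';
    ```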

    You can also increase throughput by tinkering with the agent values for CommitBatchSize and CommitBatchThreshold (see the sketch below). I posted blog entries here and here that explain what those do in detail.
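    As a sketch, one way to apply those is a custom agent profile created at the distributor; the profile name and values here are assumptions (the defaults are CommitBatchSize = 100 and CommitBatchThreshold = 1000), and any change should be tested incrementally:

    ```sql
    -- Create a custom Distribution Agent profile with larger batches,
    -- then assign it to the agents in Replication Monitor.
    DECLARE @profile_id INT;

    EXEC sp_add_agent_profile
        @profile_id   = @profile_id OUTPUT,
        @profile_name = N'LargerCommitBatches',   -- assumed name
        @agent_type   = 3,                        -- 3 = Distribution Agent
        @description  = N'Bigger commit batches to chase lower latency';

    EXEC sp_add_agent_parameter
        @profile_id      = @profile_id,
        @parameter_name  = N'-CommitBatchSize',
        @parameter_value = N'200';

    EXEC sp_add_agent_parameter
        @profile_id      = @profile_id,
        @parameter_name  = N'-CommitBatchThreshold',
        @parameter_value = N'2000';
    ```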

    Kendal Van Dyke
    http://kendalvandyke.blogspot.com/
