September 28, 2010 at 6:26 am
Hi DBAs
I do have some Issues with my latency. It seems, as longer its running as worser it gets. At the moment the system is not productive and no user are logged on and working with the data. But this will change soon.
Just a week ago I had to rebuild the replication because the publisher DB was screwd. Took me while to drop the DB because all SP were not executable in the DB. After restore and adding a new local publication with new subscribers everything went fine. It took less then 1 sec to replicate any article if I changed a record manually.
Publisher and Distributor are the same server with 2 Subscribers with push trans log repl.
So far I can say the problem is from Pub to Dist takes about 40 to 75min and from Dist to Subs less then a sek (tracer token)
Heer the Output from the agent where I can say the Reader takes most time:
09-28-2010 13:43:15
Total Run Time (ms) : 6849391 Total Work Time : 466813
Total Num Trans : 23 Num Trans/Sec : 0.05
Total Num Cmds : 1251 Num Cmds/Sec : 2.68
Total Idle Time : 5715000
Writer Thread Stats
Total Number of Retries : 0
Time Spent on Exec : 6953
Time Spent on Commits (ms): 172 Commits/Sec : 0.30
Time to Apply Cmds (ms) : 466813 Cmds/Sec : 2.68
Time Cmd Queue Empty (ms) : 667297 Empty Q Waits > 10ms: 7
Total Time Request Blk(ms): 6382297
P2P Work Time (ms) : 0 P2P Cmds Skipped : 0
Reader Thread Stats
Calls to Retrieve Cmds : 1146
Time to Retrieve Cmds (ms): 2050 Cmds/Sec : 610.24
Time Cmd Queue Full (ms) : 466171 Full Q Waits > 10ms : 220
I have checked several blogs and forum topics but I am stuck here.
Any hint how to move on are welcome, thanks in advance
September 28, 2010 at 8:35 am
For a transactional replication, it should never take more than 10 seconds to move a change of data from publisher to subscriber. Please check if the log reader agent is running without errors. Check how big is your log file in your publisher DB.
Also it is recommended to run the distributor in a separate machine.
-Roy
September 28, 2010 at 8:39 am
From the distribution agent output provided:
Total Work Time : 466813
Time to Apply Cmds (ms) : 466813 -> distribution agent writer thread
Time to Retrieve Cmds (ms): 2050 -> distribution agent reader thread
It looks like your distribution writer thread is the bottleneck...
Did you check this http://blogs.msdn.com/b/repltalk/archive/2010/02/21/transactional-replication-conversations.aspx - especially the "Distribution Agent Writer Latency" section that has some information on how you can try to troubleshoot the cause of writer thread latency?
From the link - top things to check:
a) Do you have triggers enabled on your subscriber?
b) Are your replication SPs parameterized?
c) Do you have blocking/load on the subscriber?
You could also see if any of the distribution agent switches (for e.g. -SubscriptionStreams, -CommitBatchSize or -CommitBatchThreshold) help in your situation - however the best way to test all this is on a dev or test environment where you can repro the latency issue and see if these switches help.
September 29, 2010 at 6:40 am
Hi Roy and winash
Please check if the log reader agent is running without errors. Check how big is your log file in your publisher DB.
Also it is recommended to run the distributor in a separate machine.
-Cant find any errors about the agent, agent history is everything completed or running
in the running statement: Replicated transactions are waiting for next Log backup or for mirroring partner to catch up
Transaction log backups are taken every houre once. Option "sync with backup" is enabled for Publisher and Distributor
The message of the running statement confuses me a bit... Do I have to take transaction log backups of the publisher DB more frequent?
-log file was 90MB, 92% used and 1% growth, max size 2000GB... very strange setup. I have changed it to 500MB, growth 100MB and max size 2GB.
-Local Distribution was designed because there are just a few articles to publish.
a) Do you have triggers enabled on your subscriber?
b) Are your replication SPs parameterized?
c) Do you have blocking/load on the subscriber?
a) No triggers are enabled on my subscribers
b) No, I ran sp_changearticle as mention in your link on all articles
c) No, on both subscribers the subscriber DB is the only DB on the server and nothing is locked
I also set up a trace to measure how long it takes till Distribution Agent takes to start a SP on the subscriber. If I change a value in the publisher DB it takes over 30min till anything will happen on the subscriber
"exec sp_MSget_last_transaction @publisher_id = 0, @publisher_db = N'xxx', @for_truncate = 0x1" is sleeping but has still an open transaction, no Blocking or Blocked by
September 29, 2010 at 6:53 am
Sync with back up is the reason for latency.
If the “sync with backup” option is turned on for the publication database, Log Reader Agent does not propagate the transactions until they have been backed up at the Publisher. This means that the publication database with the “sync with backup” setting turned on would have to be backed up at frequent intervals. Any type of backup (including transaction log, differential etc) can be performed for the publication database.
-Roy
September 29, 2010 at 7:03 am
Is the "allow_initialize_from_backup" option set on the publisher and distributor DBs? If you run sp_helppublication on the publisher DB does the column "allow_initialize_from_backup" in the results of this SP have a value of 1?
If this is set:
From http://msdn.microsoft.com/en-us/library/ms152560.aspx:
Setting this option on the publication database ensures that transactions are not delivered to the distribution database until they are backed up at the publication database. The last publication database backup can then be restored at the Publisher without any chance of the distribution database having transactions that the restored publication database does not have.
Latency and throughput are affected because transactions cannot be delivered to the distribution database until they have been backed up at the Publisher. For example, if the transaction log is backed up every five minutes, there is an additional five minutes of latency between when a transaction is committed at the Publisher and when the transaction is delivered to the distribution database, and subsequently the Subscriber.
Since you see the message "Replicated transactions are waiting for next Log backup or for mirroring partner to catch up" I'd assume that latency you're seeing might be due to this issue.
To test out if this is indeed the case - you can turn off the "allow_initialize_from_backup" (if it is set on) using the sp_changepublication SP and then check if the throughput improves.
To turn it off:
sp_changepublication @publication = 'PublicationName',
@property='allow_initialize_from_backup',
@value='false'
To turn it on:
sp_changepublication @publication = 'PublicationName',
@property='allow_initialize_from_backup',
@value='true'
September 29, 2010 at 7:55 am
Thanks (Danyavad) for your help 🙂
Issue solved and I learned a lot about SQL replication, their agents and how to gather information to narrow a problem down.
mission accomplished!!
Viewing 7 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply