Replication and Deadlocks at subscriber

  • Hello friends, and happy TurkeyDay,

    I have a database used for reporting, a subscriber, that occasionally has mismatch records, and I'm trying to find out why. Why they keep getting out of synch.

    Setup:

    Both Publisher and Subscribers: SQL2016 SP2 CU15.  Windows Server 2019. Same datacenter, same subnet, so nothing too complex

    Distributor in the same server as Publisher. Push Subscriptions. jobs [Agent history clean up: distribution] and [Distribution clean up: distribution] enabled and running with the default values

    There are several Transactional Publications on the main database, but nothing overlapping, and all point to the subscriber.

    The subscriber's tables are only written by replication, and each table has only one subscription (so no convergence replication or merge replication)

    What I've done

    About once a day I run a count on a particular table, and the counts are off, so then I ran tablediff.exe and shows a lot of mismatch rows.. anywhere from a couple dozen to a few hundreds.

    I checked the latency, and it look good, only a few seconds, at that time. However in times of high Publisher activity it can climb up to 20 mins, but it eventually catches up. After those spots of latency, tablediff shows no difference so it's not happening then.

    I inserted a tracer token and it looks good. There are no errors on neither of the jobs, replication monitor, etc.

    The Distribution Agent history, "Distributor to Subscriber History section" shows "Skipped X error(s) when applying transaction a the subscriber"

    distro_hist

    The "view sync status" section keep showing no errors, most of the time "no transaction available"

    If I try to validate the subscriptions, it shows me that might be out of synch:

    CheckSum

    The only thing I can find is a deadlock on the subscriber. Our SQLSentry thing only captures long running deadlocks so I'm sure there are more, and that's the cause of my problem, but I still don't know how to prevent it.

    Sometimes I'm able to capture it like this:

    deadlock

    The Read Committed Snapshot Isolation setting is set to FALSE on both Publisher and Subscriber. So I changed the RCSI to true in the subscriber but I still got deadlocks.

    What I need

    First to all, I need to figure out how I can stop getting deadlocks and mismatch data. I mean, I eventually run the output of tablediff to sync the tables, and occasionally I just re-ran the snapshot, but I keep having the issue.

    I figure I could improve performance and look at this article:

    https://red9.com/blog/performance-tune-sql-server-replication-checklist/

    but before start tweaking, I would like to measure performance, so I can tell if any changes I make do things better (other than presence/absence of deadlocks.

    Too many websites about how to fix Replication when it doesn't work at all, but can someone point me to a URL where I can learn what to measure on Replication?

     

    Thank you ALL

     

    Miguel

     

     

     

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • I can't speak to all of this as I don't use replication and haven't set it up or seen that problem before, but I am curious what you mean by "Our SQLSentry thing only captures long running deadlocks so I'm sure there are more, and that's the cause of my problem".  Deadlocks are either "yes" or "no"... I am not aware of anything like a "long running deadlock".  Either a deadlock happened and something was rolled back, or it didn't happen and nothing was rolled back.  I am not sure what a "long running deadlock" is.

    And SQL should just retry the query after the deadlock as it indicates in the error message.  Although you may want to check the error log in the job folder as indicated in that error.  If it fails multiple times, it is probably something you should investigate.

    Plus, since you have SQL Sentry, might as well explore the deadlock graphs in there.  See what caused the deadlock and see if the replication actually retried successfully, or if those got stuck and unprocessed.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!

  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply