Inserts not propagated in transactional replication

  • mTBCent

    SSChasing Mays

    Points: 647

    Hello,

    I have a transactional replication configured which subscription fails to sync to often due to 20598 errors. I've enabled verbose logging on the Distribution agent, nothing more detailed captured than the message that the UPDATE fails due to the row missing at the Subscriber.

    I've run profiler on the Subscriber side and verified that very often INSERT commands are not being replicated for the articles. But UPDATE commands do, and the 20598 error occurs to often. I can't just insert the missing rows all the time for to fix the process. Currently, I have an automated PowerShell solution that inserts these missing rows as 20598 errors appear. But in need to find the source of the problem.

    The articles are configured properly, inserts are to be propagated by calling the procedure sp_MSIns_{article_name}.

    I tried to extract all the commands that are being inserted in the distribution database for to be replicated by the Log Reader Agent, from the MSRepl_commands and MSRepl_transactions, but I had difficulties decoding the command from the MSRepl_commands without using sp_browsereplcmds (I am trying not to call this procedure all the time but run a job to capture what everything is being captured for replication and be able to search that result set just to confirm that the missing INSERT commands are really missing and are not even brought to the distributor server).

    I am using a Remote Distributor, one Publisher server and one Subscriber server. The tables I am replicating are around 100 Gb in size and grow fast.

    How can I debug this further?

    Thanks in advance

  • Site Owners

    SSC Guru

    Points: 80372

    Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • Sue_H

    SSC Guru

    Points: 90814

    You'll probably have to dig in a bit more before anyone comes up with some ideas on the issue. I'd check the MSdistribution_history and MSrepl_errors tables in the distribution database. I'd also check the logs on the server with the distributor. Those should hopefully have more information to track things down.

    Sue

  • Jeff Moden

    SSC Guru

    Points: 997308

    This is a bit of a hunch but you might want to increasing the "LanMan" timeout in Windows.  We had a similar problem with remote backups that would work just fine until the very end when the "file resize" bit kicked in.  Increasing the "LanMan" timeout in the registry "changed it from 30 seconds to 600 seconds" fixed it all.

    Like I said, just a hunch.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • mTBCent

    SSChasing Mays

    Points: 647

    Hello Sue and Jeff,

    And thanks a lot for replying. The MSdistribution_history and MSrepl_errors table will give me the same 20598 error message that I am receiving within the alert I have configured - that the row was not found at the Subscriber when applying the replicated UPDATE command for a particular table with a specific primary key. The error messages are always as follow:

    1. Error executing a batch of commands. Retrying individual commands.
    2. The row was not found at the Subscriber when applying the replicated UPDATE command for Table '{TableName}' with Primary Key(s): [{PrimaryKey}] = {PK_Value}.

    And I can't find a pattern for the reoccurrence, it happens during the whole day. I did run a profiler on the source database and compared the flow of commands with the one run on the subscriber for a particular batch within which the error is registered - the insert is just missing at the subscriber, every other command is properly captured.

    'LanMan' is not an option for me Jeff as all the SQL Servers are Linux-hosted ones.

    Thanks again for replying.

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply