Transaction Replication Sync Problem

  • Hello All,

    For the last month I am facing constantly with an issue with my transaction replication in Sql Server 2012.

    Both Servers have Sql Server 2012 and it seems to be no differences between the tables.

    The error is always with only one particular table and with only one random record.

    I got all the time the error 20598 with the following message "The row was not found at the Subscriber when applying the replicated command".

    The replication is working like this:

    The data is inserted in our Database db 1 in Table1 and replicates the data to the customer database db2 in table1 then the customer database db2 from table2 replicates the confirmation of receiving that record to our database db1 to table2 and this last part is when the error occur. We don't have access to the data from customer side or the server.

    Anyone have any ideas how to fix this or how to troubleshoot or if this is a known issue with MSSQL 2012 Transaction?

    Thanks in advance,

    Angelo

  • Angelo SIlva (3/27/2014)


    Hello All,

    The data is inserted in our Database db 1 in Table1 and replicates the data to the customer database db2 in table1 then the customer database db2 from table2 replicates the confirmation of receiving that record to our database db1 to table2 and this last part is when the error occur. We don't have access to the data from customer side or the server.

    This confirmation process must be a custom process you have created. What inserts/updates the date in db2.table2?

    Have you confirmed the error command?

    USE distribution

    GO

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    SET NOCOUNT ON

    DECLARE @SeqNo NCHAR(22)

    DECLARE @SeqnoB VARBINARY(16)

    DECLARE @cmdID INT

    DECLARE @publisher_db_id INT

    DECLARE @article_id INT

    DECLARE @Time DATETIME

    SELECT DISTINCT

    @Seqno = master.dbo.fn_varbintohexstr(Re.xact_seqno)

    , @SeqnoB = RE.xact_seqno

    , @cmdID = RE.command_id

    , @publisher_db_id = RC.publisher_database_id

    , @Time = RE.[time]

    , @article_id = RC.article_id

    FROM Distribution.dbo.MSrepl_errors RE WITH ( NOLOCK )

    INNER JOIN Distribution.dbo.MSrepl_commands RC WITH ( NOLOCK ) ON RC.xact_seqno = RE.xact_seqno

    AND RC.command_id = RE.command_id

    WHERE RE.time BETWEEN DATEADD(mi, -5, GETDATE()) AND GETDATE()

    ORDER BY RE.time DESC

    IF @SeqNo IS NOT NULL

    AND @publisher_db_id IS NOT NULL

    AND @cmdID IS NOT NULL

    BEGIN

    EXEC sp_browsereplcmds @xact_seqno_start = @SeqNo,

    @xact_seqno_end = @SeqNo,

    @publisher_database_id = @publisher_db_id, @command_id = @cmdID -- Show all subscribers for the article that has errored

    SELECT SS.Name AS [Subscriber]

    , S.Subscriber_DB AS [Target Database]

    FROM Distribution.dbo.MSsubscriptions S WITH ( NOLOCK )

    INNER JOIN sys.servers SS WITH ( NOLOCK ) ON SS.Server_ID = S.Subscriber_ID

    WHERE publisher_database_id = @publisher_db_id

    AND article_id = @article_id

    ORDER BY SS.Name

    , S.Subscriber_DB

    END

    ELSE

    BEGIN

    SELECT 'No Errors Found'

    END

  • The problem is I don't have access to the customer server side. So we can check this things.

    I just check the distribution table for this error and I can't find it on our side. MSrepl_errors.

    I think the problem is the publisher from their side tries to update the record without replicating the insert first.

  • Just a point note, the above query only shows errors in the last 5 minutes. You have to change it to check for one outside that period.

    If you find the error is an update against a row you don't have its likely your the subscriber table is out of sync with the publisher.

    There a couple things I can think of

    -The table is out of sync and missing rows

    -The sp_msins_<table> proc has been edited and is failing to insert

    Its unlikely the publisher is failing to send new rows to you as this tech is proven to work. There must be a breakdown somewhere.

  • I know that. I am not checking with that query. I am checking for all records with a query that I create yesterday for the same table.

    The thing is I don't have access to the customer side to check what is happening there, but the replication is failing from their side when Sql server replicates the information and this happen only to one table from the all 26 tables.

    Could this be caused from different data type or definition in the schema?

    Do you have any idea.

    Thank you

  • Angelo SIlva (3/28/2014)


    I know that. I am not checking with that query. I am checking for all records with a query that I create yesterday for the same table.

    The thing is I don't have access to the customer side to check what is happening there, but the replication is failing from their side when Sql server replicates the information and this happen only to one table from the all 26 tables.

    Could this be caused from different data type or definition in the schema?

    Do you have any idea.

    Thank you

    The command succeeded on their side, was written to the transaction log, read and written into the distribution db and subsequently fails to write to your subscriber. So the problem is with the command being applied to your side. Your customer would be unaware of a problem.

    If it was a schema difference you would get a different error.

    As I said either your local table is out of sync or may be a change to the sql replication topology.

    If the table isnt to large you could perform a reinitialisation? Have you looked for commonalities between the missing rows by looking at the failed commands?

  • The error is something like this:

    2014-03-27 18:45:03.262 sp_MSget_repl_commands timestamp returned: 0x0x0000017d00007ef0000345000000, 5, local rowcount: 5

    2014-03-27 18:45:03.265 sp_MSget_repl_commands timestamp value is: 0x0x0000017d00007ef0000300000000

    2014-03-27 18:45:03.450

    42000 The row was not found at the Subscriber when applying the replicated command. 20598

    2014-03-27 18:45:03.452

    42000 The row was not found at the Subscriber when applying the replicated command. 20598

    2014-03-27 18:45:03.453

    42000 The row was not found at the Subscriber when applying the replicated command. 20598

    2014-03-27 18:45:03.455

    42000 The row was not found at the Subscriber when applying the replicated command. 20598

    2014-03-27 18:45:03.456

    42000 The row was not found at the Subscriber when applying the replicated command. 20598

    2014-03-27 18:45:03.457

    Failed command = {CALL [sp_MSupd_dboRM_ENTITLEMENT_OB_RESULT] (,?,,,,,?,?,0x42)} {CALL [sp_MSupd_dboRM_ENTITLEMENT_OB_RESULT] (,?,,,,,?,?,0x42)} {CALL [sp_MSupd_dboRM_ENTITLEMENT_OB_RESULT] (,?,,,,,?,?,0x42)} {CALL [sp_MSupd_dboRM_ENTITLEMENT_OB_RESULT]

    The thing is I don't know how the replication has been setup. It was already done in the past and the guy is not here anymore and we don't have any information about this. 🙁

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply