Curious question about Transaction Replication from a backup

  • I've read various articles on how to do this but tried a different approach and while it appears to work for the most part, there were still some issues that I'm hoping someone can point out the obvious.

    We have replication already set up and going on a reporting server. There is a separate publisher, distributor, and subscriber server. Our attempt is to recreate "clone" the existing replication set up on the existing reporting server to a new reporting server.

    Our process:

    - We stopped the distributor agent

    - Waited for replicated transactions to complete at the subscriber

    - Took a full backup of the subscribed databases

    - Restored these databases on the new reporting server (at this point both subscribers should be 100% the same)

    - Created a new subscription to the existing publications on the new server

    - Re-enabled the distribution agent and waited for replicated commands to hit both subscribers

    This seems to work, but for some tables we're off by a few hundred, to a few thousand rows...

    The only error we can find is "The row was not found at the Subscriber when applying the replicated command" - which to keep things flowing we added -SKIPERRORS 20598

    It appears very random, but is keeping things in sync for the most part (minus the original missing rows)

    Any suggestions or ideas as to we are getting these errors? Shouldn't both database servers be exactly the same in the method we took?

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • MyDoggieJessie (7/5/2013)


    I've read various articles on how to do this but tried a different approach and while it appears to work for the most part, there were still some issues that I'm hoping someone can point out the obvious.

    We have replication already set up and going on a reporting server. There is a separate publisher, distributor, and subscriber server. Our attempt is to recreate "clone" the existing replication set up on the existing reporting server to a new reporting server.

    Our process:

    - We stopped the distributor agent

    - Waited for replicated transactions to complete at the subscriber

    - Took a full backup of the subscribed databases

    - Restored these databases on the new reporting server (at this point both subscribers should be 100% the same)

    - Created a new subscription to the existing publications on the new server

    - Re-enabled the distribution agent and waited for replicated commands to hit both subscribers

    This seems to work, but for some tables we're off by a few hundred, to a few thousand rows...

    The only error we can find is "The row was not found at the Subscriber when applying the replicated command" - which to keep things flowing we added -SKIPERRORS 20598

    It appears very random, but is keeping things in sync for the most part (minus the original missing rows)

    Any suggestions or ideas as to we are getting these errors? Shouldn't both database servers be exactly the same in the method we took?

    Your process are correct. Let me ask you few more questions.

    1. Do you have any jobs that delete rows the reporting server

    2. Do you have SP's that are participating in the replication as an article

    3. You mentioned a few tables are not always sync. Does this happen all the time those tables are not in sync. What I meant is when ever you recreate the replication does those tables are not in sync.

    I am using P2P replication with more than 3 peers I never encountered this issue. But we do get a tons of conflicts.

    Thank you

  • To answer your questions:

    1. No

    2. No

    3. It appears to be the same table(s). We were going to try and drop the articles and add them back in to see if it remedies the issues we encounter for the problematic tables

    Thanks for the reply!

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • We used the below script to set the replication . I am wondering even stopping the distribution job some batch of transactions was still in progress can cause this or not .

    EXEC sp_addsubscription @publication = N'<>',

    @subscriber = N'<>',

    @destination_db = N'<>',

    @subscription_type = N'Push',

    @sync_type = N'replication support only',

    @article = N'all',

    -- @update_mode = N'read only',

    @subscriber_type = 0

    Or instead of adding a subscription to an exisiting Publication , creating a new pub for the specific new subscription can help in getting the trick done .

  • I am wondering even stopping the distribution job some batch of transactions was still in progress can cause this or not .

    This was my original thought. Perhaps we didn't wait long enough for everything to trickle through...

    We can't create a new publication because of the snapshot issue. It would involve an entire snap of all the data and place us in a rather long maintenance window we can't really get approval for right now.

    The only think I can think of is:

    1. Redo it again, waiting much longer to ensure there are no pending rpl_commands, checking and then rechecking the distribution database to make 100% certain everything is replicated before taking the FULL backup of the subscription databases

    2. If there are lingering commands, we need to find a way to push those to the new subscriber as well before turning the distribution agent back on

    3. Dropping the troublesome article(s) from the existing publication, then re-adding them back in to ensure we minimize the impact on our OLTP system

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • MyDoggieJessie (7/8/2013)


    I am wondering even stopping the distribution job some batch of transactions was still in progress can cause this or not .

    This was my original thought. Perhaps we didn't wait long enough for everything to trickle through...

    We can't create a new publication because of the snapshot issue. It would involve an entire snap of all the data and place us in a rather long maintenance window we can't really get approval for right now.

    No , it wont require a re-snap thats the point . We are not intializing the subscription as the data is available already in subscription . Replication with backup or Replication with support only is used to avoid massive snapshot generation .

  • Instead of stopping the distribution agent first, you should stop the log reader agent on the published database.

    Wait for replication to quiesce, stop the distribution agent, recreate the new subscription and you should be done.

    This of course assumes this is the only publication from that database.

  • Why the log reader agent?

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • MyDoggieJessie (7/8/2013)


    I am wondering even stopping the distribution job some batch of transactions was still in progress can cause this or not .

    This was my original thought. Perhaps we didn't wait long enough for everything to trickle through...

    We can't create a new publication because of the snapshot issue. It would involve an entire snap of all the data and place us in a rather long maintenance window we can't really get approval for right now.

    The only think I can think of is:

    1. Redo it again, waiting much longer to ensure there are no pending rpl_commands, checking and then rechecking the distribution database to make 100% certain everything is replicated before taking the FULL backup of the subscription databases

    2. If there are lingering commands, we need to find a way to push those to the new subscriber as well before turning the distribution agent back on

    3. Dropping the troublesome article(s) from the existing publication, then re-adding them back in to ensure we minimize the impact on our OLTP system

    Perfect. Please share us the results of your re work. I can't think of any reason why you miss of records. And please turn on the server side trace when you enable the replication to make sure you capture the all the events.

    -- Babu

  • MyDoggieJessie (7/8/2013)


    Why the log reader agent?

    That stops transactions from being sent to the distribution database.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply