Transactional Replication failed after restoring full back up of database in NORECOVERY, then restoring DIFF backup in FULL RECOVERY - PLEASE HELP!

  • We had a siutation last night in our production environment that forced us to revert back to an earlier version of the database (before a major code rollout that failed). After restoring the days full backup (with NORECOVERY), and then restoring a DIFF backup (FULL RECOVERY and had checked Preserve Replication Settings)...the transactional replication failed.

    I know there's an issue with the

    Here's the errors:

    Message #1

    The replication agent has been successfully started. See the Replication Monitor for more information.

    Message #2

    2011-03-04 15:07:17.566 Copyright (c) 2008 Microsoft Corporation

    2011-03-04 15:07:17.566 Microsoft SQL Server Replication Agent: logread

    2011-03-04 15:07:17.566

    2011-03-04 15:07:17.566 The timestamps prepended to the output lines are expressed in terms of UTC time.

    2011-03-04 15:07:17.566 User-specified agent parameter values:

    -Publisher MyPublisher

    -PublisherDB MyDB

    -Distributor MyServer

    -DistributorSecurityMode 1

    -Continuous

    -XJOBID 0x551234548201E5498734D4BD2B73B9BA

    -XJOBNAME MyServer-MyDB-1

    -XSTEPID 2

    -XSUBSYSTEM LogReader

    -XSERVER MyServer

    -XCMDLINE 0

    -XCancelEventHandle 0000000000000E10

    -XParentProcessHandle 00000000000009EC

    2011-03-04 15:07:17.738 Parameter values obtained from agent profile:

    -pollinginterval 5000

    -historyverboselevel 1

    -logintimeout 15

    -querytimeout 1800

    -readbatchsize 500

    -readbatchsize 500000

    2011-03-04 15:07:17.784 Status: 4096, code: 20024, text: 'Initializing'.

    2011-03-04 15:07:17.784 The agent is running. Use Replication Monitor to view the details of this agent session.

    2011-03-04 15:07:17.831 Status: 0, code: 20011, text: 'The process could not execute 'sp_repldone/sp_replcounters' on MyServer.

    2011-03-04 15:07:17.831 The process could not execute 'sp_repldone/sp_replcounters' on MyServer.

    2011-03-04 15:07:17.831 Status: 0, code: 9003, text: 'The log scan number (193864:92424:1) passed to log scan in database MyDB is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup. '.

    2011-03-04 15:07:17.831 Status: 0, code: 22017, text: 'The process could not set the last distributed transaction.'.

    2011-03-04 15:07:17.831 Status: 0, code: 22037, text: 'The process could not execute 'sp_repldone/sp_replcounters' on MyDB .'.

    Message #3

    The replication agent encountered an error and is set to restart within the job step retry interval. See the previous job step history message or Replication Monitor for more information.

    Another Message

    2011-03-04 08:01:42.311 Copyright (c) 2008 Microsoft Corporation

    2011-03-04 08:01:42.311 Microsoft SQL Server Replication Agent: logread

    2011-03-04 08:01:42.311

    2011-03-04 08:01:42.311 The timestamps prepended to the output lines are expressed in terms of UTC time.

    2011-03-04 08:01:42.311 User-specified agent parameter values:

    -Publisher MyPublisher

    -PublisherDB MyDB

    -Distributor MyServer

    -DistributorSecurityMode 1

    -Continuous

    -XJOBID 0x551234548201E5498734D4BD2B73B9BA

    -XJOBNAME MyServer-MyDB-1

    -XSTEPID 2

    -XSUBSYSTEM LogReader

    -XSERVER MyServer

    -XCMDLINE 0

    -XCancelEventHandle 0000000000000DB4

    -XParentProcessHandle 0000000000000E80

    2011-03-04 08:01:42.546 Parameter values obtained from agent profile:

    -pollinginterval 5000

    -historyverboselevel 1

    -logintimeout 15

    -querytimeout 1800

    -readbatchsize 500

    -readbatchsize 500000

    2011-03-04 08:01:42.577 Status: 4096, code: 20024, text: 'Initializing'.

    2011-03-04 08:01:42.577 The agent is running. Use Replication Monitor to view the details of this agent session.

    2011-03-04 08:01:42.749 Status: 0, code: 20011, text: 'The process could not execute 'sp_repldone/sp_replcounters' on MyServer.'.

    2011-03-04 08:01:42.749 The process could not execute 'sp_repldone/sp_replcounters' on 'CCPRODDB1'.

    2011-03-04 08:01:42.749 Status: 0, code: 9003, text: 'The log scan number (193864:92424:1) passed to log scan in database MyDBis not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup. '.

    2011-03-04 08:01:42.749 Status: 0, code: 22017, text: 'The process could not set the last distributed transaction.'.

    2011-03-04 08:01:42.749 Status: 0, code: 22037, text: 'The process could not execute 'sp_repldone/sp_replcounters' on MyServer.'.

    I've tried reinitializing the publication/subscription and while that took brand new snapshots and copied it over to the replicated data server, it did not fix the problem.

    I read from a different post that I could try running "sp_replrestart" but that ran for about a half an hour and didn't appear to do anything but fill up our log files...did I not wait long enough?

    ANY HELP/SUGGESTIONS would be greatly appreciated!!!

    The only thing I know to do at this point is to drop the publication on the production server and rebuild it completely (and with all the tables we're replicating that would take quite a bit of time.

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • I would try taking a new snpahot and re-sync'ing with the subscriber.

  • Sorry, just re-read your post.. I see you've already tried that!

  • Unfortunately you are going to lose any out of sync transactions based on the state you are in. What you will need to do is stop the logreader agent for that database, open up an SSMS connection to the publication database, execute sp_repldone, close or disconnect that SSMS session and then restart the logreader agent. That should allow things to flow again.

    Unless I am mistaken, and I would have to have you set logging to verbose to check this, you have an out of sync condition in that records being pulled by logreader and trying to be inserted into the distribution database that are "out of sync". More detail that I can't provide right now unfortunately. By running sp_repldone you will set all transactions in the transaction log that are flagged for replication as completed and that SHOULD solve the issue.

    You will NEED to do a manual data compare at the end though or reinitialize again.

    Let me know how this works or if you have questions / concerns.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • One last note, please make sure that before you run sp_repldone, you are connected to the publisher and in the published database that you are having problems with. If you do it in another published database you are going to reset all those transactions there. Not good.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Thanks for the quick reply. Do you have any idea how long the sp_replrestart should take? It;s been over 15 mins already and still going.

    I am beginning to think that dropping the entire publication and rebuilding it may be the best approach. What are your thoughts?

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • MyDoggieJessie (3/4/2011)


    Thanks for the quick reply. Do you have any idea how long the sp_replrestart should take? It;s been over 15 mins already and still going.

    I am beginning to think that dropping the entire publication and rebuilding it may be the best approach. What are your thoughts?

    If your data set is small enough that you can push the snapshot out the subscribers and that having a downtime to apply the snapshot at the subscribers is not an issue then yes, dropping and rebuilding the publication and subscriptions is the safest approach.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Okay, I deleted the publisher for the database I was replicating and the subscriptions on the other server.

    Recreated the publisher and added the articles I needed and ran the job to create the publication...opened up the replication monitor and see:

    Snapshot Agent: (21%) Bulk copying snapshot data for 'Table' (part this of that)

    Log Reader Agent: The process could not execute 'sp_repldone/sp_replcounters' on 'MyServer'

    Will this fix itself when the snapshots are complete or is this another problem altogether?

    I was hoping that by deleting the publication from the Publisher and it's dependent subscriptions from the distributor server that this would be fixed???

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • Did you try my first comment? Let me know.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • This is coming from the Log Reader Agent:

    Error messages:

    The process could not execute 'sp_repldone/sp_replcounters' on MyServer. (Source: MSSQL_REPL, Error number: MSSQL_REPL20011)

    Get help: http://help/MSSQL_REPL20011

    The log scan number (193864:92424:1) passed to log scan in database MyDB is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup. (Source: MSSQLServer, Error number: 9003)

    Get help: http://help/9003

    The process could not set the last distributed transaction. (Source: MSSQL_REPL, Error number: MSSQL_REPL22017)

    Get help: http://help/MSSQL_REPL22017

    The process could not execute 'sp_repldone/sp_replcounters' on MyServer. (Source: MSSQL_REPL, Error number: MSSQL_REPL22037)

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • This is what I am referring to. Have you tried this?

    "What you will need to do is stop the logreader agent for that database, open up an SSMS connection to the publication database, execute sp_repldone, close or disconnect that SSMS session and then restart the logreader agent. That should allow things to flow again. "

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • On my production box, I opened up a new window/connection.

    Ran sp_repldone NULL, NULL, 0, 1 (the defaults)

    (it said it was successful)

    Went to the Replication monitor, stopped the log reader agent, closed my window in SSMS, then restarted the log reader agent.

    Same error "The process could not execute 'sp_repldone/sp_replcounters'

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • And you ran that on the publication database? Sorry if I am asking picky questions but it makes a difference.

    Checking some notes I have on this as well.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Ok, try the following;

    use YourPublisher;

    go

    EXEC sp_repldone @xactid = NULL, @xact_segno = NULL, @numtrans = 0, @time = 0, @reset = 1

    If that doesn't work then you can try sp_replflush as well. The steps would really be;

    - Stop the logreader agent

    - Stop the distribution agents

    - Execute sp_repldone as outlined above

    - Execute sp_replflush if necessary

    - Disconnect in SSMS

    - Start logreader

    - Start distribution agents

    - Verify that it is working

    IF that doesn't work then we might have to clear out old transactions from the distribution database manually but hopefully that isn't the case. Let me know how this works.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • No luck...

    I can't believe this is a problem after deleting the publication and subscriptions...

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply