Frequent Transactional Replication failures on SQL 2008 X64

  • Hi Team,

    I am facing frequent replication failures on SQL 2008 x64. The exact version is Microsoft SQL Server 2008 (SP1) - 10.0.2531.0 (X64). Below is the complete error that I am seeing in the logs.

    Error message:

    ------------------------

    The process could not execute 'sp_replcmds' on 'SERVERNAME'. (Source: MSSQL_REPL, Error number: MSSQL_REPL20011)

    Get help: http://help/MSSQL_REPL20011

    A system assertion check has failed. Check the SQL Server error log for details. Typically, an assertion failure is caused by a software bug or data corruption. To check for database corruption, consider running DBCC CHECKDB. If you agreed to send dumps to Microsoft during setup, a mini dump will be sent to Microsoft. An update might be available from Microsoft in the latest Service Pack or in a QFE from Technical Support. (Source: MSSQLServer, Error number: 3624)

    Get help: http://help/3624

    on 'SERVERNAME'. (Source: MSSQL_REPL, Error number: MSSQL_REPL22037)

    Get help: http://help/MSSQL_REPL22037

    I went through the SQL dumps and found the following error: File: <replicat.cpp>, line=2686 Failed Assertion = 'pRowset'. More details below:

    2010-07-07 14:34:25.67 spid109 Error: 17066, Severity: 16, State: 1.

    2010-07-07 14:34:25.67 spid109 SQL Server Assertion: File: <replicat.cpp>, line=2686 Failed Assertion = 'pRowset'.

    This error may be timing-related. If the error persists after rerunning the statement, use DBCC CHECKDB to check the database for structural integrity, or restart the server to ensure in-memory data structures are not corrupted.

    2010-07-07 14:34:25.70 spid109 Error: 3624, Severity: 20, State: 1.

    2010-07-07 14:34:25.70 spid109 A system assertion check has failed. Check the SQL Server error log for details. Typically, an assertion failure is caused by a software bug or data corruption. To check for database corruption, consider running DBCC CHECKDB. If you agreed to send dumps to Microsoft during setup, a mini dump will be sent to Microsoft. An update might be available from Microsoft in the latest Service Pack or in a QFE from Technical Support.

    2010-07-07 14:34:25.75 spid108 Error: 14151, Severity: 18, State: 1.

    2010-07-07 14:34:25.75 spid108 Replication-Replication Transaction-Log Reader Subsystem: agent failed. The process could not execute 'sp_replcmds' on 'Servername'.

    Steps tried already:

    ------------------

    --DBCC CHECKDB has been run with allinfo and reported no errors for the publisher or subscriber databases.

    --We can resolve the problem by re-initializing, but as mentioned earlier this is a frequent failure, and re-initializing seems to be just a workaround.

    --I have googled this issue, and it looks like it might be a problem with SQL 2008 x64.

    I am not sure how to proceed. Please help! Is there anything else I should check?
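
When the failure recurs, it can help to pull the error numbers and assertion locations out of the SQL Server error log so that occurrences can be compared over time. The following is a minimal Python sketch, not a definitive tool: it assumes the log lines follow the same "timestamp spid message" layout as the excerpt quoted above, and the function name `summarize_errorlog` is hypothetical.

```python
import re

# Assumed line layout, taken from the excerpt above: "<timestamp> <spid> <message>".
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2})\s+(spid\d+s?)\s+(.*)$"
)
# "Error: 3624, Severity: 20, State: 1."
ERROR_RE = re.compile(r"Error: (\d+), Severity: (\d+), State: (\d+)\.")
# "File: <replicat.cpp>, line=2686 Failed Assertion = 'pRowset'"
ASSERT_RE = re.compile(r"File: <([^>]+)>, line=(\d+) Failed Assertion = '([^']*)'")


def summarize_errorlog(lines):
    """Return (errors, assertions) found in an iterable of error-log lines.

    errors:     list of (timestamp, spid, error_number, severity)
    assertions: list of (timestamp, spid, source_file, line_number, expression)
    """
    errors = []
    assertions = []
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # continuation text or blank line without a timestamp
        timestamp, spid, message = match.groups()
        err = ERROR_RE.search(message)
        if err:
            errors.append((timestamp, spid, int(err.group(1)), int(err.group(2))))
        failed = ASSERT_RE.search(message)
        if failed:
            assertions.append(
                (timestamp, spid, failed.group(1), int(failed.group(2)), failed.group(3))
            )
    return errors, assertions


if __name__ == "__main__":
    sample = [
        "2010-07-07 14:34:25.67 spid109 Error: 17066, Severity: 16, State: 1.",
        "2010-07-07 14:34:25.67 spid109 SQL Server Assertion: File: <replicat.cpp>, "
        "line=2686 Failed Assertion = 'pRowset'.",
        "2010-07-07 14:34:25.70 spid109 Error: 3624, Severity: 20, State: 1.",
    ]
    found_errors, found_assertions = summarize_errorlog(sample)
    for entry in found_errors + found_assertions:
        print(entry)
```

If the same source file and line number show up in every occurrence, that is useful detail to include when matching the problem against the fix lists of cumulative updates or when opening a support case.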

  • Service Pack 1 is over a year old now, and there have been multiple Cumulative Updates since then. You might want to start applying those in your test environment to see if they resolve the issue. You could also go through the list of items fixed in each CU: http://support.microsoft.com/kb/970365/en-us



    Shameless self-promotion - read my blog http://sirsql.net

  • Thanks for the info!

    Unfortunately, I have not been able to reproduce this issue in the test environment. I have been through the CUs, and there is one replication-related fix mentioned in there:

    http://support.microsoft.com/kb/971136/ FIX: Error message when you synchronize a database or manually run sp_replcmds after you perform partial updates on a varbinary(max) column in SQL Server 2008: "The rowset does not contain any column with offset -1."

    Not sure if this would address the issue that I am facing.

    Moreover, I found a few other links for updates referencing the assertion failure in SQL 2008: http://support.microsoft.com/kb/974319/ and http://support.microsoft.com/kb/975719/. Again, these could be tried out in test. However, since I do not have a test repro available, I think it would be risky to try the fixes on production.

    Any more suggestions would be really appreciated.

  • I have decided to have Cumulative Update 9 applied, and if that does not fix it, the next step will be escalating it to Microsoft.

    Thanks for the comments!

  • Were you able to fix this issue? We are using SQL Server 2005 SP3 and are having the same issue. DBCC CHECKDB did not help either.

  • Was any resolution found for this issue?

  • Hi.

    I've seen this before, and it usually means that the distribution stored procedures are fouled up. Assuming that this is transactional replication with continuous updates after a snapshot, this is my recommendation:

    1. Script out each subscription as create and drop it.

    2. Script out the publication as create and drop it.

    3. If the subscriber database is not subscribed to any other publications, run sp_removedbreplication on each subscriber database. If it is, then you are in the unenviable position of recreating all of the replication publications that publish to this database.

    4. If the publisher database does not publish any other publications, run sp_removedbreplication on the publisher database. If it does, then you are in the unenviable position of recreating all of the replication publications that publish from this database.

    5. Execute the Step 2 create script on the publisher.

    6. Execute each Step 1 create script for subscribers.

    7. Run Snapshot.

    8. Check everything out.

    Needless to say, you should have a full backup of publisher and subscriber DB prior to step 1, and another one after Step 4.

    Sorry, this is a known issue and the above is the only effective fix I've ever found.

    Thanks

    John.
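
Steps 3 and 4 in the post above call sp_removedbreplication once per database. The following is a minimal Python sketch that generates those T-SQL statements for review before anything is run; it is an illustration under stated assumptions, not part of John's procedure. The helper names `quote_nstring` and `removal_script` and the database names `PublisherDb` and `SubscriberDb` are hypothetical, and `@type = N'tran'` assumes that only transactional replication objects should be removed.

```python
def quote_nstring(value):
    """Return value as a T-SQL N'...' literal, doubling embedded single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def removal_script(databases, repl_type="tran"):
    """Build one sp_removedbreplication batch per database name.

    repl_type is passed as @type; 'tran', 'merge', and 'both' are the
    values the procedure accepts.
    """
    if repl_type not in ("tran", "merge", "both"):
        raise ValueError("repl_type must be 'tran', 'merge', or 'both'")
    batches = []
    for name in databases:
        batches.append(
            "EXEC sp_removedbreplication "
            f"@dbname = {quote_nstring(name)}, @type = {quote_nstring(repl_type)};\n"
            "GO"
        )
    return "\n".join(batches)


if __name__ == "__main__":
    # Hypothetical database names; run each batch on the server hosting that database,
    # and only after the full backups mentioned in the post have been taken.
    print(removal_script(["SubscriberDb", "PublisherDb"]))
```

Generating the script as text rather than executing it keeps the destructive step manual, so the statements can be checked against the list of other publications and subscriptions first.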
