Database corruption on a merge replication subscriber

  • Merge replication is being used to provide inter-site high availability for a 7*24 database with 99.999% availability requirements (we have had no downtime in 2 years). In the event that our primary data centre encounters a major outage we failover to our secondary data centre where the subscriber becomes our primary database server, with any changes replicated to the publisher at the primary data centre when the outage is resolved.

    As a result of an error identified by Symantec BackupExec, a DBCC CHECKDB was run on the subscriber databases. It revealed a data corruption with a minimum repair level of REPAIR_ALLOW_DATA_LOSS. The table in question, including indexes, is approximately 6GB and is an article in a publication comprising a number of larger tables.

    We have a full Backup Exec database backup prior to the corruption (which occurred 4 days ago!). As the corruption only affects a single page (comprising 37 rows) we could consider recovering the page but it is far from obvious how to achieve that with Backup Exec.

    As well as the potential for data loss and referential implications with REPAIR_ALLOW_DATA_LOSS, it does not seem to be viable given the issues concerning replication documented here: http://msdn.microsoft.com/en-us/library/ms176064(v=sql.100).aspx: merge replication may not replicate the changes as triggers will not be executed as a result of changes due to CHECKDB; we would have to quiesce replication and stop all at activity at the publisher, which is not feasible given the implications for our service.

    We have considered removing the table from the publisher publication; dropping the table at the subscriber, creating a new publication with the table at the publisher and relying on replication to refresh the secondary but this could leave the corrupt page at the subscriber.

    So, the question is, how can we resolve the corruption with minimal disruption at the publisher and minimal impact on our high availability architecture.

  • Wish you had chosed mirroring for HA instead of merge replication , is there particular constraint for going the merge replication way πŸ™‚

    Quick one would be use BCP to export the table and import them into subcriber , then re-configure replication this would if there are no referential integrity in place or logical Data relationships are not compromised.

    Other process you can consider in backup of publisher and restore to subscriber , then try configuring merge replication. For this you may have to have enough backup space as you might be aware.

    Cheer Satish πŸ™‚

  • Thanks for the response.

    Wish you had chosed mirroring for HA instead of merge replication , is there particular constraint for going the merge replication way πŸ™‚

    So do I. Bandwidth restrictions between our two sites prevents the user of mirroring.

  • Resolved the issue with the following process:

    Stop the merge agent jobs for the publications:

    Set database to single user mode: ALTER DATABASE database SET SINGLE_USER WITH ROLLBACK IMMEDIATE

    Ran DBCC CHECKDB to repair the corrupt page: DBCC CHECK DB (β€˜database’, REPAIR_ALLOW_DATA_LOSS)

    DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS

    Set database to multi-user: ALTER DATABASE database SET MULTI_USER; GO

    Restarted the merge agent jobs

    Confirmed replication running correctly in Replication Monitor

    Used sp_addtabletocontents with a filter to repopulate the data that had been lost due to the repair

    Ran DBCC CHECKDB to confirm that the database now in a good state

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply