Repair database corruption using the mirror db in SQL 2005
SQL Server 2008 has a new feature that allows for automatic page repair. This feature automatically requests pages from the mirrored database to fix certain kinds of corruption on the primary and vice versa. Unfortunately this feature does not exist in SQL 2005 but we might still be able to use the mirrored database to recovery from corruption.
We have a database which is running on SQL 2005 Enterprise Edition and is mirrored for HA/DR. I'll call the current primary database server 1 and the mirrored database server 2. The database is in high safety with automatic failover since we also have a witness server.
A disk controller failed on server 1 causing a database failover to server 2. Once the disk controller was back online and validated, server 1 was brought back online as the mirror server and server 2 being the new primary began pushing transactions back to server 1. With the controller failure we were nervous about corruption now on server 1.
To check for corruption on Server 1(current mirror) we needed to create a database snapshot so we could run DBCC CHECKDB against it. After running CHECKDB, the results were in.
CHECKDB found 0 allocation errors and 8 consistency errors in table 'TABLE' (object ID 462682974).
CHECKDB found 0 allocation errors and 8 consistency errors in database 'DATABASE'.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB Object ID 462682974, index ID 1, partition ID 72057605514461184, alloc unit ID 72057605727518720 (type In-row data): Page (7:23955756) could not be processed. See other errors for details.
Page (7:23955756) was not seen in the scan although its parent (7:23955744) and previous (7:23955755) refer to it. Check any previous errors.
Page (7:23955757) is missing a reference from previous page (7:23955756). Possible chain linkage problem.
Crap. Now what? After checking the dump file from the SQL Server logs we found that we had a couple of corrupt pages in the clustered index of TABLE. Had the corruption been in a nonclustered index, we could just drop and recreate it. We could also see the 2 corrupt pages in the suspect_pages table in the msdb. The event_type was 2 which indicated a bad checksum.
SELECT * FROM msdb..suspect_pages
Now we needed to make sure we didn't have any corruption on server 2 (current primary). We shouldn't since SQL mirroring works at the transaction level. After running CHECKDB we finally got some good news.
CHECKDB found 0 allocation errors and 0 consistency errors in database 'DATABASE'.
Now it's time to decide how to fix the corruption. We had a couple different options.
- Run CHECKDB with repair_allow_data_loss
- Drop the mirrored database that had corruption and rebuild mirroring
- Do a page level restore from backups
- Call MS and see if they have a tool that will work like mirroring in 2008 and recovery the corrupt pages from the mirror
Option 1 was not really an option since it can lose data. This is not recommended by any SQL Server experts. Option 2 would work fine but did I mention this is a 3TB mirrored database? I really didn't want to rebuild mirroring for a 3TB database just to recover 16KB of data. Option 3 would not work since we are using SAN snapshot technology and not native SQL backups.
My last hope was a call to Microsoft. I called and opened a case with PSS. After speaking with PSS they informed me that they did not have a tool for automatic page repair for SQL 2005. So we started brainstorming ideas and we finally settled on rebuilding the clustered index on the current primary, which had no corruption. The thought was that rebuilding the index would ship the transactions to the mirror where the corruption was and remove it. I used the following index rebuild statement.
CREATE UNIQUE CLUSTERED INDEX [CL_INDEX]
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = ON, DROP_EXISTING = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, MAXDOP = 4) ON [PRIMARY]
After the clustered index was rebuilt we took another database snapshot on the mirror and ran DBCC CHECKTABLE against the table with the 2 corrupt pages and found that the corruption was gone. Success!
Without the automatic page repair functionality of SQL Server 2008 we were still able to recovery from corruption that occurred in our primary database by using the mirror server. Doing an index rebuild was much less time consuming than rebuilding mirroring for a 3TB database and it fixed our issue. Also a big thanks to Microsoft PSS and their team for saving me a ton of time.