Day 5 of 31 Days of Disaster Recovery: Dealing With Corruption in a Nonclustered Index

31 Days of Disaster Recovery

Welcome to day 5 of my series on disaster recovery. I want to start digging into some corruption scenarios. We’ll start off with the easiest form of corruption to fix, a nonclustered index.

The generic steps we will go through for any corruption scenario are as follows:

Identify the corruption (DBCC CheckDB)
Identify the objects and types of objects involved
Take the appropriate steps to correct

Sadly, one of the biggest mistakes people make is to jump straight to the third step and start trying to fix things without even knowing what they are up against. Different objects have to be fixed in different ways. taking the wrong action could cause unrecoverable damage and waste a lot of time. And please, please, PLEASE do not use the repair options of DBCC CheckDB unless everything else is not possible.

If you missed any of the earlier posts in the series, you can check them out here:

31 Days of disaster Recovery

Does DBCC Automatically Use Existing Snapshot?
Protection From Restoring a Backup of a Contained Database
Determining Files to Restore Database
Back That Thang Up

Identifying Corruption

The first thing you need to do is to identify corruption. You will probably be performing routine integrity checks or you will be responding to a specific alert or error. If you have an error message, you will have the info needed for at least the one page. You may be tempted to take action on that one page, but I advise you to take a step back and run DBCC CheckDB on the database first. There may be additional pages corrupted that force a different plan of action.

Use DBCC CheckDB to get the full list of errors so you can see which pages are corrupt. I like to use the No_InfoMsgs option to reduce unnecessary chatter, the All_ErrorMsgs option to make sure all errors are returned, and the TableResults undocumented option to output the results in a more readable format. For this demo, I will be running this on a corrupted version of the AdventureWorksDW2012 database.

DBCC CheckDB(AdventureWorksDW2012)
    With No_InfoMsgs, All_ErrorMsgs, TableResults;

This returns a lot of errors for the same things. So it takes a little practice to know what to look for. You need to identify which errors are the real errors and focus on those. You will want to focus on the errors that tell you an object ID, index ID, partition ID, allocation unit ID, file, and page.

DBCC CheckDB Output — Click to Enlarge

MSDB Suspect Pages Output

Identify the Corrupt Objects

After running DBCC CheckDB, you can also check the msdb database to see the pages that were identified as being corrupted. It will have an entry for each corrupt page and give you only very base information. It does not give you the object ID or the index ID, but it does give you the database ID, file, and page. You can use this to dump the page using DBCC Page and get the additional info from there. We don’t need to do that in most situations though. We have the iC CheckDB output above. There is only one object ID reported, and I use the OBJECT_NAME() function to get its name (FactResellerSales). I also see that the index ID is 2. I don’t need to look up the index. The index ID tells me everything I need to know.

Index ID Mapping

ID 0 = heap
ID 1 = clustered index
ID > 1 = nonclustered index

Take Action

Now that we know we’re dealing with just a nonclustered index, we can take the appropriate action. The easiest way to fix a corrupt nonclustered index is to drop and recreate the index. Since the underlying clustered index or heap isn’t corrupted, it can easily recreate it as the data is all still there. However, we are limited to drop and recreate. If we try to rebuild or create with drop_existing, it will fail. Rebuild and create with drop_Existing will scan the existing index and will fail when it hits the corruption.

Now we do need the index name since we are going to be working with it directly. I query sys.indexes to get the object names.

Select Object_Name(object_id) As TableName, name As IndexName
From sys.indexes
Where object_id = 341576255
and index_id = 2

This tells me that the table name is FactResellerSales and the index name is IX_FactResellerSales_CurrencyKey. So just to illustrate that we have to drop and recreate it, let’s try rebuilding it first.

-- Rebuild the index??
Alter Index IX_FactResellerSales_CurrencyKey
    On dbo.FactResellerSales
    Rebuild;

And that gives us this error:

The statement has been terminated.
Msg 824, Level 24, State 2, Line 2
SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:7171; actual 0:0). It occurred during a read of page (1:7171) in database ID 8 at offset 0x00000003806000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL13\MSSQL\DATA\AdventureWorksDW2008R2_Data.mdf'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Now let’s try using CREATE INDEX … WITH DROP_EXISTING. For this, I will need to script out the create index statement.

-- Create with drop existing??
Create Index IX_FactResellerSales_CurrencyKey
    On dbo.FactResellerSales(CurrencyKey)
    With Drop_Existing;

This yields the exact same error as the rebuild statement did. This leaves us with drop and recreate. I write the drop statement followed by the create statement and run that:

-- Drop and create?
Drop Index dbo.FactResellerSales.IX_FactResellerSales_CurrencyKey;

Create Index IX_FactResellerSales_CurrencyKey
    On dbo.FactResellerSales(CurrencyKey);

That completed successfully. We should be free of corruption now. So let’s run DBCC CheckDB again to verify. This time CheckDB completes successfully with 0 errors.

Summary and Follow-up

As I showed above, some forms of corruption are actually very easy to fix. There’s no reason to start panicking when you get a corruption error message. Just follow the steps defined above and then take the appropriate action. I’ll get into some other types of corruption later in the series. In the meantime, you can download the sample corrupt database I used tonight and the code to step you through fixing it yourself.

Sample database and demo code: CorruptionDemo_AdventureWorksDW2012.zip (12.22 MB)

Day 5 of 31 Days of Disaster Recovery: Dealing With Corruption in a Nonclustered Index

Identifying Corruption

Identify the Corrupt Objects

Take Action

Summary and Follow-up

Rate

Share

Share

Rate

Day 5 of 31 Days of Disaster Recovery: Dealing With Corruption in a Nonclustered Index

Identifying Corruption

Identify the Corrupt Objects

Take Action

Summary and Follow-up

Rate

Share

Share

Rate

Related content

Disaster Recovery Docs–Offsite

Day 31 of 31 Days of Disaster Recovery: Backup and Restore of the Resource Database

Day 28 of 31 Days of Disaster Recovery: Recovering SQL if the Tempdb Drive Dies

Automatic page repair with SQL Server Mirroring

SQL Server 2012 AlwaysOn & A Thorough Setup