Blog Post

Day 9 of 31 Days of Disaster Recovery: Use All the Checksums

,

31 Days of Disaster Recovery

31 Days of Disaster Recovery

Welcome to day 9 of my 31 Days of Disaster Recovery series. Today, I want to talk about the three ways you can use CHECKSUM to protect yourself from and identify corruption. Checksum is the default page verification option in SQL Server 2005+ and helps identify corruption to data pages. The other two uses of Checksum are options for the BACKUP and RESTORE commands. When we are done here, hopefully you will be convinced to use all 3 CHECKSUM options.

If you missed any of the earlier posts in my DR series, you can check them out here:

    31 Days of disaster Recovery

  1. Does DBCC Automatically Use Existing Snapshot?
  2. Protection From Restoring a Backup of a Contained Database
  3. Determining Files to Restore Database
  4. Back That Thang Up
  5. Dealing With Corruption in a Nonclustered Index
  6. Dealing With Corruption in Allocation Pages
  7. Writing SLAs for Disaster Recover
  8. Resolutions for All DBAs

CHECKSUM Page Verification

There are 3 page verification options in SQL Server 2005 and newer. Checksum is the current default setting and the most comprehensive. Torn page detection was the default in SQL Server 2000. Torn page detection is less comprehensive because the way it works is to write a 2-bit pattern to the header of the page and then an alternating 2-bit pattern every 512 byte sector. This means it is possible for corruption to occur in a very small area of the page and not be detected. Checksum on the other hand generates a different value if even one bit of data is different. Any amount of corruption will result in a different value and the page will be flagged as corrupted.

Since the default page verification option changed, new databases are protected, but if an older database is upgraded, the page verification option is not changed. If you have really old databases, they may not be fully protected. You should check the page verification option in sys.databases and update those databases.

Here’s the tricky part. If you change the page verification option, you are not automatically protected. It doesn’t automatically rewrite every page to have checksum values when you change it. The next time a page is written to disk, the checksum value is written to the page. My recommendation when changing to checksum page verification is during the next index maintenance period, simply rebuild every index or heap (SQL Server 2008+) rather than rebuilding selectively. This will cause every table and index to be rewritten to disk and ensure that all data structures are protected by checksums.

The third page verification option is NONE. This has NEVER been the default page verification option and if you have databases using this option then it is time for someone to find a new job. This option is only set if someone changes it and whether the change was done maliciously or due to not knowing better, that person should not be touching SQL Server.

Backup and Restore With Checksum

The final 2 ways to use checksum is with the backup and restore commands. When you use WITH CHECKSUM with the backup and restore commands, it performs extra checks. If checksums exist on a page (see previous section), it will recalculate the checksum values and ensure that they are still corruption free. This does NOT replace the need to run regular integrity checks with DBCC CHECKDB, but it gives you an extra opportunity to catch corruption. Furthermore, if you backup a database that has become corrupted without using this option, it may complete successfully with no warnings or errors. This often leads to corruption getting backed up and restored to other servers or environments without anyone realizing it. Additionally, when the backup completes, it will generate a checksum for the entire backup.

When restoring a backup using WITH CHECKSUM, it performs the same checks. The command does require that the backup was created using the checksum option, but it ensures that you don’t restore a corrupted database without being aware you are doing so. Also, if the backup became corrupt after it was created, then the restore is able to detect that and fail the restore almost immediately rather than waiting until it hits the corrupt page. If the backup file became corrupt, the checksum for the backup will be different and for a very large database, this can save you hours of time.

If you did hit corruption when restoring or backing up with the checksum option, you can complete the backup or restore using the CONTINUE_AFTER_ERROR option. Creating a backup of a corrupt database using this option will mark the database as being damaged and will ensure that the backup cannot be restored unless the continue after error option is also used for the restore. This will prevent someone from unwittingly restoring a backup of a corrupt database without realizing it.

EDIT: recommendation from Paul Randal (blog|@PaulRandal)

A great way to check a backup for corruption is to run RESTORE VERIFYONLY … WITH CHECKSUM; to perform the same checks (check the checksum of every page with checksum values and check the checksum of the backup file) without actually restoring it.

See it in Action

I put together a demo so you can prove to yourself the value of using all 3 checksum options. Rather than explain the demo in the blog post here, the demo code is heavily commented. It uses the two sample databases I’ve used in 2 earlier posts in this series. You may already have them downloaded, or you can download them again. I have provided the demo script and the 2 sample corrupt databases in separate zip files so you can download only what you need.

Sample corrupt databases: SampleCorruptDBs.zip (12.33 MB)
Demo code: Demo_BackupWithCheckSum.zip (2 KB)

-- Check page verify setting
-- Note that both databases have Checksums enable
Select name, page_verify_option_desc
From sys.databases
Where name in ('PFSCorruption', 'AdventureWorksDW2012');
-- Check to see if the databases are corrupted
-- Note that both databases are
DBCC CheckDB(AdventureWorksDW2012)
    With No_InfoMsgs, All_ErrorMsgs, TableResults;
DBCC CheckDB(PFSCorruption)
    With No_InfoMsgs, All_ErrorMsgs, TableResults;
-- Backup PFSCorruption with Checksum
-- Note that it suceeds even though we know it is corrupt
-- Pages have not had checksum values written yet
Backup Database PFSCorruption
    To Disk = 'C:\bak\PFSCorruption_checksum.bak'
    With Init, Checksum;
-- Backup AdventureWorksDW2012 with Checksum
-- Note that this backup fails and notifies us that it is corrupt
-- Pages have had the checksum values written to them
Backup Database AdventureWorksDW2012
    To Disk = 'C:\bak\AdventureWorksDW2012_checksum.bak'
    With Init, Checksum;
-- Backup AdventureWorksDW2012 without Checksum
-- Suceeds with no warning or errors
Backup Database AdventureWorksDW2012 To Disk = 'C:\bak\AdventureWorksDW2012_nochecksum.bak'
    With Init;
-- Use Continue_After_Error option to get backup of corrupt database
Backup Database AdventureWorksDW2012
    To Disk = 'C:\bak\AdventureWorksDW2012_checksum.bak'
    With Init, Checksum, Continue_After_Error;
-- What happens if we restore AdventureWorksDW2012_nochecksum.bak?
-- Database restores with no errors. Another lost opportunity to catch corruption.
-- We have potentially spread corruption to different server/environment
Restore Database CorruptDB_nochecksum
    From Disk = 'C:\bak\AdventureWorksDW2012_nochecksum.bak'
    With Move 'AdventureWorksDW2008R2_Data' To 'c:\bak\AdventureWorksDW2008R2_Data.mdf',
        Move 'AdventureWorksDW2008R2_Log' To 'c:\bak\AdventureWorksDW2008R2_Log.ldf';
-- Let's try it again with the checksum option
If DB_ID('CorruptDB_nochecksum') Is Not Null
    Drop Database CorruptDB_nochecksum;
-- Restore fails because we cannot use checksum for restore unless
-- we used it for the backup
Restore Database CorruptDB_nochecksum
    From Disk = 'C:\bak\AdventureWorksDW2012_nochecksum.bak'
    With Checksum,
        Move 'AdventureWorksDW2008R2_Data' To 'c:\bak\AdventureWorksDW2008R2_Data.mdf',
        Move 'AdventureWorksDW2008R2_Log' To 'c:\bak\AdventureWorksDW2008R2_Log.ldf';
-- What happens if we restore AdventureWorksDW2012_checksum.bak
-- Restore fails because it was marked as a corrupt database
Restore Database CorruptDB_checksum
    From Disk = 'C:\bak\AdventureWorksDW2012_checksum.bak'
    With Move 'AdventureWorksDW2008R2_Data' To 'c:\bak\AdventureWorksDW2008R2_Data.mdf',
        Move 'AdventureWorksDW2008R2_Log' To 'c:\bak\AdventureWorksDW2008R2_Log.ldf';
-- Let's try it again with the checksum option
-- Restore fails because it was marked as a corrupt database
-- Requires using the continue after error option
Restore Database CorruptDB_checksum
    From Disk = 'C:\bak\AdventureWorksDW2012_checksum.bak'
    With Checksum,
        Move 'AdventureWorksDW2008R2_Data' To 'c:\bak\AdventureWorksDW2008R2_Data.mdf',
        Move 'AdventureWorksDW2008R2_Log' To 'c:\bak\AdventureWorksDW2008R2_Log.ldf';
-- Let's try it again with the checksum and Continue_After_Error options
-- Suceeds with a low level warning telling us that the database
-- was damaged and should be checked out
Restore Database CorruptDB_checksum
    From Disk = 'C:\bak\AdventureWorksDW2012_checksum.bak'
    With Checksum,
        Move 'AdventureWorksDW2008R2_Data' To 'c:\bak\AdventureWorksDW2008R2_Data.mdf',
        Move 'AdventureWorksDW2008R2_Log' To 'c:\bak\AdventureWorksDW2008R2_Log.ldf',
        Continue_After_Error;
-- Cleanup extra database(s)
If DB_ID('CorruptDB_checksum') Is Not Null
    Drop Database CorruptDB_checksum;
If DB_ID('CorruptDB_nochecksum') Is Not Null
    Drop Database CorruptDB_nochecksum;

Summary

You can see that the checksum options I’ve described in this post are very useful for protecting the integrity of your data and for increasing the chances that you will find corruption early as well as being the quickest way to detect a corrupt backup file. I highly encourage you to step through the demo code and see how it protects you and can save you a lot of hardship down the line.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating