Blog Post

Are CHECKSUMs being performed with a Compressed Backup?

,

Now, why in the world would I be asking a question like this?

Well, in BOL (BACKUP) is this little blurb:

CHECKSUM

Specifies that the backup operation will verify each page for checksum and torn page, if enabled and available, and generate a checksum for the entire backup. This is the default behavior for a compressed backup.

Using backup checksums may affect workload and backup throughput.

(Emphasis mine)

However, when performing a compressed backup and examining the results in msdb.dbo.backupset, the has_backup_checksums column is not set to 1. Let’s run through a quick example to see what I’m talking about.

First, let’s create a database and a table, and populate it with a million rows:

USE master;
GO
-- drop the database if it already exists
IF DB_ID('BackupTest') IS NOT NULL DROP DATABASE BackupTest;
GO
-- make a database. make a table. populate the table.
CREATE DATABASE BackupTest;
GO
USE BackupTest;
GO
CREATE TABLE dbo.Tally (
    N INTEGER CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED 
    );
WITH Tens    (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
     Hundreds(N) AS (SELECT 1 FROM Tens t1, Tens t2),
     Millions(N) AS (SELECT 1 FROM Hundreds t1, Hundreds t2, Hundreds t3),
     Tally   (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions)
INSERT INTO dbo.Tally (N)
SELECT N FROM Tally;
GO

Now, let’s perform various full database backups on this database.

USE master;
GO
-- ensure that the backup directory exists
EXECUTE xp_create_subdir 'C:\BackupTest';
 
-- plain full backup, no options
BACKUP DATABASE BackupTest 
TO DISK = 'C:\BackupTest\BackupTest1_FULL_NoOptions.bak';
GO
-- full backup with checksums
BACKUP DATABASE BackupTest
TO DISK = 'C:\BackupTest\BackupTest2_FULL_CHECKSUM.bak'
WITH CHECKSUM;
GO
-- compressed backup
BACKUP DATABASE BackupTest
TO DISK = 'C:\BackupTest\BackupTest3_FULL_COMPRESSION.bak'
WITH COMPRESSION;
GO
-- compressed backup, no checksums
BACKUP DATABASE BackupTest
TO DISK = 'C:\BackupTest\BackupTest4_FULL_COMPRESSION_NOCHECKSUM.bak'
WITH COMPRESSION, NO_CHECKSUM;
GO
-- compressed backup, checksums
BACKUP DATABASE BackupTest
TO DISK = 'C:\BackupTest\BackupTest5_FULL_COMPRESSION_CHECKSUM.bak'
WITH COMPRESSION, CHECKSUM;
GO

Now let’s looks at the results of these backups as stored in msdb:

-- show the backups that were performed along with relevant information
SELECT  [DATABASE name] = database_name, 
        [backup start date] = backup_start_date,
        [has backup checksums] = has_backup_checksums,
        [backup size] = backup_size,
        [compressed backup size] = compressed_backup_size 
FROM    msdb.dbo.backupset 
WHERE   database_name = 'BackupTest'
AND     backup_start_date > DATEADD(DAY, DATEDIFF(DAY, '2000-01-01', GETDATE()), '2000-01-01')
;

For the backups I just performed, this shows (I’ve manually added the last column):

database namebackup start datehas backup checksumsbackup sizecompressed backup sizeoptions used
BackupTest2013-12-18 14:34:50.00001701580817015808none
BackupTest2013-12-18 14:34:50.00011701580817015808CHECKSUM
BackupTest2013-12-18 14:34:50.0000170158082667989COMPRESSION
BackupTest2013-12-18 14:34:50.0000170158082667990COMPRESSION, NO_CHECKSUM
BackupTest2013-12-18 14:34:50.0001170158082518869COMPRESSION, CHECKSUM

Let’s look at what the actual backup files show…

-- show the backup file headers
RESTORE HEADERONLY FROM DISK = 'C:\BackupTest\BackupTest1_FULL_NoOptions.bak';
RESTORE HEADERONLY FROM DISK = 'C:\BackupTest\BackupTest2_FULL_CHECKSUM.bak';
RESTORE HEADERONLY FROM DISK = 'C:\BackupTest\BackupTest3_FULL_COMPRESSION.bak';
RESTORE HEADERONLY FROM DISK = 'C:\BackupTest\BackupTest4_FULL_COMPRESSION_NOCHECKSUM.bak';
RESTORE HEADERONLY FROM DISK = 'C:\BackupTest\BackupTest5_FULL_COMPRESSION_CHECKSUM.bak';

Which returns the same results as the query against the msdb.dbo.backupset table.

So, where does this leave us? Are checksums being performed with a compressed backup? By looking at only the has_backup_checksums column, then it appears the answer is no. Even when you also look at the compressed backup size, any compressed backup without the checksum option gives file sizes of the same relative size (I show one byte difference). So, we appear to have a discrepancy between the documented behavior in BOL, and what is actually being observed.

So the real question becomes: Does performing a compressed backup provide the same usefulness as when specifying the checksum option? As it turns out, after I started writing this blog post (and before I finished it), Brent Ozar had the same question. Doing something that I hadn’t considered, he pulled out his hex editor and… well, read about it here. The quick summary is: if you don’t specify the checksum option, then page checksums / torn pages aren’t being checked for. Which means that BOL is wrong. (Thanks Brent!)

It is my opinion that when the compressed backup is occurring, it calculates an internal checksum for the data being backed up (a calculation based upon the data being backed up, NOT verifying the page checksum values previously calculated or checking for torn pages like the checksum option for the backup command does). I think that many years ago, back when backup compression was being added to SQL 2008, the BOL writer saw “checksum is calculated during a compressed backup”, and wrongfully wrote up that the checksum option is the default for a compressed backup. Again, this is just my opinion, and I have no facts to back this up.

Since it is a good practice to utilize the checksum option when doing backups (to ensure that there isn’t any corruption from the IO subsystem), you should know that you cannot perform this from a maintenance plan – that option isn’t there. I was really hoping that the compressed backups did do the checksums, as it would be easy to implement at all of the clients that still use maintenance plan (since making a compressed backup is an option in the maintenance plan). So you will need to use a solution based upon executing the T-SQL statement to use the checksum option (such as Ola Hallengren’s free solution). This is just another reason why using a script-based solution is better than using a maintenance plan.

If you made it this far, I hope that you will visit this connect item (Backup COMPRESSION enables CHECKSUM?) to vote on getting this fixed.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating