SQLServerCentral Article

Turn on CHECKSUM after an upgrade

,

The old days

By default in the SQL 2000 years, the only way to protect the data pages on disk from being scribbled by some rogue driver or low-level process was to turn on a database option named TORN_PAGE_DETECTION.

The idea was to detect whenever a rogue write trashed a SQL Server data page by spanning over two or more of its sixteen 512 bytes sectors, thus causing a torn page. SQL Server tracked the first bytes of each sector to compute a checksum stamped in the header of each page. Though effective in most situations, there were some cases when the trash was occuring in the middle of a sector, and the TPD mechanism was not able to see it.

Enter CHECKSUMS

In SQL Server 2005, a new option CHECKSUM has been introduced and adopted as the default option for PAGE_VERIFY. Each time a page with CHECKSUM enabled is modifed in the buffer pool and written back in the data file, SQL Server computes a checksum based on the full contents of the page and stamps it in the header. When loading the page from the data file to the buffer pool, SQL Server recomputes the checksum and compares to the value stored in the header. If the value is not the same, it means something very bad happened to this page.

OK, SQL Server 2005 being around for a certain time now, this CHECKSUM thing is not a scoop for anyone.

PAGE_VERIFY and the upgrade process

Back to the upgrade problem. If I choose to restore a SQL 2000 database in a SQL 2005 instance, the data pages will remain under the TPD protection. Consider this script:

restore database pagechecksumtest 
from disk='F:\SQLDATA$SQL2000\BACKUP\pagechecksumtest.bak'
with move 'pagechecksumtest' to 'E:\SQLDATA$SQL2005\DATA\pagechecksumtest.mdf',
move 'pagechecksumtest_log' to 'F:\SQLDATA$SQL2005\LOG\pagechecksumtest_log.ldf',
stats;
GO
16 percent processed.
24 percent processed.
33 percent processed.
41 percent processed.
58 percent processed.
66 percent processed.
74 percent processed.
83 percent processed.
91 percent processed.
100 percent processed.
Processed 96 pages for database 'pagechecksumtest', file 'pagechecksumtest' on file 1.
Processed 1 pages for database 'pagechecksumtest', file 'pagechecksumtest_log' on file 1.
Converting database 'pagechecksumtest' from version 539 to the current version 611.
Database 'pagechecksumtest' running the upgrade step from version 539 to version 551.
(...)
Database 'pagechecksumtest' running the upgrade step from version 610 to version 611.
RESTORE DATABASE successfully processed 97 pages in 0.047 seconds (16.754 MB/sec).
select name, page_verify_option_desc from sys.databases
where name = 'pagechecksumtest';
GO
name page_verify_option_desc
------------------------- -----------------------------------
pagechecksumtest TORN_PAGE_DETECTION

Therefore I absolutely need to switch to CHECKSUM mode:

alter database pagechecksumtest set PAGE_VERIFY CHECKSUM ;
GO

Well, actually the data pages are not just magically protected after switching to CHECKSUM mode. They will need to be read from the data file in the buffer pool, modified, and then written back to the data file. There is no other way to stamp a checksum in a page header, not even with a DBCC CHECKDB. CHECKDB is just validating the checksums among many other things, but not writing them in the pages.

What about READ ONLY data ?

It implies also that the checksum can only be stamped if the data can be modified. Even if the database has checksums enabled, could the checksum really be stamped on a readonly page ? Nope, because it can't be modified. You can check this by having a look at the m_FlagBits property in the header of the page. The bitmask should embed 0x200 when the checksum has been computed, in this case it is not.

I have shared with some of you one typical scenario where this could hurt. I'll agree, it is not very likely to happen, but it may be worth saying a word or two about it. Imagine a DBA is involved in the following scenario :

  1. He upgrades a database from 2000 to 2005/2008. PAGE_VERIFY remains under TORN_PAGE_DETECTION. He doesn't realize it and completely forgets about it.
  2. Then, a change in the application needs some data in a filegroup to be turned READONLY.
  3. The DBA suddenly discovers PAGE_VERIFY is still configured the old way, so he switches into CHECKSUM. But the checksums will only be stamped when all the pages will be read, modified and written back to the file.
  4. The READONLY data cannot be modified, so it will remain under TORN PAGE DETECTION protection only, even if PAGE_VERIFY=CHECKSUM.
  5. Murphy's law says this is when a readonly page could be damaged and no one would ever notice.

You want to make your own opinion ? Here is a little step by step you can follow:

First, create a database in a SQL Server 2000 instance, add some data and backup the database:

create database tpdtest
use tpdtest
create table T1(a numeric identity, b char(1))
insert into T1 values ('A')
backup database tpdtest to disk='F:\SQLDATA$SQL2000\BACKUP\tpdtest.bak' with init, stats

Restore the backup in a SQL Server 2005 or 2008 instance, add a new filegroup to host the existing table and turn it into READONLY mode:

restore database tpdtest from disk='F:\SQLDATA$SQL2000\BACKUP\tpdtest.bak'
with move 'tpdtest' to 'E:\SQLDATA$SQL2005\DATA\tpdtest.mdf',
move 'tpdtest_log' to 'F:\SQLDATA$SQL2005\LOG\tpdtest_log.ldf',
stats, replace
alter database tpdtest add filegroup RO_tpdtest
alter database tpdtest add file(name='RO_tpdtest',
filename='E:\SQLDATA$SQL2005\DATA\RO_tpdtest.ndf') to filegroup RO_tpdtest
use tpdtest
create unique clustered index UCQ_T1_a on T1(a) on RO_tpdtest
alter database tpdtest modify filegroup RO_tpdtest readonly

This is when the DBA realizes he needs to switch to CHECKSUM mode:

alter database tpdtest set PAGE_VERIFY CHECKSUM

And now, you can simulate a page trash by taking the database offline and using an hexa editor like XVI32 to modify the data right in the middle of a sector. At this step, it is important not to trash data across two sectors, because the TPD mechanism would detect it and raise a 824 error, this is why the address highlighted in figure 1 is a little further than the page boundary (in this case, 0x10000 is the address for data page 1:8). I'm typing a 'C' char instead of the 'A' char :

Figure1:

After a while, an end-user queries the table:

alter database tpdtest set online
use tpdtest
select * from T1
a b
-- --
1 C

Oooops. ..At this point I have no way to know what happened to my data. As a comparison, if I had run the same tests on a T2 table in a read-write filegroup, I would have been bitten by a 824 error:

alter database tpdtest set online
use tpdtest
select * from T2
Msg 824, Level 24, State 2, Line 1
SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0xb3834c64; actual: 0xb2834c64). It occurred during a read of page (1:15) in database ID 21 at offset 0×0000000001e000 in file 'E:\SQLDATA$SQL2005\DATA\tpdtest.mdf'.
Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB).
This error can be caused by many factors; for more information, see SQL Server Books Online.
CHECKSUM is a superset of TPD, and the error is detected.

Conclusion:

  • Include the switch to CHECKSUM mode in your migration process.
  • Remember the checksums will be implemented only when the data will be loaded in the buffer pool, modified and written back in the data file.
  • In some circumstances, read-only data may only remain under the TPD protection.

Hope this helps.

 

Rate

4.25 (32)

You rated this post out of 5. Change rating

Share

Share

Rate

4.25 (32)

You rated this post out of 5. Change rating