Database Corruption Problem with CheckDB results - How bad is it?

Question

Post reply

Database Corruption Problem with CheckDB results - How bad is it?

Bradley Deem

SSCrazy

Points: 2565
More actions
February 24, 2010 at 9:11 am

#386385

I'm quickly reading to determine the extent of the problem, but if I could draw on the experiences of the masses it would be greatly appreciated. We do have full backups and transactions logs available for restore. We are currently restoring our database.
I've placed the database in Emergency Mode so others have access to data if neeed. So what's the prognosis?
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:37026864) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).
Msg 8998, Level 16, State 2, Line 1
Page errors on the GAM, SGAM, or PFS pages prevent allocation integrity checks in database ID 11 pages from (1:37026864) to (1:37034951). See other errors for cause.
Msg 8966, Level 16, State 2, Line 1
Unable to read and latch page (1:37034952) with latch type SH. 38(Reached the end of the file.) failed.
Msg 8998, Level 16, State 2, Line 1
Page errors on the GAM, SGAM, or PFS pages prevent allocation integrity checks in database ID 11 pages from (1:37034952) to (1:37043039). See other errors for cause.
Msg 8966, Level 16, State 2, Line 1
Unable to read and latch page (1:37043040) with latch type SH. 38(Reached the end of the file.) failed.
Msg 8998, Level 16, State 2, Line 1
Page errors on the GAM, SGAM, or PFS pages prevent allocation integrity checks in database ID 11 pages from (1:37043040) to (1:37051127). See other errors for cause.
CHECKDB found 6 allocation errors and 0 consistency errors not associated with any single object.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 1

Fatal.

You have damage to a couple of your allocation pages (specifically PFS pages). This kind of corruption is irreparable.

Take a tail-log backup (if you can), then restore your full backup, all log backups followed by the tail-log backup that you took. You should be able to restore without data loss since you have a good set of backups.

You need to do some root cause analysis. Look in the windows event log, SAN logs, RAID logs, etc. This is typically an IO subsystem problem, you need to identify the cause or it may happen again.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 2

If you want some reading on corruption, you could start with this. http://www.sqlservercentral.com/articles/Corruption/65804/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Bradley Deem SSCrazy Points: 2565 More actions · Answer 3

Thanks Gila. I was hoping to do a page restore, but I guess we're not looking good. I did read on BOL that we wouldn't be able to restore pages that dealt with PFS; I was just hoping I could get by.

This is the blog I was looking at for restoring corrupt pages.

http://blogs.msdn.com/sqlserverstorageengine/archive/2007/01/18/fixing-damaged-pages-using-page-restore-or-manual-inserts.aspx

Performing Page Restores

http://msdn.microsoft.com/en-us/library/ms175168.aspx

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 4

A page restore will not work here. Besides, it looks like there's more than just three pages damaged. The comment about not being able to latch a page with reason 'end of file' seems to indicate that SQL and windows are in disagreement over the size of the file.

Take the tail log backup and then restore from scratch over the damaged database.

p.s. Why did you post this in the SQL 2000 forum? It's either a 2005 or 2008 database.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Bradley Deem SSCrazy Points: 2565 More actions · Answer 5

Whoops, I suppose I wasn't reading the best in the heat of moment. You are correct, this is an SQL Server 2005 instance.

Paul Randal One Orange Chip Points: 29544 More actions · Answer 6

For the page latch message - when the final PFS page at the end of the file is damaged (which could be the case here), I couldn't tell how many pages from the final extent have been allocated or not - hence CHECKDB tries to read the pages and may fail some of them.

You need to do a full restore here.

A page being overwritten with zeroes (as is the case here) is the I/O subsystem - nearly 100% of the times I've seen.

Thanks

Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 7

Paul Randal (2/25/2010)
For the page latch message - when the final PFS page at the end of the file is damaged (which could be the case here), I couldn't tell how many pages from the final extent have been allocated or not - hence CHECKDB tries to read the pages and may fail some of them.

Ah, thanks for the explanation.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Paul Randal One Orange Chip Points: 29544 More actions · Answer 8

No problem. I really need to get around to writing a really detailed, comprehensive guide to corruption - nothing out there right now.

Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 9

Paul Randal (2/25/2010)
I really need to get around to writing a really detailed, comprehensive guide to corruption - nothing out there right now.

I wait with anticipation... 😀

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Steve Jones - SSC Editor SSC Guru Points: 740342 More actions · Answer 10

So, in addition to doing a restore, you need to check your hardware. This could happen again.

Bradley Deem SSCrazy Points: 2565 More actions · Answer 11

Agreed, we've looked over our SAN logs, but haven't found the "smoking gun" yet. We're in the process of looking over our fiber channel logs. Just an FYI, this corruption occured during our nightly reindexing job, that's been running for about a year now.

Thanks, again everybody your help.

Joy Smith San SSC-Insane Points: 24877 More actions · Answer 12

Kindly let us know if you could find the exact cause of the issue and how could you solve it. It will be helpfull to others including myself.

Bradley Deem SSCrazy Points: 2565 More actions · Answer 13

Unfortunately, I don't have an answer yet for the cause. We've restored the backup and run the reindexing job on another server (same SAN and disks though), but it didn't reproduce the error (which is a good thing). When we have an answer I'll be sure to post it.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 14

Could be just about anything anything in the IO stack

Misbehaving filter driver (eg anti-virus)

Buggy HBA drivers

Faulty fibre switch

Problematic SAN controller

Glitch in SAN's write cache

Failing drives

etc

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass