Blog Post

Day 6 of 31 Days of Disaster Recovery: Dealing With Corruption in Allocation Pages

,

Day 6 of 31 Days of Disaster Recovery: Dealing With Corruption in Allocation Pages

31 Days of Disaster Recovery

31 Days of Disaster Recovery

Yesterday, I covered corruption in nonclustered indexes, the easiest type of corruption to handle. Today, I’m going to move on to something slightly more complex, yet still really simple to manage. Today, I’m going to talk about what to do when you encounter corruption of an allocation page. Allocation pages cannot be repaired not can they be single-page restored. If you have corruption of an allocation page, you need to restore the whole database.

If you missed any of the earlier posts in the series, you can check them out here:

    31 Days of disaster Recovery

  1. Does DBCC Automatically Use Existing Snapshot?
  2. Protection From Restoring a Backup of a Contained Database
  3. Determining Files to Restore Database
  4. Back That Thang Up
  5. Dealing With Corruption in a Nonclustered Index

What are Allocation Pages?

I’ve talked in great detail previously about allocation pages in my posts on dealing with tempdb contention here (part 1) and here (part 2). Allocation pages are special pages in the data files that track and manage extent allocations. There are 3 types of allocation pages that we are going to focus on today.

Global Allocation Map (GAM): Tracks which extents have been allocated. There is 1 GAM page for every 4 GB of data file. It is always page 2 in the data file and then repeats every 511,232 pages.

  • Page ID = 2 or Page ID % 511232

Shared Global Allocation Map (SGAM): Tracks which extents are being used as mixed (shared) extents. There is 1 SGAM page for every 4 GB of data file. It is always page 3 in the data file and then repeats every 511,232 pages.

  • Page ID = 3 or (Page ID – 1) % 511232

Page Free Space (PFS): Tracks the allocation status of each page and approximately how much free space it has. There is 1 PFS page for every 64 MB of data file. It is always page 1 in the data file and then repeats every 8,088 pages.

  • Page ID = 1 or Page ID % 8088

If the page ID is 1, 2, or 3, it’s obvious that it’s one of the above allocation pages. If it’s a high number, then you have a couple of ways to figure out what type of page it is. Of course, you can do the math and calculate whether it’s an allocation page or not. I’ve included the below script to do the math for you.

Declare @PageID int;
-- Enter page number
-- e.g., 8088 = PFS page
Set @PageID = 8088;
Select Case
    When @PageID = 1 Or @PageID % 8088 = 0 Then 'Is PFS Page'
    When @PageID = 2 Or @PageID % 511232 = 0 Then 'Is GAM Page'
    When @PageID = 3 Or (@PageID - 1) % 511232 = 0 Then 'Is SGAM Page'
    Else 'Is Not PFS, GAM, or SGAM page'
    End

Another way you could determine the page type is to dump the page using DBCC PAGE and look for the m_type value in the header output. The type values are different from the page IDs (m_pageid in DBCC PAGE output).

 Page Type  m_type 
 PFS  11 
 GAM  8 
 SGAM  9 

Follow the Steps

Now that we know what allocation pages are and how to identify them, we can move on to following the three general steps for handling corruption.

  1. Identify the corruption (DBCC CheckDB)
  2. Identify the objects and types of objects involved
  3. Take the appropriate steps to correct

Step 1 is to identify the corruption. For today’s exercise, I’m going to my cleverly named sample database PFSCorruption. If I run DBCC CheckDB on the database, I see that it is corrupted.

DBCC CheckDB(PFSCorruption)
    With No_InfoMsgs, All_ErrorMsgs, TableResults;

That gives us a lot of columns that we don’t need, so I’m going to pare that down to just the really important columns. This will make it easier for us to find the bits we really want. I don’t use this when troubleshooting corruption, this is just to make it easier to find the key info for this demonstration.

Declare @DBCC Table (
    Error int,
    Level smallint,
    State tinyint,
    MessageText varchar(2500),
    RepairLevel varchar(30) null,
    Status tinyint,
    DbId int,
    DbFragId int,
    ObjectId int,
    IndexId int,
    PartitionId bigint,
    AllocUnitId bigint,
    RidDbId int,
    RidPruId int,
    [File] int,
    Page int,
    Slot int,
    RefDbId int,
    RefPruId int,
    RefFile int,
    RefPage int,
    RefSlot int,
    Allocation bigint)
Insert Into @DBCC
Exec sp_executesql N'DBCC CheckDB(PFSCorruption)
    With No_InfoMsgs, All_ErrorMsgs, TableResults;';
Select Level,
    State,
    MessageText,
    RepairLevel,
    ObjectId,
    IndexId,
    [File],
    Page,
    RefFile,
    RefPage
From @DBCC;
 Level  State  MessageText  RepairLevel  ObjectId  IndexId  File  Page  RefFile  RefPage 
 16  5  Database error: Page (1:6) is marked with the wrong type in PFS page (1:1). PFS status 0xa expected 0×44.  NULL  0  -1  1  6  1  1 
 16  5  Database error: Page (1:7) is marked with the wrong type in PFS page (1:1). PFS status 0xa expected 0×44.  NULL  0  -1  1  7  1  1 
 10  1  CHECKDB found 2 allocation errors and 0 consistency errors not associated with any single object.  NULL  0  0  0  0  0  0 
 10  1  CHECKDB found 2 allocation errors and 0 consistency errors in database ‘PFSCorruption’.  NULL  0  0  0  0  0  0 

Identify the Objects

Again, we need to identify the real errors, and those are the ones that are Level 16. The higher the level, the more severe that error is so the higher level errors are the reals one we need to examine. Also, State 5 means that an unknown error occurred that terminated DBCC execution. Each of the 2 error messages mentions 2 different pages. The error messages can be misleading and make you think that pages (1, 6) and (1, 7) are corrupted, but the corruption is page (1, 1). Page (1, 1) has the wrong values for these two pages. So the columns we are interested in this time are RefFile and RefPage rather than File and Page.

We don’t need to do the math in this case. We already know that page 1 is the PFS page and in fact, the error message even called out that it is the PFS page. We know it’s an allocation page, so we know what action is appropriate.

Taking Appropriate Action

I said earlier that allocation pages cannot be repaired or restored. We need to do a full database restore. at this point, I would check my backup situation, back up the tail of the log, and then begin my restore process. If you don’t have backups to restore from, you will need to export the data, create a new version of the database, and re-import the data. It’s not going to be a fast process. There are a few people in the world that could hack the page with a hex editor to repair it manually, but that’s not something we are going to try to demonstrate here. If you try to do that, I highly advise you to make a copy of the files, and try to fix the copy of it.

Sample corrupt database and scripts: PFSCorruption.zip (116 KB)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating