ERROR 823 in SQL 2000 -> Need help

  • Hi All,

    Good day and Happy New Year !!

    I need your inputs for the below issue -

    We have a 1.5 Terabyte SAP database which runs on SQL 2000 SP4 enterprize edition.The SQL instance is on Win2k3. For last couple of months I have observed that we are getting error 823 errors very frequently in the eventviewer -

    Error: 823, Severity: 24, State: 2

    I/O error (bad page ID) detected during read at offset 0x000005981be000 in file ...

    I found out a particular table for which the pages were corrupt and was able to fix the error with CHECKDB.

    We have a index defragment job set up on the database which runs DBCC INDEXDEFRAG on weekly basis on the database. Its been failing for some time with the error -

    Object ID 0, index ID 0, page ID (6:2932957). The PageId in the page header = (0:0). [SQLSTATE 42000] (Error 8909) Table error: Object ID 0, index ID 0, page ID (6:2932957). The PageId in the page header = (0:0). [SQLSTATE 42000] (Error 8909) I/O error (bad page ID) detected during read at offset 0x00000598246000 in file . [SQLSTATE HY000] (Error 823). The step failed.

    This time I checked the page 6:2932957 and could see that the page is zeroed out -

    dbcc traceon (3604)

    dbcc page (IOD,6,2932957,3)

    Output :

    PAGE: (0:0)

    -----------

    BUFFER:

    -------

    BUF @0x04B843C0

    ---------------

    bpage = 0x457AE000 bhash = 0x00000000 bpageno = (6:2932957)

    bdbid = 5 breferences = 1 bstat = 0x809

    bspin = 0 bnext = 0x00000000

    PAGE HEADER:

    ------------

    Page @0x457AE000

    ----------------

    m_pageId = (0:0) m_headerVersion = 0 m_type = 0

    m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x0

    m_objId = 0 m_indexId = 0 m_prevPage = (0:0)

    m_nextPage = (0:0) pminlen = 0 m_slotCnt = 0

    m_freeCnt = 0 m_freeData = 0 m_reservedCnt = 0

    m_lsn = (0:0:0) m_xactReserved = 0 m_xdesId = (0:0)

    m_ghostRecCnt = 0 m_tornBits = 0

    Allocation Status

    -----------------

    GAM (6:2556160) = ALLOCATED

    SGAM (6:2556161) = NOT ALLOCATED

    PFS (6:2927856) = 0x70 IAM_PG MIXED_EXT ALLOCATED 0_PCT_FULL

    DIFF (6:2556166) = CHANGED

    ML (6:2556167) = NOT MIN_LOGGED

    Further, I can see that the pages in the range of 2932957 to 2932972 are all zeroed out. Below is the GAM Allocation details -

    PAGE: (6:2556160)

    -----------------

    BUFFER:

    -------

    BUF @0x01A6AF40

    ---------------

    bpage = 0x5DD5C000 bhash = 0x00000000 bpageno = (6:2556160)

    bdbid = 5 breferences = 1 bstat = 0x9

    bspin = 0 bnext = 0x00000000

    PAGE HEADER:

    ------------

    Page @0x5DD5C000

    ----------------

    m_pageId = (6:2556160) m_headerVersion = 1 m_type = 8

    m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x0

    m_objId = 99 m_indexId = 0 m_prevPage = (0:0)

    m_nextPage = (0:0) pminlen = 90 m_slotCnt = 2

    m_freeCnt = 6 m_freeData = 8182 m_reservedCnt = 0

    m_lsn = (13006:314808:58) m_xactReserved = 0 m_xdesId = (0:0)

    m_ghostRecCnt = 0 m_tornBits = 0

    Allocation Status

    -----------------

    GAM (6:2556160) = ALLOCATED

    SGAM (6:2556161) = NOT ALLOCATED

    PFS (6:2555808) = 0x40 ALLOCATED 0_PCT_FULL DIFF (6:2556166) = CHANGED

    ML (6:2556167) = NOT MIN_LOGGED

    GAM: Header @0x5DD5C064 Slot 0, Offset 96

    -----------------------------------------

    status = 0x0

    GAM: Extent Alloc Status @0x5DD5C0C2

    ------------------------------------

    (6:2556160) - (6:2566216) = ALLOCATED

    (6:2566224) - = NOT ALLOCATED

    (6:2566232) - (6:2641136) = ALLOCATED

    (6:2641144) - = NOT ALLOCATED

    (6:2641152) - (6:2758048) = ALLOCATED

    (6:2758056) - = NOT ALLOCATED

    (6:2758064) - (6:2845040) = ALLOCATED

    (6:2845048) - = NOT ALLOCATED

    (6:2845056) - (6:2870064) = ALLOCATED

    (6:2870072) - = NOT ALLOCATED

    (6:2870080) - (6:2922512) = ALLOCATED

    (6:2922520) - = NOT ALLOCATED

    (6:2922528) - (6:2928136) = ALLOCATED

    (6:2928144) - = NOT ALLOCATED

    (6:2928152) - (6:2990800) = ALLOCATED

    (6:2990808) - = NOT ALLOCATED

    (6:2990816) - (6:3005304) = ALLOCATED

    (6:3005312) - = NOT ALLOCATED

    (6:3005320) - (6:3066176) = ALLOCATED

    (6:3066184) - = NOT ALLOCATED

    (6:3066192) - (6:3067384) = ALLOCATED

    As the database is huge in size and very frequently used, it is not possible to run a full checkdb on the database at this moment. The SAN guys sent a report that there are no issues at the storage level.

    Below are the question that I am not able to sort out so far -

    1. Is there a way that I can track this page back to a particular object in the database ?

    2. Is there a relation between the DBCC INDEXDEFRAG and the pages getting zeroed out ?

    3. What will be the best possible way to run a full consistency check on databases of this size ?

    4. What is the possible reason

    5. Can rebuild index be used instead of using DBCC INDEXDEFRAG in this case ?

    It would be really great if you could please provide with your valuable feedback on my questions.

    Regards,

    Shovan

  • shovankar (1/9/2013)


    1. Is there a way that I can track this page back to a particular object in the database ?

    Not from the information you've given, no. The output of CheckDB will probably identify the table involved.

    2. Is there a relation between the DBCC INDEXDEFRAG and the pages getting zeroed out ?

    No.

    3. What will be the best possible way to run a full consistency check on databases of this size ?

    DBCC CHECKDB (<Database Name>) WITH NO_INFOMSGS, ALL_ERRORMSGS

    You need to do a full checkDB of this DB or a restored backup of this DB (not a replicated copy), as you have no idea the extent or severity of the corruption.

    4. What is the possible reason

    Hardware problems. Something's misbehaving somewhere in the IO stack and that something has zeroed out one or more pages in the database.

    5. Can rebuild index be used instead of using DBCC INDEXDEFRAG in this case ?

    Maybe, but it's not the cause nor the fix for your corruption.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks a lot Gail for your valuable inputs. Since morning I was manually checking the pages and trying to backtrack the pages to tables manually. So far, I have found 2 tables with corrupted pages.

    Interestingly, for one of the table, I got the below error -

    Msg 8939, Level 16, State 28, Line 2

    Table error: Object ID 1143334087, index ID 2, page (6:2932956). Test (m_slots[0].GetOffset () == PAGEHEADSIZE && m_type == IAM_PAGE) failed. Values are 0 and 96.

    Msg 8939, Level 16, State 29, Line 2

    Table error: Object ID 1143334087, index ID 2, page (6:2932956). Test (Align(m_slots[-1].GetOffset ()) == Align(m_slots[0].GetOffset () + sizeof(IAMHEADER) + sizeof(DataRecHdr)) && m_type == IAM_PAGE) failed. Values are 0 and 94.

    Msg 8939, Level 16, State 28, Line 2

    Table error: Object ID 1143334087, index ID 2, page (6:2932956). Test (m_slots[0].GetOffset () == PAGEHEADSIZE && m_type == IAM_PAGE) failed. Values are 0 and 96.

    Msg 8939, Level 16, State 29, Line 2

    Table error: Object ID 1143334087, index ID 2, page (6:2932956). Test (Align(m_slots[-1].GetOffset ()) == Align(m_slots[0].GetOffset () + sizeof(IAMHEADER) + sizeof(DataRecHdr)) && m_type == IAM_PAGE) failed. Values are 0 and 94.

    Msg 7965, Level 16, State 2, Line 2

    Table error: Could not check object ID 1143334087, index ID 2 due to invalid allocation (IAM) page(s).

    It seems to be that the IAM page is corrupted. Please correct me if im wrong. Then there will be a big dataloss for this database !!

    I will try to run a full checkdb on this database post approval.

    Please let me your view on this.

    Regards,

    Shovan.

  • Please run the CheckDB and post the results. I can't tell you how to fix it (other than restore from a clean backup) without the full, complete and unedited output of a full CheckDB.

    No data loss from that bit that you posted, that's all in a nonclustered index. But then, you should be looking at your backups and considering how to restore, not thinking about repair (unless you have no backups)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply