SQL SERVER detected a logical consistency-based I/O error: torn page

  • I have not yet repaired the msdb error ...

    So, I started with

    DBCC CHECKDB (msdb) WITH ALL_ERRORMSGS, NO_INFOMSGS

    I recieve the following:

    Msg 8967, Level 16, State 216, Line 1

    An internal error occurred in DBCC that prevented further processing. Contact Customer Support Services.

    Now, I executed DBCC CHECKTABLE ('backupmediafamily') to find out what table had the error.

    Msg 8967, Level 16, State 216, Line 1

    An internal error occurred in DBCC that prevented further processing. Contact Customer Support Services.

    Then, I decided to attempt to SELECT * FROM dbo.backupmediafamily, but this actually worked properly.

    I decided to try the second index, SELECT media_family_id FROM backupmediafamily, and BINGO!

    Msg 824, Level 24, State 2, Line 1

    SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:54080; actual 775:-653262834). It occurred during a read of page (1:54080) in database ID 4 at offset 0x0000001a680000 in file 'D:\MSSQL\Data\MSSQL.3\MSSQL\DATA\MSDBData.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

    I think I have this tracked down to a physical hard drive issue on our SAN, but I have no way to verify that. The telling factor is that this same exact page was the issue on the 5th with a completely different table (we fixed that issue with the DBCC CHECKDB command in my previous post). According to MSDB.dbo.suspect_pages, I have 8 pages (1 64-KB block of physical data) that is suspect:

    DB-ID File Page Event count last error ts

    4154080713492009-03-05 14:39:26.057

    41540857132009-03-05 14:39:26.090

    4154081712009-03-05 14:39:26.057

    4154082122009-03-22 00:30:02.980

    4154085152009-03-23 17:45:03.477

    4154083162009-03-24 12:00:14.117

    4154086122009-03-22 16:00:09.950

    4154084142009-03-24 02:30:04.110

    41540801102009-03-25 13:04:39.470

    41540873122009-03-25 11:33:23.530

    As you can see, I have errors on the 5th and the 25th for the same pages 54080 and 54081. I have to now attempt to locate the actual physical address of these pages on the hard disk. Any ideas??

  • No idea how to tell what the physical address of the pages are on the hard disk - all I can say is that the offset of a page within the SQL data file is the page number x 8192 bytes. If NTFS has split the actual file over multiple contiguous chunks of disk, that makes things even harder, then you're dealing with how RAID is setup, drive geometries etc etc. But really all you want to know is which physical drive/array is the problem with - and then replace it.

    The 8967 is a known bug, when you already have corruption (key point) - see CHECKDB bug that people are hitting - Msg 8967, Level 16, State 216

    Regardless, you need to move the database onto different storage - the fact that the same errors occured on different days says that drive (or RAID array) is toast.

    Thanks

    Paul Randal
    CEO, SQLskills.com: Check out SQLskills online training!
    Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
    SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
    Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

  • Paul, I have some really interesting information that I would like to get your opinion on. We currently have a support ticket open with Hitachi to verify integrity on the SAN, but I was digging a little deeper and thought this was most interesting.

    DBCC TRACEON (3604, -1)

    DBCC PAGE (msdb, 1, 54080, 3)

    PAGE: (775:-653262834)

    BUFFER:

    BUF @0x00000000A1FD1200

    bpage = 0x00000000A1448000 bhash = 0x0000000000000000 bpageno = (1:54080)

    bdbid = 4 breferences = 0 bUse1 = 54470

    bstat = 0x3c00809 blog = 0x21212159 bnext = 0x0000000000000000

    PAGE HEADER:

    Page @0x00000000A1448000

    m_pageId = (775:-653262834) m_headerVersion = 6 m_type = 46

    m_typeFlagBits = 0x0 m_level = 129 m_flagBits = 0x0

    m_objId (AllocUnitId.idObj) = 67111936 m_indexId (AllocUnitId.idInd) = 768

    Metadata: AllocUnitId = 216177180361621504 Metadata: PartitionId = 0

    Metadata: IndexId = -1 Metadata: ObjectId = 0 m_prevPage = (0:132096)

    m_nextPage = (2:-2105506296) pminlen = 4 m_slotCnt = 0

    m_freeCnt = 57 m_freeData = 0 m_reservedCnt = 1536

    m_lsn = (16782592:637544192:1792) m_xactReserved = 5377 m_xdesId = (0:66560)

    m_ghostRecCnt = 22 m_tornBits = 1089300228

    Allocation Status

    GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED

    PFS (1:48528) = 0x40 ALLOCATED 0_PCT_FULL DIFF (1:6) = NOT CHANGED

    ML (1:7) = NOT MIN_LOGGED

    Msg 2514, Level 16, State 5, Line 1

    A DBCC PAGE error has occurred: Invalid page type - dump style 3 not possible.

    I also tried dump style 2, but I am not going to post that ugly mess on here 🙂

    What I found MOST interesting is the AllocUnitId ... I took the ID number and ran this:

    SELECT * FROM msdb.sys.allocation_units WHERE allocation_unit_id = 216177180361621504

    Empty Set ... There was no record of that allocation unit for 5 pages 54080, 54082, 54083, 54084, 54085 ... and they all appear to have the same issue. Page 54081 is in a different allocation unit, and it returned 144 rows properly using the dump style 3 flag.

    I am waiting to hear from Hitachi if there is anything on the SAN that might be the root cause, and if not, I am going to start asking about testing the other components in the physical server and connecting the server to the SAN. Do you have any additional ideas or suggestions?

  • This looks just like I/O subsystem corruption - the page is full of garbage.

    I'd like you to turn on page checksums, which will catch when the I/O subsystem is causing problems.

    ALTER DATABASE yourdb SET PAGE_VERIFY CHECKSUM

    See How to tell if the IO subsystem is causing corruptions?

    Thanks

    Paul Randal
    CEO, SQLskills.com: Check out SQLskills online training!
    Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
    SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
    Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

  • hi Paul,

    I had a database that checkdb reported errors, I did a checktable for all the tables and few tables reported problems. See the output from each step. Interestingly, the row count and page count remained the same after checktable with REPAIR_ALLOW_DATA_LOSS option was run, So, I am wondering were there any data loss ? Please share your thoughts.

    Thanks,

    Ranga

    dbcc checktable ( 'dbo.CorruptTable', REPAIR_REBUILD )

    ------------------------------------------------------

    Msg 8939, Level 16, State 5, Line 1

    Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page (1:40464). Test (m_headerVersion == HEADER_7_0) failed. Values are 6 and 1.

    The repair level on the DBCC statement caused this repair to be bypassed.

    Msg 8939, Level 16, State 7, Line 1

    Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page (1:40464). Test (m_freeData >= PAGEHEADSIZE && m_freeData <= (UINT)PAGESIZE - m_slotCnt * sizeof (Slot)) failed. Values are 0 and 8186.

    Repairing this error requires other errors to be corrected first.

    Msg 8939, Level 16, State 8, Line 1

    Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page (1:40464). Test (m_freeCnt = PAGEHEADSIZE && m_freeData <= (UINT)PAGESIZE - m_slotCnt * sizeof (Slot)) failed. Values are 0 and 8186.

    The error has been repaired.

    Msg 8939, Level 16, State 8, Line 1

    Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page (1:40464). Test (m_freeCnt <= PAGESIZE - PAGEHEADSIZE) failed. Values are 44123 and 8096.

    The error has been repaired.

    Msg 8909, Level 16, State 1, Line 1

    Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:40465) contains an incorrect page ID in its page header. The PageId in the page header = (262:589825).

    The error has been repaired.

    Msg 8909, Level 16, State 1, Line 1

    Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:40466) contains an incorrect page ID in its page header. The PageId in the page header = (262:262145).

    The error has been repaired.

    Msg 8909, Level 16, State 1, Line 1

    Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:40467) contains an incorrect page ID in its page header. The PageId in the page header = (262:1441793).

    The error has been repaired.

    CHECKTABLE found 0 allocation errors and 6 consistency errors not associated with any single object.

    CHECKTABLE fixed 0 allocation errors and 6 consistency errors not associated with any single object.

    DBCC results for 'CorruptTable'.

    Repair: The page (1:40464) has been deallocated from object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data).

    Repair: The page (1:40465) has been deallocated from object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data).

    Repair: The page (1:40466) has been deallocated from object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data).

    Repair: The page (1:40467) has been deallocated from object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data).

    Msg 8928, Level 16, State 1, Line 1

    Object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data): Page (1:40464) could not be processed. See other errors for details.

    The error has been repaired.

    Msg 8928, Level 16, State 1, Line 1

    Object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data): Page (1:40465) could not be processed. See other errors for details.

    The error has been repaired.

    Msg 8928, Level 16, State 1, Line 1

    Object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data): Page (1:40466) could not be processed. See other errors for details.

    The error has been repaired.

    Msg 8928, Level 16, State 1, Line 1

    Object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data): Page (1:40467) could not be processed. See other errors for details.

    The error has been repaired.

    There are 218534 rows in 2213 pages for object "CorruptTable".

    CHECKTABLE found 0 allocation errors and 4 consistency errors in table 'CorruptTable' (object ID 789838126).

    CHECKTABLE fixed 0 allocation errors and 4 consistency errors in table 'CorruptTable' (object ID 789838126).

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    dbcc checktable ( 'dbo.CorruptTable')

    --------------------------------------

    DBCC results for 'CorruptTable'.

    There are 218534 rows in 2213 pages for object "CorruptTable".

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

  • Please start a new thread rather than continuing an unrelated one.

    In this case you were lucky - the pages were linked into the table (probably by the corruption) and didn't have any data on them. You need to figure out how the corruption happened and sort out your backup strategy so that you don't have to run repair next time corruption occurs.

    Paul Randal
    CEO, SQLskills.com: Check out SQLskills online training!
    Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
    SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
    Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

  • Thank you.

  • thank you

  • thank you

  • ya thank you :hehe:.

  • I got the error for tempdb while executing a script.Pls advise.

    Msg 824, Level 24, State 2, Line 74

    SQL Server detected a logical consistency-based I/O error: torn page (expected signature: 0x55555555; actual signature: 0x85505bd1). It occurred during a read of page (1:1224) in database ID 2 at offset 0x00000000990000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\tempdb.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

  • Please start a new thread.

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply