Data Corruption after Server Crash with Disk error

  • On Friday, we had our database server crash and reboot twice while trying to run a backup. The hardware issue ended up being a disk within the Raid array hogging all of the I/O but never actually failing. We took disk out of the array and server performance is back to normal.

    On Sunday, the DBCC CheckDB found 3 tables with Data Consistency errors. But everything in the application was still functioning correctly. With everything working, we were not able to bring down the database. Today, we have started getting errors when running against this table. I ran the following against the table that is currently throwing the errors

    DBCC Checktable ('tblOrderApproval') WITH NO_INFOMSGS, ALL_ERRORMSGS

    and I received the following message

    Msg 8928, Level 16, State 1, Line 1

    Object ID 69575286, index ID 1, partition ID 72057594038910976, alloc unit ID 72057594043039744 (type In-row data): Page (1:5513080) could not be processed. See other errors for details.

    Msg 8939, Level 16, State 98, Line 1

    Table error: Object ID 69575286, index ID 1, partition ID 72057594038910976, alloc unit ID 72057594043039744 (type In-row data), page (1:5513080). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 113379337 and -1.

    Msg 8976, Level 16, State 1, Line 1

    Table error: Object ID 69575286, index ID 1, partition ID 72057594038910976, alloc unit ID 72057594043039744 (type In-row data). Page (1:5513080) was not seen in the scan although its parent (1:5405013) and previous (1:5605197) refer to it. Check any previous errors.

    Msg 8935, Level 16, State 1, Line 1

    Table error: Object ID 69575286, index ID 1, partition ID 72057594038910976, alloc unit ID 72057594043039744 (type In-row data). The previous link (1:5513083) on page (1:5513081) does not match the previous page (1:5513080) that the parent (1:5405013), slot 363 expects for this page.

    Msg 8978, Level 16, State 1, Line 1

    Table error: Object ID 69575286, index ID 1, partition ID 72057594038910976, alloc unit ID 72057594043039744 (type In-row data). Page (1:5513081) is missing a reference from previous page (1:5513083). Possible chain linkage problem.

    CHECKTABLE found 0 allocation errors and 5 consistency errors in table 'tblOrderApproval' (object ID 69575286).

    repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKTABLE (ind-prod7-ws.dbo.tblOrderApproval).

    I know that the best option is to restore the last good backup. I am assuming at this time that the last good backup would be Thursday night, and there are too many changes since then to want to do a full restore.

    I also tried to rebuild the table with a select * into statement, but go the following

    Msg 824, Level 24, State 2, Line 1

    SQL Server detected a logical consistency-based I/O error: torn page (expected signature: 0xaaaaaaaa; actual signature: 0x5555aaaa). It occurred during a read of page (1:5513080) in database ID 6 at offset 0x00000a83ef0000 in file 'D:\mssql\MSSQL.1\MSSQL\Data\ind-prod7-ws.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

    I know that one option is to run the repair and allow data loss, but are there any other options that I am missing. I have been lucky not to have much experience with data corruption in SQL Server.

    Any other suggestions on things to look at or try would be much appreciated.

    Thanks,

    Tim

  • Please run the following and post the full and complete output.

    DBCC CHECKDB (<Database Name>) WITH NO_INFOMSGS, ALL_ERRORMSGS

    Rebuilding is not going to help you. From the errors you've posted, your options are restore or repair. I would like to see what, if any, errors exist in the rest of the DB before recommending anything.

    Why would restoring from backup lose data? You can restore the full and then roll the log backups forward (assuming that the DB is in full recovery and has log backups. If it's a critical DB it should)

    This may be of interest. http://www.sqlservercentral.com/articles/65804/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ok, I guess I didn't think about rolling just the logs forward, since the blocks with corruption probably have not changed an so therfore won't be in the log backups. So there should not be any corruption in the backups then, correct? Now I just need to see if I have all of my backups and T-logs on hand from Last Thursday.

    Here is the full checkdb

    Msg 8964, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:5500308), slot 1, text ID 982432350208 is not referenced.

    Msg 8914, Level 16, State 1, Line 1

    Incorrect PFS free space information for page (1:5507325) in object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). Expected value 80_PCT_FULL, actual value 95_PCT_FULL.

    Msg 8965, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:5507325), slot 3, text ID 972919734272 is referenced by page (1:6136007), slot 39, but was not seen in the scan.

    Msg 8965, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:5507325), slot 4, text ID 972919799808 is referenced by page (1:6136007), slot 40, but was not seen in the scan.

    Msg 8965, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:5507325), slot 5, text ID 982432350208 is referenced by page (1:6315359), slot 26, but was not seen in the scan.

    Msg 8965, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:5507325), slot 6, text ID 982432415744 is referenced by page (1:6315359), slot 27, but was not seen in the scan.

    Msg 8965, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:5507325), slot 7, text ID 982432481280 is referenced by page (1:6315359), slot 28, but was not seen in the scan.

    Msg 8965, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:5507325), slot 8, text ID 982432546816 is referenced by page (1:6315359), slot 29, but was not seen in the scan.

    Msg 8965, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:5507325), slot 10, text ID 982432612352 is referenced by page (1:6315359), slot 30, but was not seen in the scan.

    Msg 8965, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:5507325), slot 11, text ID 982432677888 is referenced by page (1:6315359), slot 31, but was not seen in the scan.

    Msg 8965, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:5507325), slot 12, text ID 982432743424 is referenced by page (1:6315359), slot 32, but was not seen in the scan.

    Msg 8965, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:5507325), slot 13, text ID 982432874496 is referenced by page (1:6315359), slot 34, but was not seen in the scan.

    Msg 8964, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:6316308), slot 0, text ID 982432415744 is not referenced.

    Msg 8964, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:6316309), slot 0, text ID 982432415744 is not referenced.

    Msg 8964, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:6316310), slot 0, text ID 982432481280 is not referenced.

    Msg 8964, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:6316311), slot 0, text ID 982432481280 is not referenced.

    Msg 8964, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:6316360), slot 0, text ID 982432546816 is not referenced.

    Msg 8964, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:6316361), slot 0, text ID 982432677888 is not referenced.

    Msg 8964, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:6316362), slot 0, text ID 982432677888 is not referenced.

    Msg 8964, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:6316363), slot 0, text ID 982432743424 is not referenced.

    Msg 8964, Level 16, State 1, Line 1

    Table error: Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 71777533007953920 (type LOB data). The off-row data node at page (1:6316364), slot 0, text ID 982432743424 is not referenced.

    Msg 8929, Level 16, State 1, Line 1

    Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 72057594050379776 (type In-row data): Errors found in off-row data with ID 972919734272 owned by data record identified by RID = (1:6136007:39)

    Msg 8929, Level 16, State 1, Line 1

    Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 72057594050379776 (type In-row data): Errors found in off-row data with ID 972919799808 owned by data record identified by RID = (1:6136007:40)

    Msg 8929, Level 16, State 1, Line 1

    Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 72057594050379776 (type In-row data): Errors found in off-row data with ID 982432350208 owned by data record identified by RID = (1:6315359:26)

    Msg 8929, Level 16, State 1, Line 1

    Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 72057594050379776 (type In-row data): Errors found in off-row data with ID 982432415744 owned by data record identified by RID = (1:6315359:27)

    Msg 8929, Level 16, State 1, Line 1

    Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 72057594050379776 (type In-row data): Errors found in off-row data with ID 982432481280 owned by data record identified by RID = (1:6315359:28)

    Msg 8929, Level 16, State 1, Line 1

    Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 72057594050379776 (type In-row data): Errors found in off-row data with ID 982432546816 owned by data record identified by RID = (1:6315359:29)

    Msg 8929, Level 16, State 1, Line 1

    Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 72057594050379776 (type In-row data): Errors found in off-row data with ID 982432612352 owned by data record identified by RID = (1:6315359:30)

    Msg 8929, Level 16, State 1, Line 1

    Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 72057594050379776 (type In-row data): Errors found in off-row data with ID 982432677888 owned by data record identified by RID = (1:6315359:31)

    Msg 8929, Level 16, State 1, Line 1

    Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 72057594050379776 (type In-row data): Errors found in off-row data with ID 982432743424 owned by data record identified by RID = (1:6315359:32)

    Msg 8929, Level 16, State 1, Line 1

    Object ID 21575115, index ID 1, partition ID 72057594045136896, alloc unit ID 72057594050379776 (type In-row data): Errors found in off-row data with ID 982432874496 owned by data record identified by RID = (1:6315359:34)

    CHECKDB found 0 allocation errors and 31 consistency errors in table 'tblLogging' (object ID 21575115).

    Msg 8928, Level 16, State 1, Line 1

    Object ID 69575286, index ID 1, partition ID 72057594038910976, alloc unit ID 72057594043039744 (type In-row data): Page (1:5513080) could not be processed. See other errors for details.

    Msg 8939, Level 16, State 98, Line 1

    Table error: Object ID 69575286, index ID 1, partition ID 72057594038910976, alloc unit ID 72057594043039744 (type In-row data), page (1:5513080). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 113379337 and -1.

    Msg 8976, Level 16, State 1, Line 1

    Table error: Object ID 69575286, index ID 1, partition ID 72057594038910976, alloc unit ID 72057594043039744 (type In-row data). Page (1:5513080) was not seen in the scan although its parent (1:5405013) and previous (1:5605197) refer to it. Check any previous errors.

    Msg 8935, Level 16, State 1, Line 1

    Table error: Object ID 69575286, index ID 1, partition ID 72057594038910976, alloc unit ID 72057594043039744 (type In-row data). The previous link (1:5513083) on page (1:5513081) does not match the previous page (1:5513080) that the parent (1:5405013), slot 363 expects for this page.

    Msg 8978, Level 16, State 1, Line 1

    Table error: Object ID 69575286, index ID 1, partition ID 72057594038910976, alloc unit ID 72057594043039744 (type In-row data). Page (1:5513081) is missing a reference from previous page (1:5513083). Possible chain linkage problem.

    CHECKDB found 0 allocation errors and 5 consistency errors in table 'tblOrderApproval' (object ID 69575286).

    Msg 8929, Level 16, State 1, Line 1

    Object ID 373576369, index ID 1, partition ID 305957677629440, alloc unit ID 305957677629440 (type In-row data): Errors found in off-row data with ID 980962967552 owned by data record identified by RID = (1:855595:0)

    Msg 8929, Level 16, State 1, Line 1

    Object ID 373576369, index ID 1, partition ID 305957677629440, alloc unit ID 305957677629440 (type In-row data): Errors found in off-row data with ID 942430158848 owned by data record identified by RID = (1:5512509:1)

    Msg 8961, Level 16, State 1, Line 1

    Table error: Object ID 373576369, index ID 1, partition ID 305957677629440, alloc unit ID 71800601762136064 (type LOB data). The off-row data node at page (1:280727), slot 79, text ID 980962967552 does not match its reference from page (1:5512509), slot 1.

    Msg 8974, Level 16, State 1, Line 1

    Table error: Object ID 373576369, index ID 1, partition ID 305957677629440, alloc unit ID 71800601762136064 (type LOB data). The off-row data node at page (1:280727), slot 79, text ID 980962967552 is pointed to by page (1:855595), slot 0 and by page (1:5512509), slot 1.

    Msg 8951, Level 16, State 1, Line 1

    Table error: table 'tblOrderLine' (ID 373576369). Data row does not have a matching index row in the index 'tblOrderLine_IDX1' (ID 2). Possible missing or invalid keys for the index row matching:

    Msg 8955, Level 16, State 1, Line 1

    Data row (1:5512509:1) identified by (OrderLineId = 1016175) with index values 'OrderId = 116818 and OrderLineId = 1016175 and ProductNumber = 'CB6076' and SupplierCode = 'N''.

    CHECKDB found 0 allocation errors and 5 consistency errors in table 'tblOrderLine' (object ID 373576369).

    CHECKDB found 0 allocation errors and 41 consistency errors in database 'ind-prod7-ws'.

    repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (ind-prod7-ws).

    Thanks,

    Tim

  • Corruption won't be in the log backups anyway. Log backups contain changes made to the DB and, since the corruption was IO subsystem problems, it would not have been added to the SQL transaction log, which is what a log backup backs up.

    I would recommend a restore from backup. This is repairable, but you'd lose data from two of your Orders tables and from the Logging table.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for your advice, We are going to try and do a restore and roll the logs forward. It looks like we have all of the backups that we need.

    Tim

  • So I was restoring the backups to a new copy of the database, I had gotten through almost all of the Transaction log backups, but when I tried to restore the backup from Noon today, I got the following message

    Processed 0 pages for database 'prod7-ws_new', file 'prod-ws_dat' on file 1.

    Processed 40667 pages for database 'prod7-ws_new', file 'prod-ws_log' on file 1.

    Msg 3456, Level 16, State 1, Line 13

    Could not redo log record (209754:10375:3), for transaction ID (0:52595938), on page (1:5507385), database 'prod7-ws_new' (database ID 8). Page: LSN = (209497:12355:7), type = 1. Log: OpCode = 6, context 2, PrevPageLSN: (209284:29688:1). Restore from a backup of the database, or repair the database.

    Msg 3013, Level 16, State 1, Line 13

    RESTORE LOG is terminating abnormally.

    Could there be corruption in this T-Log Backup? Does anyone know how to get around this?

    Thanks,

    Tim

  • Could have been corruption of the tran log backup, could also have been corruption in the full that you started with.

    Restore just the full and run a checkDB on it. Make sure that it's as clean as you think it is.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I restored the backup by it self and ran CheckDB and it did not find any issues. Then I started restoring T-Log backups and had gotten through about 90% of them until I got to this error. I searched and I could not really find a way past this issue, because there are no good full backups after this log since it was earlier in the current day that the log failed.

  • Likely that the log backup is damaged then. It's one backup that consistently fails? If you stop before that backup, bring the DB online and checkDB, is the database still OK?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thank you for your advice through this process. We ended up doing the following.

    After I hit the error rolling forward, I recovered the copy of the database to 8AM yesterday and kept it there as reference. We then took a backup of the corrupted production, and restored it down as another copy and ran the dbcc checkdb with repair_allow_data_loss clause. This gave us an idea of what we would see if we did this for production. The impact looked like it would be fairly minor So we took an outage, backed up the database, ran the repair_allow_data_loss and brought back up the application. Then ran another CheckDB to verify no more corruption. We still have the 8AM backup version if we run into needed missing data, but I think we are going to be ok. The application has been back up for about 5 hours and there has not been any issues yet.

    Thanks Again

    Tim

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply