File System Failure DB Recovery

  • I have had a file system failure corrupt a .ndf file in a database. I do not have good backups, I'm sorry, it's just not possible at this time. There are plans to migrate off of this old Win2KAS/SQL2005 to a brand new easy-to-backup server in a VM cluster. But that hasn't happened, and this file system failure caught us with our pants down.

    The server has 3 logical drives. Drive C: Boot drive, Drive E: Critical DB drive (2x16GB Mirrored), Drive G: additional storage (5x4GB RAID5 leaving 16GB usable). I came to work yesterday morning and the server was unable to read the G: file system. After working through that, (rebuild array/chkdsk/reboots/etc), a 3 non-critical databases were corrupted, 2 of which I have fully recovered so far, and the other is now de-prioritized. The important database had transaction logs on all three drives, a primary .mdf file on the E: drive, and an alternate .ndf on the G: drive which stored ONLY table indexes. I was very religious about that. That G: drive .ndf file is now 0 bytes.

    So I went a few rounds with EMERGENCY mode and CHECKDB, restarting SQL Server, etc. The last thing I did is:

    1. Move all of the files to alternate directories

    2. Create a new database with the same name and only a .ldf and .mdf on the E: drive

    3. DBCC CHECKDB (LOOPTESTS, repair_allow_data_loss)

    At this point the DB went into RECOVERY_PENDING, and I could access nothing. I then:

    1. Went back into EMERGENCY mode

    2. DBCC CHECKDB (LOOPTESTS, NOINDEX)

    At this point the DB remained in EMERGENCY mode, and I can see and access all (or at least many) of the tables. Which of course I count as a battle victory. Do I still have a chance at winning the war? Can I recover the DB to a normal state, or do I begin building a new DB copying everything as best I can?

    Thank you,

    Josh

    Here's the ERRORLOG for the steps above.

    2013-08-02 09:43:56.98 spid54 The operating system returned error 38(Reached the end of the file.) to SQL Server during a read at offset 0000000000000000 in file 'G:\MSSQL2005\data\LOOPTESTS_Indexes.ndf'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

    2013-08-02 09:43:57.01 spid54 Warning: The log for database 'LOOPTESTS' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.

    2013-08-02 09:43:57.01 spid54 Warning: The log for database 'LOOPTESTS' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.

    2013-08-02 09:43:57.01 spid54 EMERGENCY MODE DBCC CHECKDB (LOOPTESTS, repair_allow_data_loss) executed by SERVER\ME found 0 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 0 seconds.

    2013-08-02 09:44:18.20 spid54 Setting database option EMERGENCY to ON for database LOOPTESTS.

    2013-08-02 09:44:18.29 spid54 Starting up database 'LOOPTESTS'.

    2013-08-02 09:44:18.34 spid54 The database 'LOOPTESTS' is marked EMERGENCY_MODE and is in a state that does not allow recovery to be run.

    2013-08-02 09:52:28.40 spid54 DBCC CHECKDB (LOOPTESTS, noindex) executed by SERVER\ME found 3 errors and repaired 0 errors. Elapsed time: 0 hours 8 minutes 7 seconds.

  • Please run the following and post the full and complete, unedited output.

    DBCC CHECKDB (<Database Name>) WITH NO_INFOMSGS, ALL_ERRORMSGS

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Server: Msg 2575, Level 16, State 1, Line 1

    The Index Allocation Map (IAM) page (3:7387) is pointed to by the next pointer of IAM page (0:0) in object ID 0, index ID -1, partition ID 0, alloc unit ID 72057594175750144 (type Unknown), but it was not detected in the scan.

    Server: Msg 8996, Level 16, State 1, Line 1

    IAM page (3:7387) for object ID 0, index ID -1, partition ID 0, alloc unit ID 72057594175750144 (type Unknown) controls pages in filegroup 0, that should be in filegroup 2.

    Server: Msg 8979, Level 16, State 1, Line 1

    Table error: Object ID 87671360, index ID 1, partition ID 72057594073776128, alloc unit ID 72057594081771520 (type In-row data). Page (1:128792) is missing references from parent (unknown) and previous (page (1:92062)) nodes. Possible bad root entry in system catalog.

    CHECKDB found 2 allocation errors and 0 consistency errors not associated with any single object.

    CHECKDB found 0 allocation errors and 1 consistency errors in table 'NEI_History' (object ID 87671360).

    CHECKDB found 2 allocation errors and 1 consistency errors in database 'LOOPTESTS'.

  • Script all objects, export all data, recreate the database, fix up your backup strategy.

    The history table may well fail when exporting.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I've already scripted off all objects. Which went fine, thankfully.

    Using the SQL Server Import and Export Wizard, just trying to do one table (a big one), get this error:

    Could not connect source component.

    Error 0xc0202009: Source - T2ACTests [1]: An OLE DB error has occured. Error code: 0x80004005.

    Error 0xc02020e8: Source - T2ACTests [1]: Opening a rowset for "[LOOPTESTS].[dbo].[T2ACTests]" failed. Check that the object exists in the database.

    Additional information:

    Exception from HRESULT: 0xC02020E8 (Microsoft.SQLServer.DTSPipelineWrap)

  • Try from management studio. If that fails, the object's lost.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply