Yeah! my first real corrupt DB

  • The Yeah! part is that this is a test db on a test server while testing a new san.

    I've watched the whole presentation by Paul on checkdb and yet I've stumbled on a case he didn't talk about and that I can't figure out on my own.

    I was developing a script to update a history table. Nothing big, maybe 95K rows, 45 MB.

    The db has 30 GB in datafiles with 15 GB free and another 145 GB free on the drives.

    Similar figures on tempdb, 100% free space in logs and data with 45 GB free space on the drive.

    I'm saying this because the only thing I found on Paul Randal's blog about this error was about lack of space! => http://www.sqlskills.com/BLOGS/PAUL/post/Database-snapshots-when-things-go-wrong.aspx

    What I do a start transaction, run 10 steps of a script, leave the tran opened and then keep developing for the final 2-3 steps. Which at that particular point was giving me primary key violations.

    I highlight my rollback statement and I get and error message similar to this (sorry but I didn't save the original) :

    Msg 926, Level 14, State 1, Line 1

    Database 'Test Upgrade Dynamics' cannot be opened. It has been marked SUSPECT by recovery. See the SQL Server errorlog for more information.

    Msg 5069, Level 16, State 1, Line 1

    ALTER DATABASE statement failed.

    Msg 3316, Level 21, State 2, Line 1

    During undo of a logged operation in database 'Test Upgrade Dynamics', an error occurred at log record ID (101653:4848:329). The row was not found. Restore the database from a full backup, or repair the database.

    Msg 3414, Level 21, State 2, Line 1

    An error occurred during recovery, preventing the database 'Test Upgrade Dynamics' (database ID 9) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.

    This is what I have in the sql server logs (nothing more in the Windows log) :

    Date8/9/2011 4:56:36 PM

    JournalSQL Server (Actuel(le) - 8/9/2011 5:42:00 PM)

    Sourcespid54

    Message

    During undoing of a logged operation in database 'Test Upgrade Dynamics', an error occurred at log record ID (101653:4848:329). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.

    Awesome, I'll finally be able to test my "recovery" skills.

    ALTER DATABASE [Test Upgrade Dynamics] SET EMERGENCY

    DBCC CHECKDB('Test Upgrade Dynamics') WITH NO_INFOMSGS, ALL_ERRORMSGS

    And that returns... NOTHING

    Ok, then maybe the san or <whatever> caught the error and repaired it.

    ALTER DATABASE [Test Upgrade Dynamics] SET ONLINE

    Back to the same error message and db goes suspect.

    AFAIK checkdb doesn't validate the log (not sure here so please advise).

    I have the <brilliant> idea to run this to see if maybe the log is corrupt

    91 seconds later I get 2.8M rows back and NO ERROR.

    That's when I ran out of ideas and started this thread!

  • Edit.

    Link to Paul's amazing presentation : http://channel9.msdn.com/Events/TechEd/NorthAmerica/2008/DAT375

    I have good backups and I can erase that db anytime I want without any problems. I'm doing this more for the experience and maybe helping someone else out in the future via google or whatknot.

    Also while I have backups of prod DB, that particular db has never been backed up. It's in full recovery.

    Good luck! 😉

  • Ninja's_RGR'us (8/9/2011)


    Also while I have backups of prod DB, that particular db has never been backed up. It's in full recovery

    Which means it's in pseudo-simple recovery (log getting truncated on checkpoint) and hence a tail log backup won't work.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (8/9/2011)


    Ninja's_RGR'us (8/9/2011)


    Also while I have backups of prod DB, that particular db has never been backed up. It's in full recovery

    Which means it's in pseudo-simple recovery (log getting truncated on checkpoint) and hence a tail log backup won't work.

    Exactly.

    This not a a critical db by any means. It was supposed to be online for 1-2 days while testing migration scripts for the ERP. So it was not backed. Maybe another reason to ALWAYS backup, even for this case ;-).

    I needed an extra sandbox to play with so I used this one... and you know the rest of the story.

  • Ninja's_RGR'us (8/9/2011)


    Awesome, I'll finally be able to test my "recovery" skills.

    ALTER DATABASE [Test Upgrade Dynamics] SET EMERGENCY

    DBCC CHECKDB('Test Upgrade Dynamics') WITH NO_INFOMSGS, ALL_ERRORMSGS

    And that returns... NOTHING

    Ok, then maybe the san or <whatever> caught the error and repaired it.

    ALTER DATABASE [Test Upgrade Dynamics] SET ONLINE

    Back to the same error message and db goes suspect.

    Once you've set emergency you need CheckDB with repair_allow_data_loss (or other method of fixing cause of suspect) to go online. You can't just set online. CheckDB doesn't check the log, but there's something wrong somewhere (either in the log or the data file) that recovery is failing due to.

    Gut feel here - checkDB repair allow data loss will rebuild the log and then you'll be able to bring the DB online, but may be transactionally inconsistent as the rebuild of the log would have discarded the information on uncommitted transactions and incomplete transactions

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • p.s. you've either got some IO error, a memory error or you've hit a severe SQL bug. See if you can repo on completely different hardware, see if you can repo on same server, different drives (preferably local, not SAN) (first see if you can repo at all)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Darn. I wish this would have had more options to test.

    Anyway I can backup the DB so we can retry a few things? I'm getting errors both in emergency and suspect mode that I can't back up.

  • GilaMonster (8/9/2011)


    p.s. you've either got some IO error, a memory error or you've hit a severe SQL bug. See if you can repo on completely different hardware, see if you can repo on same server, different drives (preferably local, not SAN) (first see if you can repo at all)

    I can't not try this on san, this is a VM and the "local" drives are all on san.

    I'll see if I can repro on our other servers (different san).

    Any article your recommend for running the correct tests to figure out the hardware issue? As I said windows log was errm useless in this case.

  • Vendor-specific hardware diagnostics.

    I'd also run SQLIOSim and see if it picks up any funnies.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • And in the news, Google stills works :pinch:!

    I should start Let Me google that for me (tm). 😀

    Took the server offline and copying the files. It's 50 GB so it'll take a while before I can run repair.

  • There are hundreds of cases I didn't talk about in that presentation - but glad you liked it!

    Sounds like memory corruption of a log block changing the slot number in a log record.

    What are the backup errors you're getting?

    How big is the database? (wondering if you can take if offline, zip it up and send it to me to play with)

    Paul Randal
    CEO, SQLskills.com: Check out SQLskills online training!
    Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
    SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
    Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

  • In other news... Paul Randal is psychic, or can smell a corrupt database from several thousand km away. 🙂

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Paul Randal (8/9/2011)


    There are hundreds of cases I didn't talk about in that presentation - but glad you liked it!

    Sounds like memory corruption of a log block changing the slot number in a log record.

    What are the backup errors you're getting?

    How big is the database? (wondering if you can take if offline, zip it up and send it to me to play with)

    I loved the presentation. Do you have another one besides the teched?

    It says cannot take the backup in emergency/ suspect mode. Jonathan Keyas suggested taking the server offline and manual copy, which I'm running at the moment.

    The file sizes are around 40 GB, but 50% empty so I could probably get it down to 2 GB with winrar. Send me a mail with ftp info and I'll send it over later tonight.

  • GilaMonster (8/9/2011)


    In other news... Paul Randal is psychic, or can smell a corrupt database from several thousand km away. 🙂

    Nah, he said in the teched presentation that he monitors this specific forum (e-mail notifications). So I made sure to post in corruption to get the 2 best experts I know of :-D.

  • There are hundreds of cases I didn't talk about in that presentation

    No wonder it took you 5 years between 2k and 2k5 :hehe:.

    How many cases do you have to contend for?

Viewing 15 posts - 1 through 15 (of 78 total)

You must be logged in to reply to this topic. Login to reply