A story of corrupted Master database in SQL Server 2008

  • I assume the backups are not corrupted.

    Actually this is what I did just for a test. I shut down sql services, restore all my system databases from a backup, and brought sql backup, and it was successful. But then again, there was no corruption of any kind, just an exercise, regristry was not corrupted. So I am just curious, in case there is a corruption, would I be able to follow the same steps?

  • There is way but only for really desperate situations.

    You can install SQL server on another box. This computer should have the same name that the one you are in trouble. Once installed just shutdown SQL Server, pickup the mdf and ldf files and take them to the computer which has the corrupted database.

    Start SQL server in single user and restore the master from backup.

    I did it once a long time ago.

    Regards Ramon

  • I am curious why the user databases needed to be restored since a version of the Master, (and presumably msdb & tempdb), were restored? If this was my system, I would certainly investigate what the cause for corruption was to the best of my ability.

    -Steve

  • Mark S. Rasmussen (5/16/2011)


    Interesting story! You don't mention anything about what caused the corruption however?

    One thing is fixing the corruption, but I'd be very wary of just restoring the system back the way it was, without examining what caused the corruption in the first place. If it's a fried IO subsystem, it's probably just a matter of time before the corruption reoccurs.

    if you have a revenue producing server the first thing you want to do is get it back to the point of producing revenue to get people off your back. later on you can look through the logs and whatever to find the true cause. or just upgrade all your firmware/drivers as a first step. HP is usually buggy with theirs and we update our servers once or twice a year

  • alen teplitsky (5/16/2011)


    Mark S. Rasmussen (5/16/2011)


    Interesting story! You don't mention anything about what caused the corruption however?

    One thing is fixing the corruption, but I'd be very wary of just restoring the system back the way it was, without examining what caused the corruption in the first place. If it's a fried IO subsystem, it's probably just a matter of time before the corruption reoccurs.

    if you have a revenue producing server the first thing you want to do is get it back to the point of producing revenue to get people off your back. later on you can look through the logs and whatever to find the true cause. or just upgrade all your firmware/drivers as a first step. HP is usually buggy with theirs and we update our servers once or twice a year

    I won't disagree, but I certainly won't agree either. If it's a revenue producing system that's critical, I wouldn't risk any more corruption, possibly causing data loss down the road. In this case it may have been lucky for the master database to become corrupt so that the corruption was detected. Otherwise it could've continued for sometime longer, causing corrupt backup files and data loss as a result.

    If it's a that critical a system, I'd have a backup server/disk system ready to restore onto.

    Mark S. Rasmussen
    Blog: improve.dk
    Twitter: @improvedk
    Internals geek & author of OrcaMDF

  • Error 823

    I/O error <error> detected during <operation> at offset <offset> in file '<file>

    Proactively, a DBA can set an alert within the SQL Agent for 024 - Fatal Error: Hardware Error, to catch things before they go completely south. Ste this to all databases.

  • Thanks for sharing.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Interesting article. Curious about why you did the rebuild (particularly after the first failure)? I would have thought moving directly to a restore of master would you have gotten you up and going faster.

    Glad it worked out for you.

  • I see people go both ways on this. Ultimately since you can't usually guarantee that xx piece of technology caused the corruption, you bring the system back online. I've seen Kimberly Tripp recommend this as well, get the system online, then track down issues and fix data.

    I might bring the system back up as a copy, get people moving, and then start investigating. I'd also be looking to increase the frequency of backups and copies until I solved this.

  • I have to question your priorities. The server was down because the master database was corrupted and you opted to deal with WMI permissions first?


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

  • Thank you much for the article. I appreciate everything listed - it is very helpful to have all that in one place.

    In case this is of assistance to anyone, in our environment on each server we always keep a cold backup on-hand of the master database after any major changes (initial install, SP, CU, etc.) so that we have something to start SQL Server with, if the master gets corrupt - it's often quicker than scenarios discussed. Sometimes we still have had to do scenarios like the one written here, but it's often sufficient.

  • I admit this is a newbie question but what does MOF stand for? Initial searches came up with matches, but no one else seems to want to explain it.

    I try hard to stay away from TLAs (three letter acronyms).

    TIA

  • Microsoft Operations Framework

    http://technet.microsoft.com/en-us/library/cc506049.aspx

  • Thanks for your story, I've never encountered it, but it's comforting to know the solution is available for me when I need it.

    You can be sure when the time comes that I'll need it, your article will come immediately to mind.

  • It will be helpful if someone could let us know about the reason for such corruption. If master DB is corrupted how will the WMI component too got corrupted? Is there any relation between the two?

    Also tell me some do's and don'ts to avoid such issues or to be on safer side.

    ______________________________________________________________Every Problem has a Solution; Every Solution has a Problem: 🙂

Viewing 15 posts - 16 through 30 (of 35 total)

You must be logged in to reply to this topic. Login to reply