Disaster Recovery: Rebuild system databases on SQL Server 2008 cluster

  • bgdjwoods (3/12/2012)


    Good article, however; before anyone decides to use this method as their form of DR, I would highly recommend they rethink this. Unfortunately in your situation you were limited, however; you would possibly still require the use for the backups. When you restore all of the system database, if the server was previously used to run jobs or store any SSIS packages, this would have been lost and would would require a restore of the MSDB backup. Also when the master is restored, all of the databases would have to be re-attached and the SQL logins would have been lost.

    As having been down this road multiple times and having also tested this a multitude of times, I would recommend that everyone thoroughly think out what solution is best given the amount of time they have to bring their server back online. Practice..practice..practice.. is crucial to ensure a smooth recovery process. I would recommend leaving a copy of backups local as well as at a remote location. I also employ a script daily that creates a script to recreate all of my logins.

    Hope this is helpful.

    I agree.

    Regards,

    Basit A. Farooq (MSC Computing, MCITP SQL Server 2005 & 2008, MCDBA SQL Server 2000)

    http://basitaalishan.com
  • Basit Farooq (3/12/2012)


    baxterr (3/12/2012)


    Great article! Did you meet the SLA? Congrats on the quick thinking.

    Thanks for liking the article. Yes we did meet client SLA.

    Hi and congrats, but... this is the part of your article that sort of confuses me:

    [from: Step 3: Post-Rebuild Tasks]

    After rebuilding the resource database I re-applied all service packs and hot fixes. I restored the master, msdb and model databases from the most recent backups. I bring the SQL Server instance resources online in Cluster Administrator.

    ... in this narrative it sounds as though you still got the most recent backups back on line and restored from the most recent backups *before* bringing SQL Server back on line.

    Presumably you couldn't meet the SLA until the SQL Server instance resources were back on line (your last step in this narrative).

    And, as stated in your narrative, you were able to retrieve these backups before that happened. So, it can't have been the backups from tape, which would have taken 3 hours to get. You must have been able to get the corrupt backup drive up within the SLA window.

    So did you do the rebuild because you were not sure you could get the SAN backups in time, to hedge your bets? That part is not clear.

    Additionally, if you had not been able to get the SN backups within the SLA window, and therefore were not able to get your SSIS jobs and packages up and running within the SLA window, would you still have met the SLA?

    I am not clear on the SLA obligation; therefore, I am not clear on whether the rebuild alone would actually have satisfied it.

    Thanks...

    >L<

  • I would not recommend this option unless you absolutely have to bring your SQL Server online within 1 hour to meet the SLA and simply reattach all your user databases without permissions, users, maintenance jobs, packages, configurations, etc. which is pretty useless.

    Rebuilding system databases and reapplying service packs/patches to sql server and restoring system databases from valid backup is also not a good idea. You will be asking yourself for more problems.

    Keep your end user database backups close, but system database backups even closer πŸ˜€

    cheers

    ~Leon

  • Leon Orlov-255445 (3/12/2012)


    I would not recommend this option unless you absolutely have to bring your SQL Server online within 1 hour to meet the SLA and simply reattach all your user databases without permissions, users, maintenance jobs, packages, configurations, etc. which is pretty useless.

    Rebuilding system databases and reapplying service packs/patches to sql server and restoring system databases from valid backup is also not a good idea. You will be asking yourself for more problems.

    Keep your end user database backups close, but system database backups even closer πŸ˜€

    cheers

    ~Leon

    In a disaster situation where your Master Databases is corrupt and you have backups at hand - Am I right in saying that you don't need to rebuild the System DB's and then restore, you can just go ahead with the restore without Rebuilding System Databases?

    How would you bring SQL online in order to do the restore of the System Databases?

    Thanks

  • The only problem with having database files sitting on clustered resouces is that you actually have a single point of failure - on the data. the mist important asset.

    I think that in critical systems there should be a away to have 2 files in sync, so if there is currpotion you have a chanse to get one of them working.

    best way would be to replicate everything to a slave db and in case of falure make the slave master. I wonder if 2012 supports this.

    Dani

  • A couple things I am unclear on:

    1. It sounded like you were deciding between rebuilding and restoring. It seems to me that

    a. If the instance was unstartable, then restoring is not an option until you rebuild the master. So you need both in any case.

    b. If the instance was startable, then rebuilding is unnecessary since you need to restore anyway in order to have access to the application databases.

    2. Where did the rebuild put the master database files? Some default location? The location in the startup parameters? If not the startup parm location, did it reset the startup parms? If it reset the startup parms, did it also change the values in the registry (a β€œFeature” of recent versions of clustered instances).

  • Hello,

    Great post!

    in addition to the points hayden_jones mentionned, i have others unclear points:

    Since you have locals SQL server on every cluster node, you have just rebuild the master database on active node, what about other nodes?

    Florent

  • Good post, it has generated some good points (bgdjwoods especially) and people are asking the OP why did you not do it like this, or do it this way. Sometimes this is not possible, one of SQL Server's plus points (and faults) is that there is many ways to "skin a cat" some are better than others depending on many things such as what you want to do, SQL version/edition/even service pack. As DBA's we all try to convince our clients/bosses/companies on the right way to do something (not always the Microsoft way either!) but it doesn't work like that and as a DBA you have to think smarter, you have to work with the tools at your disposal.

    Going back to the OP, when the clock is ticking, alot is depending you and you alone, you have to do what you can. He had a 1 hour SLA to get SQL back up and this he did. We've all been there and it can scary but as one poster says practice, practice, practice! 😎

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]
  • Hi,

    By restoring it that way did not you loose the logins and linked server information stored in your original master database? How did you recover those?

    Thanks

    Ameena

  • i dont understand. do you store the system databases on the local drive than the shared drives of cluster?

    2. if placed on shared drive of cluster if the shared cluster drive itself having issue then how come the restore help you.

  • Assuming that we have Active/Active cluster nodes or Active/Passive cluster nodes. In your article you said that if SQL Server is crashed we need to rebuild the Master with given procedure.

    My concern is Clustering provides high availability, when one instance is crashed it will automatically failover to second node and instance will be avialable over there. Please clear me if my understanding is wrong.

    Thanks!

    Siri

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply