Disaster Recovery: Rebuild system databases on SQL Server 2008 cluster

Introduction

Not long ago I was called at night by an operator because our company had experienced a power surge. As a result of this, all our SQL Servers rebooted. But upon reboot, one of our mission critical clustered SQL Server 2008 R2 instances failed to start automatically. Upon checking the error log, I noticed that following error appeared in the application error log:

During redoing of a logged operation in database 'master', an error occurred at log record ID (206:424:1). Typically, the specific failure is previously logged as an error in the Windows Event Log service. Restore the database from a full backup, or repair the database.

Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.

After doing a little more digging, I noticed that the file system of the drive that hosts the system databases files was corrupt. In order to meet the client's SLA I had to bring the SQL Server online within one hour. My first thought to recover system databases was to restore them from backups on backup drive, but unfortunately the backup drive was also corrupt as part of power failure.

I was left with two options: either restore the system databases from backups on tape or rebuild the system databases from the SQL Server 2008 installation media, which we have onsite. I quickly came to the conclusion that I had to rebuild the system databases. This was the most suitable solution for this situation because our tapes were stored on a DR site that is in a different city, and it would take at least 3 hours to get the tapes. Also by the time I finished rebuilding the databases, I would have my backup drive available because the maximum time for recovery for a backup drive is 1 hour.

In this article I will demonstrate you the steps I performed to rebuild all system databases (except resource database),and  then I will show you the steps I used to rebuild the resource database. In the end I will list the steps I performed to bring server back to the state before failure.

Step 1: Rebuild the system databases (except resource database)

The first step of the recovery process is to rebuild the following system databases i.e. master, model and msdb. I performed the following steps to rebuild master, model and msdb.

I first logged onto the active node and located SQL Server 2008 R2 installation media on the server. Using Cluster Administrator, I took all the SQL Server resources offline in the corresponding cluster application group. From a command prompt window, I used the following command to start the SQL Server setup to rebuild system databases:

Setup /QUIET /ACTION=REBUILDDATABASE /INSTANCENAME=InstanceName /SQLSYSADMINACCOUNTS=accounts [/SAPWD= StrongPassword ] [/SQLCOLLATION=CollationName]

The table below explains the parameters for the Setup program:

Parameter name

Description

/QUIET or /Q

Specifies that Setup run without any user interface.

/ACTION=REBUILDDATABASE

Specifies that Setup re-create the system databases.

/INSTANCENAME=InstanceName

Is the name of the instance of SQL Server. For the default instance, enter MSSQLSERVER.

/SQLSYSADMINACCOUNTS=accounts

Specifies the Windows groups or individual accounts to add to the sysadmin fixed server role.

[ /SAPWD=StrongPassword ]

Specifies the password for the SQL Server sa account. This parameter is required if the instance uses Mixed Authentication (SQL Server and Windows Authentication) mode. 

[ /SQLCOLLATION=CollationName ]

Specifies a new server-level collation. This parameter is optional. When not specified, the current collation of the server is used.

The command I executed to start the SQL Server setup is shown in the figure below:

When Setup has completed rebuilding the system databases, it returns to the command prompt with no messages as shown in the following figure:

I examined the Summary.txt log file as show in the following figure to verify that the process completed successfully:

This file is located at C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Logs.

Step-2: Rebuild the Resource database

The second step of the recovery process is to rebuild the Resource system database. I performed the following steps to rebuild the resource system database:

I launched the SQL Server Setup program (setup.exe) from SQL Server 2008 R2 distribution media. In the left navigation area, I clicked Maintenance and then clicked Repair. The setup support rule and file routines ran to ensure that my server has prerequisites installed and my server passes setup validation rules. I clicked OK to continue. On the Select Instance page, I selected the instance to repair, and then clicked Next to continue. Now repair rules ran to validate the operation. I clicked Next to continue. From the Ready to Repair page, I seleted Repair option and then clicked Next to continue. The Complete page appeared that indicates the operation is finished successfully.

Step 3: Post-Rebuild Tasks

The last step of the recovery process is to bring the server to the state before failure. I performed the following steps to accomplish this task: 

After rebuilding the resource database I re-applied all service packs and hot fixes. I restored the master, msdb and model databases from the most recent backups. I bring the SQL Server instance resources online in Cluster Administrator.

Conclusion

In this article you have seen how I rebuilt system databases in a SQL Server 2008 cluster that failed to restart. You should rebuild system database only under conditions when you don’t have backups of system databases, system backups are corrupted or backups are not immidiately available. You will lose all your Linked Server, Logins, SQL Server Agent jobs when you rebuild system databases. To restore Linked Server, Logins, SQL Server Agent jobs either restore the master and msdb databases from most recent backups or recreate these objects by executing the scripts containing the definition of these objects. 

Rate

4.47 (17)

Share

Share

Rate

4.47 (17)

Related content

Backup and Restore Back to Basics with SQL LiteSpeed

This article analyzes the various options available for your backup and recovery process with SQL Server 2000 as well as an enhancement to your SQL Server backup and recovery process using a highly efficient backup and restore utility that provides significant time and disk space savings called SQL LiteSpeed.

4 (1)

2002-12-23

9,763 reads

Streamlining the Database Server Recovery Process

Are you tired of manually restoring each database on a new server when the original server has a melt down? Does the manual process seem slow, and prone to keystoke and mouse click errors? Would you like to have those restore scripts automatically built, so you only have to fire them off? Well this article will show you one possible method for speeding up and reducing errors will trying to perform a restore of all databases on a server.

5 (2)

2002-11-05

8,972 reads

Disaster In The Real World - #2

Back in April Steve Jones wrote up a disaster at work. Andy had one this week and wrote up the story too. Copy cat! Pretty soon everyone will be having a disaster and writing a story about it! Give these guys credit for letting you see what happens when it ALL goes bad. Disaster recovery is hard to sell and hard to do, reading the article might give you an idea that will save you some time and/or data one day.

4.75 (4)

2012-12-14 (first published: )

10,513 reads