Like a lot of DBAs, I find there just isn't enough time in the day to get everything done. Try as you might, new issues keep introducing themselves, and old tasks re-emerge needing more attention than you first anticipated. Then your boss drops by to ask when you're going to perform that much-needed Disaster Recovery (DR) test on the Production SQL Servers.
It's a difficult question, with no easy answer. Would your boss like you to do the things that are important to your end customers? "Yes," you say. After all, who wouldn't? OK… but would your boss also like you to ensure that the company's critical data is protected from substantial downtime, and to actually prove that it is? "Yes…" you say again. This is about as far as many people get before they start struggling with how they're going to find the time to scrape together the steps for a DR test.
I want to introduce you to what will hopefully be a big assist in getting that DR test done and being ready for a true DR situation. There is an easier way: do yourself a favor and don't start from scratch. Let's walk through a solution that streamlines the process of failing over to a Log Shipping Standby server and back again. You can then use it as is, or update it to better suit your needs.
Let's get started...
Editor's Note: The most current version of the script is now here: http://streamlinelogshippingfailovers.codeplex.com/
Use SQL Management Studio to open the master script provided with this article, called SQL_DR_Master.sql. Scroll down past the variable declarations to the three variables that must be set in order to execute the script. The first variable, @FailOverFromPRIMARY, controls whether to attempt to fail over from the PRIMARY or not. If this variable is set to 'Y', a final transaction log backup will be attempted on the PRIMARY, and then all available transaction log backups will be applied to the SECONDARY to bring the system up to the latest point possible before making the databases available. If set to 'N', the solution will simply bring up the SECONDARY after first applying any transaction logs that are already available.
Next, update @ScriptsLocation to the network location where you want the failover scripts to be created, for example '\\MyServer\MyFailoverFolder'. The scripts produced will be created in the folder specified, so choose a safe and accessible place.
Lastly, set @RunType to either 'Manual' or 'Automatic' to determine how Step 4 will be executed. If 'Manual' is chosen, a set of statements will be created and displayed for review; they then need to be copied into a New Query window and run. If 'Automatic' is chosen instead, Step 4 will take those same statements and run them automatically. For the sake of learning, or when running the solution for the first time, 'Manual' is probably best, as you can then see exactly what commands will be run.
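Put together, the three settings might look something like this inside SQL_DR_Master.sql (the variable names come from the article; the example values are illustrative):

```sql
-- Control variables for SQL_DR_Master.sql (values shown are examples)

-- 'Y' = the PRIMARY is still reachable: take a final tail-log backup there
--       before applying all logs to the SECONDARY.
-- 'N' = skip the PRIMARY and recover the SECONDARY using only the
--       log backups that are already available.
SET @FailOverFromPRIMARY = 'Y';

-- Network share where the numbered failover scripts will be written.
SET @ScriptsLocation = '\\MyServer\MyFailoverFolder';

-- 'Manual'    = Step 4 displays its statements for review before you run them.
-- 'Automatic' = Step 4 executes those same statements immediately.
SET @RunType = 'Manual';
```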
Executing the Failover
Below is a table describing the main action each script takes, along with the desired outcome. The first five scripts fail over from the PRIMARY to the SECONDARY, and the last five fail back from the SECONDARY to the PRIMARY. Review the table and take the time to go over the scripts the solution outputs, to make sure everything makes sense to you and you don't foresee any issues. Once you've confirmed the scripts produced look good, you are ready to begin the first step of failing over your server.
| Script | Action | Outcome |
|---|---|---|
| 01.sql | Creates the directory that will hold the scripts produced, and disables the LS backup jobs. | New folder is now available in the network location specified, and the LS backup jobs are disabled. |
| 02.sql | Removes the LS restore delay, applies all logs, and disables the LS restore jobs. | The SECONDARY's databases have had all currently available logs applied. |
| 03.sql | Performs log backups. | All databases are in NORECOVERY mode, waiting for the failback. |
| 04.sql | Generates a list of any remaining logs to apply to the databases and restores them. | All of the SECONDARY's databases have had all logs applied. Databases remain inaccessible. |
| 05.sql | Executes a restore command on each database. | All of the SECONDARY's databases are available and ready for use. |
| 06.sql | Performs log backups. | All databases are in NORECOVERY mode, waiting for LS to reengage. |
| 07.sql | Restores the available log backups. | All of the PRIMARY's databases have had all currently available logs applied. Databases have limited accessibility. |
| 08.sql | Sets all databases back to multi-user mode. | All databases are fully accessible. |
| 09.sql | Enables all LS backup jobs. | Log backups are being made again by the PRIMARY. |
| 10.sql | Enables all LS restore jobs. | Log backups are being restored again by the SECONDARY, and Log Shipping is running again. |
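Under the hood, the generated scripts revolve around standard SQL Server backup, restore, and Agent job commands. As a rough hand-written sketch of the same sequence for a single database (the database, path, and job names here are illustrative, not taken from the solution's output):

```sql
-- Illustrative sequence for one database, MyDB; the generated scripts
-- automate this across all log-shipped databases.

-- 01.sql-style: disable the LS backup job in msdb (job name is an
-- assumed example following the usual LSBackup_<database> convention).
EXEC msdb.dbo.sp_update_job @job_name = N'LSBackup_MyDB', @enabled = 0;

-- 03.sql-style: final (tail) log backup on the PRIMARY, leaving the
-- database in RESTORING state so it can later receive logs during failback.
BACKUP LOG MyDB
    TO DISK = N'\\MyServer\MyFailoverFolder\MyDB_tail.trn'
    WITH NORECOVERY;

-- 04.sql-style: apply remaining log backups on the SECONDARY, keeping
-- the database unrecovered between restores.
RESTORE LOG MyDB
    FROM DISK = N'\\MyServer\MyFailoverFolder\MyDB_tail.trn'
    WITH NORECOVERY;

-- 05.sql-style: bring the SECONDARY's copy online for users.
RESTORE DATABASE MyDB WITH RECOVERY;

-- 09.sql-style: re-enable the LS backup job once failback is complete.
EXEC msdb.dbo.sp_update_job @job_name = N'LSBackup_MyDB', @enabled = 1;
```

The failback scripts (06.sql–10.sql) use the same building blocks in the opposite direction, with the SECONDARY taking the tail-log backup and the PRIMARY receiving the restores.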
To actually begin the Failover, open 01.sql in SQL Management Studio, verify it's connected to the correct server, and click Execute. Repeat this with 02.sql - 05.sql to fully complete the Failover. Refer to the table above as you go to help ensure things are going as expected.
After executing 05.sql, you will be running on the SECONDARY server. You can then perform some tests on the SECONDARY (for example, temporarily pointing your application at it to ensure it works properly). When finished, execute 06.sql through 10.sql to fail back to the PRIMARY (this carries over any changes made on the SECONDARY). Make sure to repoint your application back to the PRIMARY and do whatever testing is necessary to make certain things are as expected and completely ready for your users.
After you've successfully performed your failover test, remember to create any additional documentation around the use of this solution, if needed. Also, don't forget to periodically perform the other steps needed for a proper failover (copying over logins and jobs, etc.). You and your boss can both rest a little easier knowing you are better prepared for a true disaster.
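On the logins point, a quick sanity check is to compare sys.server_principals between the two servers and flag any logins that exist on the PRIMARY but not on the SECONDARY. A sketch of that check, assuming a linked server named PRIMARYSRV has been set up on the SECONDARY (the linked server name is hypothetical):

```sql
-- Run on the SECONDARY: list logins present on the PRIMARY (via the
-- assumed PRIMARYSRV linked server) that are missing here.
SELECT p.name, p.type_desc
FROM PRIMARYSRV.master.sys.server_principals AS p
WHERE p.type IN ('S', 'U', 'G')     -- SQL logins, Windows users/groups
  AND p.name NOT LIKE '##%'         -- skip internal system logins
  AND NOT EXISTS (SELECT 1
                  FROM sys.server_principals AS s
                  WHERE s.name = p.name);
```

Note this only finds missing login names; to carry over passwords and SIDs for SQL logins, a transfer script such as Microsoft's sp_help_revlogin is the usual approach.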
Update: The script has been removed from this article, as it is out of date. The most current version of the script is maintained by Micah here: http://streamlinelogshippingfailovers.codeplex.com/