The continuity of operations during a disaster is as critical to running a business as just about anything. Somewhere, within your organisation, there is a document that provides all the technical details with respect to disaster recovery for production applications running on Microsoft SQL Server database management system - if not, this should help. The goal of this post is to clearly summarise the steps required to follow in the event of a database production system infrastructure failure.
Since we were using SQLBackup (in this documented case), we had many options to profit from during the automatic restoration of compressed backup files to a failover server. We took advantage of the great compression (SQL 2008 Entreprise has the option now too), parallelism by means of using multiple backup devices at a time (we found six to be a sweet spot for our disks), and finally, easily query-able file names (configured to state the database name and date). To carry out automated restores, which is basically a type of storage replication - in fact the most reliable I have ever seen, we ideally read directly from the backup log system tables, such as BackupHistory, BackupSet, BackupFile or Backuplog on the publishing server. However, I often set the master restore scripts statically, meaning that the SQL Agent Restore jobs on the disaster recovery server (subscriber) have their parameters manually set during testing and are usually left that way – but of course it is best to pull the meta data directly from the database system in case you move files around and forget to update the restore scripts (i.e. the dynamic method). When it comes time to change the automated restore jobs that are static, I recommend scripting the job and doing search/replace.
Here are the steps to follow during a disaster recovery situation:
1) If you haven’t heard from them directly already, please contact first line database administrator support and primary system admin at ### or the backup database administrator/system admin at #
2) After the production/original data publisher server failure, the restore/backup-subscriber server will be used as the primary database server (a.k.a. DRP server). Inform everyone by E-mail in department or internal/external clients (updated production impact list necessary).
3) Once the switch occurs to or the downtime actually happens to the DRP server, all application connections strings should to be changed to access the DRP (or automatically in the case of keep-alived style connections).
4) Disable Automatic Restore SQL Agents on the restore server.
5) Remember to disable all SQL Agent jobs on primary server, if possible.
6) Enable all maintenance and backup jobs on restore server (the new primary, until you have synched up the old primary).
Nobody has the outright desire to go through a disaster without being properly prepared. When I was asked to prepare a plan for Canada’s largest Pension Fund Institution (CDP.ca), I took it rather seriously, hence the length of this document (FULL Article Published by Simple-Talk here). We ran this through a real disaster recovery test over a weekend, and all worked out just fine (mega phew!). My goal with this is to share with you exactly how you can get your own disaster recovery plan in place, so that when the time comes at least the recovery step itself is not a disaster.
I have written a twenty pager with a full description of the disaster recovery method which is currently going through quality control, editing, et al., therefore, once that is done I shall place a link here.
For now, enjoy these critical scripts (thanks Steve, will update them soon in codebase) from a database called DBA_Tools. I am sure some of my readers have done this type of work in a different way, and I would love to hear how you prefer to setup your disaster recovery systems.