Sql Server 2008 DR Strategy - Advice Reqd

  • Hi SQL Guru’s,

    We plan to implement a DR strategy for our mission critical production server. We would need your advice on choosing the best strategy for implementing this.

    Production Database Server Environment:

    Database Server: Active/Passive Sql Server 2008 (Clustered Instance)

    The production Sql Server instance (Server X) will have all the database and log files (including system databases) in the SAN storage (LUN A). There are approx 10- 15 databases on the production server.

    Staging/DR Database Server Environment:

    Database Server: Active/Active Sql Server 2008 (Clustered Instance)

    The Staging Server will have all the database and logfiles (including system databases) on LUN C.

    DR Strategy Proposed:

    This being a mission-critical database server the RPO and RTO has been defined as 15 min.

    Fig(i):

    We have file replication happening from LUN A to LUN B. Please refer fig (i) for the setup defined. LUN B is not connected to the clustered database instance (Server Y) at normal case. At the time of disaster in production server X, LUN C will be disconnected from server Y, instead LUN B will be connected as the Storage for Server Y.

    The Server version and edition (Service Packs and Hot Fixes) will be the same for X and Y.

    We visualized three scenario’s to ensure availability of the standby server (Y) for application connectivity:

    Scenario 1:

    1. We will re-point the master database path (database and log file) in the start-up parameter via the Sql Server Configuration Manager.

    2. Since this master contains the details of all the user databases the server should automatically be able to recognize all the production databases.

    3. We would have to probably rename the server instance (using sp_addserver and sp_dropserver) to point to the original instance name (i.e Server Y) as the master will be pointing to Server X.

    Advantages:

    1. No need for login creation and remapping.

    2. SSIS and Jobs are already present in the “msdb” database so eliminating the need to transfer them separately.

    3. Resource database will also be mapped correctly.

    Concerns:

    1. We have not tested re-pointing the master databases via the start-up parameters. Will this method work?

    2. This being a clustered instance would any complication rise? If so, kindly elaborate.

    Please advice.

    Scenario 2:

    Before Disaster – one time activity:

    1. Install one more Sql Server instance on the Staging Environment called instance Z on Server Y (keeping the system databases alone on the SAN - not on LUN C).

    2. Keep a backup of “master” and “msdb” databases (from server X) on LUN A so that it gets replicated to LUN B.

    3. As a one time activity, attach LUN B. Restore the backup of master (from server X) on instance Z.

    4. We would have to probably rename the server instance (using sp_addserver and sp_dropserver) to point to the original instance name (i.e Server Z) as the master will be pointing to Server X.

    5. Repoint the resource database.

    6. Restore the msdb database on instance Z.

    7. All the user DBs should be automatically recognized. If not, manually attach the user DBs.

    8. Stop the Instance Z.

    9. Detach the LUN B.

    At the time Disaster

    9. At the time of disaster, we only need to detach LUN C, reattach the LUN B and start the instance Z.

    10. May require to fire the login script (to get the latest logins) and probably restore the “msdb” database to get the latest job and packages.

    11. May require attaching some of the newly created user DBs.

    Advantages:

    1. Very little overhead of making the Standby Server online for applications at the time of disaster.

    Concerns:

    1. A backup of “master” and “msdb” must be taken on LUN A everyday.

    2. The login script must be taken to LUN A everyday.

    3. Manual overhead to check for any newly created databases and attach them if required.

    Please advice.

    Scenario 3:

    1. Re-point the Server Y to store the system databases on the SAN (LUN which are not part of LUN C).

    2. Take the login script and “msdb” backup from Server X and keep on LUN A.

    3. At the time of disaster detach LUN C and attach LUN B on instance Y.

    4. Fire login script and restore msdb database.

    5. Attach all user databases manually on Server Y.

    Advantages:

    1. Minimal overhead of bringing the Server Y online as standby for Server X.

    Concerns:

    1. A backup of msdb must be taken on LUN A everyday.

    2. The login script must be taken to LUN A everyday.

    3. Manual overhead of attaching user databases.

    4. Increased RTO and RPO

    Please advice.

    Scenario 4:

    We may use Image backup. But this will definitely breach the RTO and RPO. Please advice if any other mechanism is possible.

  • The only DR it provides for is if both nodes of the server X cluster fail. This is assuming of course that both server X and server Y are in the same data center and not in separate data centers connected with dark fiber.

    Some things to think about for truer DR:

    What if a network switch fails?

    What if the SAN fails?

    What if a SAN switch fails?

    What if a UPS or PDU fails?

    What if the data center has a power outage?

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply