Disaster Recovery Week

Steve Jones - SSC Editor

SSC Guru

Points: 734459
December 9, 2012 at 3:43 am

#75668

Comments posted to this topic are about the item Disaster Recovery Week

Johan Bijnens SSC Guru Points: 135259 · Answer 1

Just a very basic tip: start small!

Start building your disaster recovery project with a very small scope, not that critical to business activity.

Only read the headings of the articles, best practices, and books you find about disaster recovery, so you know the topics to keep in mind and are not overwhelmed by the tiny details behind them.

That is for later on, after you review your first, second, third cycle of building your general DRP.

Pick any non-critical database and start from there.

Once you are comfortable with the elementary concepts and their impact on the topic from your point of view, make sure your project gets sponsored by your superior(s), because at a certain point you will have to contact the business owners to discuss your findings and align them budget-wise.

There is more to disaster recovery than SQL Server, but it makes a good starting point.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

ashepard SSC Rookie Points: 37 · Answer 2

Backups have only given me grief. It's restores that count and make me look good.

However, I have gotten burned by the "level of expectation":

1) A dual RAID-5 SAN died while I was on vacation. It was in the cleanest part of the city garage, but too many disks failed. I came in and restored the database and maintenance jobs to another server in under six hours. I was proud - my boss was livid.

Why did it take six hours to restore and reconfigure an 18 GB database? He thought it should take only twenty minutes, and that the process should be documented so anyone could do it.

Lesson learned - have a system up and ready.

2) We have several databases where we only keep six months or less of backups. I like to keep three years. Why? Problems do not always surface in six months. Year end processing has found issues with data from a year ago.

3) Would I use the cloud? Not even in an emergency - only in a catastrophe. Why? (laughs) Our internet connection is one of the first things to go. It's not under our control - but the data center is. Those who can work while others are down make the money and keep their jobs.

crazy4sql SSCoach Points: 19590 · Answer 3

I have a question about one DR scenario.

Let's say I have a 14 TB database in OLTP mode with little or no performance problem (yes, weird, but let's accept it). Now let's say the system goes down for whatever reason, and I have a valid set of the latest backups. I can start restoring, but the problem is how to meet a recovery time objective (RTO) of 3 hours, as restoring a full backup of this multi-terabyte size will definitely take time.

*************************************

I know there should be some failover configured, like replication, mirroring, cluster failover, or log shipping (least preferred), or there should be some offline site where backups are checked and restored in time.

****************************************

Let's assume that for whatever reason these options were not considered and the only option now is a restore. What would be my strategy to bring this database online within the RTO?

----------
Ashish

Steve Jones - SSC Editor SSC Guru Points: 734459 · Answer 4

No option. You can't do it. Physically, it will take time to read the backup file and rewrite the disk. If you haven't pre-built the log file, it has to be zeroed out. If you don't have instant file initialization (IFI) enabled, the data file also has to be zeroed out. Both take time.

I don't know what your I/O subsystem can do, but 14 TB is substantial. At the very least, I'd build a cheap whitebox system with 16-20 TB of consumer-level drives. That could probably be done for $2-3k these days, and it would get you running quickly if you had some sort of log shipping enabled.
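To put rough numbers on the restore-time point, here is a minimal back-of-the-envelope sketch in Python. It assumes decimal terabytes and a perfectly sustained transfer rate, and it ignores file zeroing and log recovery, so real restores would take longer. The function names and the 500 MB/s figure are illustrative assumptions, not measurements from any system in this thread.

```python
def required_throughput_mb_s(size_tb: float, rto_hours: float) -> float:
    """Sustained throughput (decimal MB/s) needed to move size_tb within rto_hours."""
    size_bytes = size_tb * 1e12      # decimal terabytes -> bytes
    seconds = rto_hours * 3600       # hours -> seconds
    return size_bytes / seconds / 1e6


def restore_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to move size_tb at a sustained throughput_mb_s."""
    return size_tb * 1e12 / (throughput_mb_s * 1e6) / 3600


if __name__ == "__main__":
    # 14 TB inside a 3-hour RTO
    print(f"{required_throughput_mb_s(14, 3):.0f} MB/s needed")
    # 14 TB at a hypothetical 500 MB/s
    print(f"{restore_hours(14, 500):.1f} hours at 500 MB/s")
```

Under those assumptions, a 3-hour RTO for 14 TB needs roughly 1.3 GB/s sustained end to end, and a hypothetical 500 MB/s path would take close to 8 hours before any recovery work starts.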

pbarbin Old Hand Points: 362 · Answer 5

Good timing on the article. Our company just completed our yearly DR practice test. It's not an exhaustive test, but we take some of the more critical systems and try to bring them up in a virtual network at our mirrored site (affectionately known as "the bubble").

As part of our DR strategy, we employ SQL Server database mirroring. All web, app, and reporting servers are virtual and are replicated to our DR site.

To start the test, the network guys create the bubble and bring up domain controllers and other infrastructure needed to support the network.

Database mirroring is broken. The mirrored SQL Servers are brought into the bubble and renamed to match the production server names.

SQL Server allows you to rename the server and instance easily with the sp_addserver and sp_dropserver commands. We then recover the mirrored databases and have to set up linked servers and verify that all logins and users have the correct passwords and permissions. All of this is scripted prior to the mirror break to minimize our virtual downtime.

We hand over the environment to a small team of testers with test scripts and they go to work.

While we didn't plan perfectly, overall the test was a success. We learn valuable lessons each time we run the test. Interestingly, many difficulties are due to the fact that we are simulating a disaster. But experience tells us that in a real disaster there will be many other problems that we didn't foresee. This keeps us honest with ourselves: bringing up a production environment in a real disaster won't be as easy as it looks on paper.

Paul

crazy4sql SSCoach Points: 19590 · Answer 6

Thanks, Steve, for the reply and suggestion. Just one question: can we consider restoring with the PARTIAL option?

----------
Ashish

Steve Jones - SSC Editor SSC Guru Points: 734459 · Answer 7

You can restore a filegroup, which might help if you have data split by filegroups: perhaps current data in one and archived data in one or more others. That might allow you to restore one group and be running.

Be aware, however, that if you put indexes in one filegroup and data in another and restore only one of them, the app may not work well.
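As a rough illustration of the filegroup idea, the Python sketch below builds the first T-SQL statement of a piecemeal (partial) restore. It is a sketch under assumptions, not a tested runbook: the database name, filegroup name, and backup path are hypothetical, names are not escaped, and the follow-up log restores and the final recovery step are left out. It always lists PRIMARY first, since a piecemeal restore sequence has to include the primary filegroup.

```python
def partial_restore_sql(database: str, filegroups: list[str], backup_path: str) -> str:
    """Build the first statement of a piecemeal restore for the given filegroups.

    PRIMARY is always placed first. Names are inserted as-is (no escaping),
    so this is only suitable as an illustration.
    """
    ordered = ["PRIMARY"] + [fg for fg in filegroups if fg.upper() != "PRIMARY"]
    fg_clause = ", ".join(f"FILEGROUP = '{fg}'" for fg in ordered)
    return (
        f"RESTORE DATABASE [{database}] {fg_clause} "
        f"FROM DISK = N'{backup_path}' WITH PARTIAL, NORECOVERY;"
    )


if __name__ == "__main__":
    # Hypothetical names: database "Sales", hot filegroup "Current"
    print(partial_restore_sql("Sales", ["Current"], r"D:\backups\Sales_full.bak"))
```

The remaining filegroups (for example, an archive filegroup) would be restored in later steps. Whether the database can stay online while that happens depends on the SQL Server edition and recovery model, so check that for your own setup.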

Rudy Panigas SSChampion Points: 10702 · Answer 8

Hello Everyone,

Here are several observations I have made over 10+ years of working on disaster recovery.

1) Most people have no idea how their SQL production servers are set up: port numbers, linked server connections, sp_configure settings, etc. Documentation is very poor in 90% of the cases I've worked on. I have created a script that helps in this area and is posted on this site. Just search for "SQL Server Documenter". It's not perfect, but it's a good start.

2) Backups. You won't believe how many people have never tested the restores of their databases. Backups are only as good as your restores. Lots of companies have large databases but do not replicate in any manner. Then they complain that the restores take too long.

3) Testing. You would be shocked how many companies have no DR testing. Some say "we can restore to the cloud" and yet have never tried it. If we do more DR testing, we can weed out the first two issues listed above.

4) Bare metal builds. This seems like a lost art. With replication and HA solutions, we forget that you still need to know how to perform bare metal builds.

Well, that's all I have for now.

Rudy

john.w.walker Old Hand Points: 386 · Answer 9

All of these articles this week are interesting, but they are not about disaster recovery - they are about business continuity. The difference is big: with a disaster you lose your infrastructure, so you have to run off an alternate site such as SunGard or a cloud provider. Having backups and backup servers is great, but if you don't have a disaster recovery plan, a disaster is when you have no place to restore your tape, USB drive, or cloud backups.

crazy4sql SSCoach Points: 19590 · Answer 10

Don't you think a DR plan is also part of business continuity? Which business would plan DR without considering business continuity? 😀

----------
Ashish

Rudy Panigas SSChampion Points: 10702 · Answer 11

December 14, 2012 at 6:56 am

#1568676

DR + BCP = Success 🙂

Rudy

john.w.walker Old Hand Points: 386 · Answer 12

All businesses should plan for DR and BC, but the weekly topic is DR and this is all about BC. I would think SQL Server Central would know the difference. Even Steve's original story that started it is BC.

Steve Jones - SSC Editor SSC Guru Points: 734459 · Answer 13

Which original story? Losing power in a data center? That's a DR, technology related issue.

Companies trying to be sure their data centers are running after Hurricane Sandy: that's DR. There's a BC component to that, but the piece we are talking about here, with people working in technology, is the technology end of a disaster, which is usually known as DR.

If you want to be pedantic about it and dig into what is BC and what is DR, that's fine, but I think that's a bit of a waste of time.

john.w.walker Old Hand Points: 386 · Answer 14

Right, Sandy is DR. None of your stories are, no matter what screwy name you give it for someone who expects to read about DR when that is what's advertised, Steve.