In short: restoring a system back to operational status after some kind of system failure. In database terms, this is usually either restoring a database from backup (on the production server) or failing over to a backup server (which may or may not be in sync with the production box).
There is plenty of information available on the web - just go to Google and search for "definition disaster recovery", or follow this link: http://www.google.com/search?sourceid=navclient&q=definition+disaster+recovery
Disaster Recovery (DR) is the process of restoring systems to an operational status after a catastrophic systems failure leading to a complete loss of operational capability.
Business Continuity (BC) is the forethought to prevent loss of operational capability even though there may be a catastrophic failure in some parts of the system. BC includes DR plans to restore failed system components.
There will always be failures, but with proper systems design and operational planning, the risk of operational degradation can be significantly reduced.
Simply put, design the system with the idea that one or more components may fail, and give each of those components a backup ready to take over. The topic can be very complex or relatively simple, but hardened systems go as far as demanding a geographically separate backup location, usually in another state.
DR is the process and planning for the recovery and continuation of the system/enterprise in the event of a disruption to service, with minimal loss of data and the least downtime. This can range from a simple server failure to the loss of an entire building to fire. I wonder how many backup tapes still live in the server room, next to the server. Does your DR plan allow for the theft of your hardware, including the backup tapes? Ask how you would cope if you arrived to find all the media, CPUs and memory stolen from your server room.
It is the planning and testing of these processes that constitutes a DR plan, not just having backups for the last 7 days. The best quote I know is "the only good backup is one that has been successfully restored". No DR plan is a plan unless it has been tested, and scenarios may range from a simple restore of a backup to rebuilding the enterprise in another building.
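To make that quote concrete, here is a minimal sketch of the verify-by-restoring idea. It uses plain files rather than SQL Server (a real plan would restore the .bak to a scratch server and run integrity checks); the file names and helpers are invented for illustration.

```python
# Sketch: "the only good backup is one that has been successfully restored".
# Plain-file analogue of a test restore; all names here are illustrative.
import hashlib
import shutil
import tempfile
from pathlib import Path

def checksum(path: Path) -> str:
    """SHA-256 of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def backup(source: Path, backup_dir: Path) -> Path:
    """Copy the source file into the backup location."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, backup_dir / source.name))

def verify_restore(backup_file: Path, original_digest: str) -> bool:
    """Restore the backup to scratch space and prove it matches the original."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(shutil.copy2(backup_file, Path(scratch) / "restored"))
        return checksum(restored) == original_digest

with tempfile.TemporaryDirectory() as work:
    db = Path(work) / "mydb.dat"
    db.write_bytes(b"pretend this is a database")
    digest = checksum(db)
    bak = backup(db, Path(work) / "backups")
    print(verify_restore(bak, digest))  # True only if the restore round-trips
```

The point is that `backup` alone proves nothing; only `verify_restore` tells you the backup is actually usable.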
As mentioned above, the business has to calculate how much data loss/downtime it can stand (versus the cost), and the DR plan has to allow for that, for all eventualities.
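Those two tolerances are usually written down as an RPO (acceptable data loss) and an RTO (acceptable downtime). A quick back-of-the-envelope sketch, with invented figures, shows how a backup schedule is checked against them:

```python
# Sketch: checking a backup schedule against the business's stated tolerances.
# RPO = how much data loss is acceptable; RTO = how much downtime is
# acceptable. All numbers below are invented for illustration.

def plan_meets_objectives(backup_interval_h: float,
                          restore_time_h: float,
                          rpo_h: float,
                          rto_h: float) -> bool:
    """Worst-case data loss is one full backup interval; worst-case
    downtime is at least the time it takes to run the restore."""
    return backup_interval_h <= rpo_h and restore_time_h <= rto_h

# Nightly full backups and a 6-hour restore, vs. a business that says it
# can stand 4 hours of lost data and 8 hours of downtime:
print(plan_meets_objectives(24, 6, rpo_h=4, rto_h=8))  # False: 24h loss > 4h RPO
print(plan_meets_objectives(1, 6, rpo_h=4, rto_h=8))   # True: hourly backups fit
```

Even this crude arithmetic is enough to expose the common case where the business's stated tolerance and the actual backup schedule don't match.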
Although I hesitate to mention terrorist activities, we have suffered a few attacks here in the UK. A colleague of mine had a backup strategy etc., but after a bomb attack they were not allowed back into the building for several days, so they could not get to their "everything proof" safe to get the backups out. That's a simple example of what DR planning has to allow for.
If you carry out SQL Server backups using product 'x', then you need to make sure that product 'x' will be available to you when you come to restore. And however amazing your backup procedure is, can you actually get hold of the physical backups?
Early on in my career we had a major fire and were not allowed near the area where the backups were held. The fire brigade told us that the best place for a fireproof safe is near an outside wall.
It is stating the obvious, but fireproof safes should not be put in storage cupboards where flammable materials are kept. Keeping them in rooms that are also used for paper storage or, worse still, cleaning materials is simply asking for trouble.
Well written article and several good points.
One question: Can you get my IT Director to read it?
Seriously, true DR/BC shouldn't be one person's responsibility; it takes a team. There is so much to know and take into account that one person would age quickly and prematurely if they tried to do it all for any but the smallest of organizations.
One thing you did leave out: setting expectations. It is all well and good to talk to the business units about their needs, but IT also needs to give feedback and set expectations for system recovery, data loss, recovery expenses, etc., so that there aren't any surprises. This is different from getting the BU to agree to a set of requirements, as the requirements may not be attainable, even after they are agreed to.
-- J.T. "I may not always know what I'm talking about, and you may not either."
One other point about expectations is that the business often tends to downplay theirs, especially when cost is involved, until something goes wrong. Then you find out about mission-critical apps/data/etc. they dismissed with comments like "we can manage without X" or "no, we don't need all the phones working - just a single line will do", etc.
From a DBA point of view, beware of reliance on other parts of IT. At one employer we tested DR to another site (basically we paid to have an office available for 350 people, with PCs etc.). As part of my SQL Server docs I generated HTML docs from the servers with all sorts of important info. "Put them on the intranet," said the ops guys, "then they're part of DR and you'll always have them." OK, so I arrive on site with my backups and get down to rebuilding my SQL Servers. I ask for the intranet files: "Oh, we don't rebuild the intranet until day 2." So beware of depending on others, even within your own dept. I'd not done my DR correctly, as part of mine relied on another part. < grin >
Nicely said, a good balance of information and humor.
I was involved in the Sarbanes-Oxley preparations last year at my company, and we implemented the COBIT framework (Control Objectives for Information and Related Technology) in IT to prepare. One of the sections I had to deal with was "Ensuring Continuous Service". Our charter was based on the scenario that we had lost our data center building and had to come up quickly elsewhere, and on how to build preparations for dealing with that.
I jested that I thought it was a silly exercise (in my company), since I'd more likely be sending out my resume than putting the place back together.
Actually it was an interesting exercise, though given the scope (> 1,000 servers) it seemed somewhat silly. Everyone said to get a list of priorities from the business, but guess what? First, they don't care. Each person's system (finance, HR, sales, etc.) is the most important, so rather than creating a big argument, especially since most people in the business don't know about IT dependencies, I set up a list of servers I thought were important: DCs, LDAP, DNS servers first, then mail, major business systems, etc. We continued on with the process, knowing that most of it would never be implemented, because we'd never get to test it for reasons of time and money.
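That "infrastructure first" ordering can be derived mechanically from IT dependencies rather than argued over. A small sketch, with an invented dependency map, shows the idea; a topological sort puts DCs and DNS at the front automatically:

```python
# Sketch: deriving a server recovery order from IT dependencies, since the
# business can't prioritise for you. The dependency map below is invented;
# the point is that infrastructure (DNS, DCs) surfaces first automatically.
from graphlib import TopologicalSorter  # Python 3.9+

# server -> set of servers it depends on
deps = {
    "dns":         set(),
    "domain_ctrl": {"dns"},
    "ldap":        {"dns"},
    "mail":        {"dns", "domain_ctrl"},
    "sql_finance": {"domain_ctrl"},
    "intranet":    {"sql_finance", "dns"},
}

recovery_order = list(TopologicalSorter(deps).static_order())
print(recovery_order)  # e.g. ['dns', 'domain_ctrl', 'ldap', ...]
```

Maintaining even a rough map like this also documents the dependencies the business doesn't know about, which is half the battle in the priorities argument.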
DR is a nice idea, but beyond covering your basic items (hardware failure, data entry issues, viruses, minor, limited-scope stuff), I'm not sure there is value in doing a lot. Many of our systems run the enterprise (mail, SQL, etc.), but for many of them a few hours of downtime, or even a day, isn't threatening to the business. We don't implement clusters because of the cost and complexity relative to the bang for the buck. A major press release (like last summer, when my company was acquired) can disrupt business for a few days while very little gets done and the IT systems hum along quietly. The reverse wouldn't be the end of the world for many places.
As for true disasters (fire, bio, terrorism, etc.): these are extraordinary events, and complete planning for them (unless you're in a life-critical industry: hospital, power, etc.) isn't worth it to most businesses. That's why you buy insurance and deal with them when you can. Clustering and a number of other solutions won't help if your data center goes up in a fire. In that case, you pick up the pieces, get your backup tapes from offsite (where they should be) and go forward.
I like the article because it gets me thinking. We all have a "well, I'll get to that" list, and documentation and backups are usually #1 and #2 on it.
Our office just went through developing an IT DR plan. It was good to talk to people and see how long they could survive without access to some database, but what it really brought to light was that while people could go a couple of days without the data, there were processes on the business side that had no DR plan at all, and losing those would be catastrophic to the company. The funny thing was that IT spent the money to get a DR plan for our stuff, but the business side didn't think the investment was worth it. It just shows that in business we always talk about "us and them" ("users and IT" or "IT and users"), but when it gets to the crux of the issue, it should be "we".