From the soapbox: Does anyone know what disaster recovery is?

  • Comments posted to this topic are about the content posted at http://www.sqlser

  • In short: restoring a system back to operational status after some kind of system failure.  In database terms, this is usually either restoring a database from backup (on the production server) or failing over to a backup server (which may or may not be in sync with the production box); a minimal restore sketch follows below.

    There is plenty of information available on the web - just go to Google and search for "definition disaster recovery" or follow this link: http://www.google.com/search?sourceid=navclient&q=definition+disaster+recovery
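    For illustration only, here is a minimal sketch of what "restoring a database from backup" can look like in T-SQL. The database name, backup path, logical file names and target locations are all made up; substitute your own.

    [code]
    -- Minimal sketch of restoring a database from a full backup.
    -- Every name and path below is a made-up example.
    RESTORE DATABASE Sales
    FROM DISK = N'D:\Backups\Sales_Full.bak'
    WITH MOVE N'Sales_Data' TO N'E:\Data\Sales.mdf',
         MOVE N'Sales_Log'  TO N'F:\Logs\Sales_log.ldf',
         REPLACE,        -- overwrite the existing copy on the recovery server
         RECOVERY;       -- bring the database online once the restore completes
    [/code]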

     

  • Disaster Recovery (DR) is the process of restoring systems to an operational status after a catastrophic systems failure leading to a complete loss of operational capability.

     

    Business Continuity (BC) is the forethought that prevents loss of operational capability even when there is a catastrophic failure in some part of the system. BC includes DR plans to restore failed system components.

     

    There will always be failures, but with proper systems design and operational planning the risk of operational degradation can be significantly reduced.

     

    Simply put, design the system on the assumption that one or more of its components may fail, and give each of those components a backup that can take over.  The topic can be very complex or relatively simple, but truly hardened systems go as far as demanding a geographically separate backup location, usually in another state.

     

  • DR is the process and planning of the recovery and continuation of the system/enterprise in the event of a disruption to service, with minimal loss of data and the least downtime. This can range from a simple server failure to the loss of an entire building due to fire. I wonder how many backup tapes still live in the server room next to the server. Does your DR plan allow for the theft of your hardware, including the backup tapes? Ask how you would cope if you arrived to find all the media, CPUs and memory stolen from your server room.

    It is the planning and testing of these processes that constitutes a DR plan, not just having backups for the last 7 days. The best quote I know is "the only good backup is one that has been successfully restored". No DR plan is a plan unless it has been tested, and scenarios may range from a simple restore of a backup to restoring the enterprise in another building.
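    To put that quote into practice, something along these lines can be run regularly against the most recent backups. This is only a sketch - the backup path, the logical file names and the scratch database name are placeholders.

    [code]
    -- Quick integrity check of the backup media itself.
    RESTORE VERIFYONLY FROM DISK = N'D:\Backups\Sales_Full.bak';

    -- A verify is not a restore: actually restore to a scratch name on a test box
    -- to prove the backup can be brought online.
    RESTORE DATABASE Sales_RestoreTest
    FROM DISK = N'D:\Backups\Sales_Full.bak'
    WITH MOVE N'Sales_Data' TO N'E:\RestoreTest\Sales_RestoreTest.mdf',
         MOVE N'Sales_Log'  TO N'E:\RestoreTest\Sales_RestoreTest_log.ldf',
         RECOVERY;

    -- Optionally check the restored copy before dropping it.
    DBCC CHECKDB (Sales_RestoreTest) WITH NO_INFOMSGS;
    DROP DATABASE Sales_RestoreTest;
    [/code]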

    As mentioned above, the business has to calculate how much data loss and downtime it can stand (versus the cost), and the DR plan has to allow for that, for all eventualities.

    Although I hesitate to mention terrorist activities, we have suffered a few attacks here in the UK. A colleague of mine had a backup strategy and so on, but after a bomb attack they were not allowed back into the building for several days, so they could not get to their "everything proof" safe to get the backups out. That's a simple example of what DR planning has to allow for.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • If you carry out SQL Server backups using product 'x', then you need to make sure that:

    • You can get hold of the installation disks for product 'x'.
    • There is a backup copy of product 'x' to protect against damage to the primary copy.
    • You know how to install it.
    • You know how to configure it.
    • You know your current configuration (see the sketch below for one way to capture the SQL Server side of it).
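    A sketch of capturing the SQL Server side of the current configuration with the standard system procedures - keep the output with the DR documentation, and add whatever else your rebuild would actually need.

    [code]
    -- Instance-level settings and their running values.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure;

    -- File layout of the current database (run it in each database);
    -- you need this to recreate the layout on a replacement server.
    EXEC sp_helpfile;
    [/code]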

    If you have some amazing backup procedure, can you actually get hold of the physical backups?

    Early on in my career we had a major fire and we were not allowed near the area where the backups were held.  The fire brigade told us that the best place for a fireproof safe was near an outside wall.

    It is stating the obvious, but fireproof safes should not be put in storage cupboards where flammable materials are kept.  Keeping them in rooms that are also used for paper storage, or worse still, cleaning materials is simply asking for trouble.

     

  • Well written article and several good points.

    One question: Can you get my IT Director to read it?

    Seriously, true DR/BC shouldn't be one person's responsibility; it takes a team. There is so much to know and take into account that one person would age quickly and prematurely if they tried to do it all for any but the smallest of organizations.

    One thing you did leave out: setting expectations. It is all well and good to talk to the Business Units about their needs, but IT also needs to give feedback and set expectations for system recovery, data loss, recovery expenses, etc., so that there aren't any surprises. This is different from getting the BU to agree to a set of requirements, as the requirements may not be attainable, even after they are agreed to.

    -- J.T.

    "I may not always know what I'm talking about, and you may not either."

  • One other point about expectations is that the business often tends to downplay theirs, especially when cost is involved, until something goes wrong .. then you find out about mission-critical apps/data/etc. they dismissed with comments like "we can manage without X" or "no, we don't need all the phones working - just a single line will do", etc.

    From a DBA point of view, beware of reliance on other parts of IT. At one employer we tested DR to another site (basically we paid to have an office available for 350 people, with PCs etc.). As part of my SQL Server docs I generated HTML docs from the servers with all sorts of important info. "Put them on the intranet," said the ops guys, "then they're part of DR and you'll always have them." OK, so I arrive on site with my backups and get down to rebuilding my SQL Servers. I ask for the intranet files: "Oh, we don't rebuild the intranet until day 2."  So beware of depending on others, even within your own department .. I'd not done my DR correctly, as part of mine relied on another part. < grin >
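    For what it's worth, that sort of server documentation can start as simply as dumping a few server properties to a file that travels with the backups rather than living only on the intranet. A rough sketch (not the actual docs I generated):

    [code]
    -- Basic facts about the instance that you want on paper, not just on the intranet.
    SELECT SERVERPROPERTY('ServerName')      AS server_name,
           SERVERPROPERTY('ProductVersion')  AS product_version,
           SERVERPROPERTY('ProductLevel')    AS service_pack,
           SERVERPROPERTY('Edition')         AS edition,
           SERVERPROPERTY('Collation')       AS server_collation;
    -- Schedule this via osql/sqlcmd with -o to write the output to a file,
    -- and copy that file offsite along with the backup tapes.
    [/code]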

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Nicely said, a good balance of information and humor.

    I was involved in the Sarbanes-Oxley preparations last year at my company, and we implemented the COBIT framework (Control Objectives for Information and Related Technology) in IT to prepare. One of the sections I had to deal with was "Ensuring Continuous Service". Our charter was based on the scenario that we lost our data center building and had to quickly come up elsewhere, and how we would build preparations for dealing with that.

    I jested that I thought it was a silly exercise (in my company) since I'd likely be cutting my resume rather than putting this place together.

    Actually it was an interesting exercise, and given the scope (> 1000 servers) it seemed somewhat silly. Everyone said to get a list of priorities from the business, but guess what? First, they don't care. Each person's system (finance, HR, sales, etc.) is the most important, so rather than creating a big argument, especially since most people in the business don't know about IT dependencies, I set up a list of servers I thought were important: DCs, LDAP and DNS servers first, then mail, major business systems, etc. We continued on with the process, knowing that most of it would never be implemented because we'd never get to test it, for reasons of time and money.

    DR is a nice idea, but beyond covering your basic items - hardware failure, data entry issues, viruses, minor, limited-scope stuff - I'm not sure there is value in doing a lot. Many of our systems are running the enterprise - mail, SQL, etc. - but for many of them a few hours of downtime, or even a day, isn't threatening to the business. We don't implement clusters because of the cost, the complexity and the bang for the buck. A major press release (like last summer when my company was acquired) can disrupt business for a few days, with very little getting done while the IT systems hum along quietly. The reverse wouldn't be the end of the world for many places.

    As for true disasters - fire, bio, terrorism, etc. - these are extraordinary events, and complete planning for them (unless you're in a life-endangering industry: hospitals, power, etc.) isn't worth it to most businesses. That's why you buy insurance and deal with them when you can. Clustering and a number of other solutions won't help if your data center goes up in a fire. In that case, you pick up the pieces, get your backup tapes from offsite (where they should be) and go forward.

  • So basically DR is ensuring that your CV is up-to-date and backed up ready to mail at the drop of a hat

  • I like the article because it gets me thinking.  We all have a 'well, I will get to that' list, and documentation and backups are usually #1 and #2.

    Our office just went through developing an IT DR plan.  It was good to talk to people and see how long they could survive without access to some database, but what it really brought to light was that people could go a couple of days without the data, while there were processes in place on the business side that had no DR plan and whose loss would be catastrophic to the company.  The funny thing was that IT spent the money to get a DR plan for our stuff, but the business side didn't think the investment was worth it.  It just shows you that in business we always talk about "us and them" ("users and IT" or "IT and users"), but when it gets to the crux of the issue, it should be "we".

     

    Michelle

  • I think you've hit the nail on the head there, Michelle. At present I am trying to persuade the "decision makers" in our business to make available more resources and/or budget to review and complete our DR plans. 

    As a mainly Oracle-based company, the smaller SQL Server environment is seen as not business critical, even though a number of the SQL systems are both business critical and feeder systems to the Oracle production servers.  When I approached the Oracle team about "their" DR plans, they replied with "we have all that, we created it all when we implemented" - this was about 4 years ago (currently they are also halfway through a massive upgrade project, which has no DR plans).  I have subsequently taken it on myself to begin developing a DR plan which I hope will become a continuous project with a rolling scope:

    • Phase 1: develop the necessary DR plan(s) and required processes from a SQL Server standpoint.
    • Phase 2: incorporate the .NET applications using the SQL Server databases.
    • Phase 3: include associated systems, e.g. Oracle, reporting, etc.
    • Phase 4: include the IT infrastructure supporting all the previous systems.
    • Phase 5: incorporate any business processes/systems left out of the previous phases.

    By then it will most probably be necessary to begin at Phase 1 again and work through the DR plan(s), reviewing their correctness, scope and validity.  I feel this will be the best approach to getting "buy in" from the business.  Any ideas/comments/thoughts on this proposed method of attack would be greatly appreciated.

    Cheers,

    Lloyd

    P.S. Is anyone aware of some good resources, e.g. articles, books or papers related to disaster recovery?  I have done some limited research on the internet and have found some useful articles/documents, but would appreciate any more resources.

  • Having your CV up to date should always be part of your DR strategy!!!!  Even if you recover things, it may not be quick enough, or your boss may decide to throw you (beep beep) in front of the bus for allowing any downtime.

    I don't mean to say that you shouldn't consider disasters, but the percentage of time these will happen, the true loss in $$ of downtime, etc. have to be weighed. For Amazon.com, a few minutes of downtime are some serious $$, but only for their Internet retail systems. I'd bet that a day of downtime in their HR system wouldn't truly affect the company. Heck, their email may be down right now due to the current MyDoom worm for all we know.

    I think the two posts above are worth reading again. You really have to assess the true business impact. Don't let anyone tell you that a day of downtime is more serious than it is; look back, see when you've been down and what the impact was, and make a smart decision. For most DR plans in most companies, the cost of the solution isn't worth it.
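    To make that concrete with made-up numbers: if a full outage costs you roughly $5,000 an hour and you have historically seen about four hours of unplanned downtime a year, your expected loss is around $20,000 a year. If a warm standby site would cost $100,000 a year to run, you are spending five times the expected loss to avoid it; unless the business can show the real numbers are far worse, the cheap plan (good backups, tested restores, documented rebuild steps) wins.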

    But be sure that you do test your cheap and easy solutions: test restores, be sure you know where the media is, have copies of CD keys somewhere (preferably offsite), and have phone numbers handy - vendor numbers, support contracts, etc.
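    A cheap check that goes along with that is a quick look at msdb's backup history, so you know what was backed up, when, and which media it went to. Only a sketch - adjust the seven-day window to whatever your own plan promises.

    [code]
    -- Recent backups per database and the physical media they were written to.
    SELECT bs.database_name,
           bs.type AS backup_type,            -- D = full, I = differential, L = log
           bs.backup_finish_date,
           bmf.physical_device_name           -- where the media actually lives
    FROM   msdb.dbo.backupset AS bs
           JOIN msdb.dbo.backupmediafamily AS bmf
             ON bmf.media_set_id = bs.media_set_id
    WHERE  bs.backup_finish_date > DATEADD(day, -7, GETDATE())
    ORDER BY bs.database_name, bs.backup_finish_date DESC;
    [/code]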

     

    As far as the Internet goes, I haven't seen much in the way of planning documents. Like many things, companies seem to consider this proprietary (I guess it could be used for an attack) and not enough is published. Maybe we can get James to start a discussion.

  • Great feedback, everyone!   It's true, there are lots of things to consider (setting expectations is a good example), and then again considering everything isn't always the best approach.  I've seen companies go all out on a comprehensive DR/BC plan, spending hundreds of thousands of dollars to cover every angle -- even when the overall risk is fairly low. 

    I think Steve makes a great point -- having a DR plan that attempts to encompass every possibility just isn't practical for most companies.  The key is getting everyone involved and making sure everyone understands the risks and the consequences of any given scenario.

    I've got more coming on DR/BC and backup in general.  I'd really like to keep the discussion going!

  • I think one thing that has come out of this thread is the need to prioritize your DR plan.

    If your documentation is stored on an intranet, then a DR plan stored on that intranet that puts restoring the intranet on day 2 is a bit of a chocolate fireguard.

    The example of Amazon's e-mail going down illustrates a situation where there may be little impact on the customer (always a good thing).  In my case we lost e-mail for a day and this had a massive impact on our customers and we damn near lost a few.

    It is very much horses for courses.

  • The type of business obviously plays a significant role .. in my case I work within the mortgage lending industry, and not being there means no business, e.g. customers just go somewhere else. Our DR plan attempted to get systems back up and running in a new office and location within a few hours and accepting business. OK, so it was never tested in anger, but the test was pretty nerve-racking AND you need to test to discover the holes .. otherwise, when you do need it, it doesn't work .. just like the best disaster movies.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/
