Document recovery procedure

  • I'm trying to document the recovery process to follow in the event of a disaster, whatever its cause, including the case where a database repair attempt is unsuccessful. Do you think listing the steps of the recovery process makes no sense? I figured that if someone isn't smart enough to understand database recovery procedures, then they have no business on my servers. Do you agree? Any comments?

  • My approach is that I want the document to be idiot-proof and to cover the exact steps to recover from a disaster.

    If I get woken up at 2:00 AM, I am groggy and a lot more prone to errors than at 2:00 PM when I'm on my 3rd cup of coffee.  Having a document I can follow makes that situation much easier to handle.

    On the other hand, capturing every possible disaster and recovery may be overkill; it depends on your environment.  If my test/dev environment died, it would be annoying but not a "wake me up at 2:00 AM" type of problem.  It also depends on what the disaster is.  Did the database die?  Did the instance die?  Did the VM die?  Did the VM host die?  Did the SAN die?  Did the server room die?  I can only document what I am responsible for, so I am starting my document at the "restoring the instance from backup" step.  Once the instance is up, the document should explain how to bring the databases back up.  Then, if a disaster hits and the instances are still good but a database is toast, I can start at the database level.  If the VM died, I need IT to bring the VM back from backup (or, worst case, from scratch), and once the VM is ready, I can go into the instance recovery steps and database recovery steps as needed.
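The layered structure described above (VM, then instance, then databases) can be sketched as a small lookup that tells the reader where in the document to start. This is only an illustration: the layer names, the `RUNBOOK` wording, and the `recovery_plan` function are hypothetical, not part of anyone's actual runbook.

```python
# Hypothetical layer names, ordered from the outermost dependency inward.
LAYERS = ["vm", "instance", "database"]

# Hypothetical runbook sections; who owns each layer is an assumption
# taken from the post (IT owns the VM, the DBA owns the rest).
RUNBOOK = {
    "vm": "IT: restore the VM from backup (worst case, rebuild from scratch)",
    "instance": "DBA: restore the instance from backup",
    "database": "DBA: bring the databases back up",
}


def recovery_plan(failed_layer: str) -> list[str]:
    """Return the runbook sections to follow, starting at the failed layer."""
    if failed_layer not in LAYERS:
        raise ValueError(f"unknown layer: {failed_layer!r}")
    start = LAYERS.index(failed_layer)
    # Everything inside the failed layer may need recovery too.
    return [RUNBOOK[layer] for layer in LAYERS[start:]]


if __name__ == "__main__":
    for step in recovery_plan("instance"):
        print(step)
```

If only a database is lost, `recovery_plan("database")` returns just the database section; a dead VM returns all three sections in order.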

    Plus, I have a "post-disaster checklist" where I make a new full backup of anything that went down and run DBCC CHECKDB on every instance that was down.
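One way to make a post-disaster checklist like this harder to get wrong at 2:00 AM is to generate the commands rather than type them. The sketch below builds the T-SQL text for a fresh full backup and a `DBCC CHECKDB` per database. The function name, the backup folder, and the `_post_dr.bak` file naming are assumptions for illustration; it only produces strings and does not connect to any server.

```python
def post_disaster_commands(databases, backup_dir):
    """Build T-SQL for a fresh full backup and an integrity check per database."""
    commands = []
    for db in databases:
        # Escape closing brackets so the name is safe inside [quoted] identifiers.
        name = db.replace("]", "]]")
        # Hypothetical file naming; single quotes are doubled for the T-SQL string.
        path = f"{backup_dir}\\{db}_post_dr.bak".replace("'", "''")
        commands.append(
            f"BACKUP DATABASE [{name}] TO DISK = N'{path}' WITH CHECKSUM;"
        )
        commands.append(f"DBCC CHECKDB ([{name}]) WITH NO_INFOMSGS;")
    return commands


if __name__ == "__main__":
    # Database names and folder are made up for the example.
    for command in post_disaster_commands(["Sales", "HR"], "D:\\Backups"):
        print(command)
```

Review the generated script and try it on a test server before running it anywhere that matters.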

    My secondary goal is that if I am on vacation and unreachable when a disaster hits, any other DBA can grab the document and figure things out.  I'm not going to put absolutely everything in there (turn on laptop, start VPN, RDP to work desktop, start SSMS, etc.; those steps should be expected), but I will include anything that I feel is a step to recovery.  During a disaster, I don't want the stress of my boss standing over my shoulder watching me work while I struggle to remember things as I am waking up.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum, you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Admingod wrote:

    I'm trying to document the recovery process to follow in the event of a disaster, whatever its cause, including the case where a database repair attempt is unsuccessful. Do you think listing the steps of the recovery process makes no sense? I figured that if someone isn't smart enough to understand database recovery procedures, then they have no business on my servers. Do you agree? Any comments?

    If you're trying to document the process, why would you even consider taking a shortcut such as you have mentioned?  If your servers crash hard enough to require DR, then you will need it for yourself as Brian Gale mentioned.

    Also, the purpose of DR documentation is so that someone who doesn't actually know the process can do it because you might be dead.

    So, my answer is... NO!  I don't agree with what you said. 😉

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.



  • Piling on.

    Nope. Bad approach.

    Good approach: Assume you're not there. Provide as much detail as you can so a competent, but possibly ignorant, individual could get the job done. Go further. Script as much of the recovery process as you realistically can. Provide that script and documentation about it. How to run it, of course. More importantly: when to run it, why, and what it does.

    Do all that.
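As a rough illustration of keeping the "when, why and what" attached to the script itself, the sketch below stores each scripted step together with its documentation and renders them as one commented T-SQL file. The `ScriptedStep` class, the `render` function, and the sample database name and backup path are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScriptedStep:
    when: str  # the situation in which this step should be run
    why: str   # the reason the step exists
    what: str  # a plain-language description of what the T-SQL does
    tsql: str  # the command itself


def render(steps):
    """Render steps as one T-SQL script, each preceded by its when/why/what."""
    lines = []
    for number, step in enumerate(steps, start=1):
        lines.append(f"-- Step {number}")
        lines.append(f"-- When: {step.when}")
        lines.append(f"-- Why:  {step.why}")
        lines.append(f"-- What: {step.what}")
        lines.append(step.tsql)
        lines.append("")  # blank line between steps
    return "\n".join(lines)


if __name__ == "__main__":
    # The database name and backup path below are made up for the example.
    steps = [
        ScriptedStep(
            when="The database is damaged but the instance is healthy",
            why="Return to the last known good full backup",
            what="Restores the full backup and leaves the database restoring",
            tsql="RESTORE DATABASE [Sales] FROM DISK = N'D:\\Backups\\Sales.bak' "
                 "WITH NORECOVERY;",
        ),
        ScriptedStep(
            when="All required backups have been restored",
            why="The database stays offline until recovery is run",
            what="Brings the database online",
            tsql="RESTORE DATABASE [Sales] WITH RECOVERY;",
        ),
    ]
    print(render(steps))
```

Keeping the explanation next to each command means whoever follows the document sees the reasoning at the moment they run the step.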

    Then you're still not done. Practice. See if you can recover the database(s) and server(s). Does everything work? If not, don't assume knowledge and say, "Oh, it'll be fine." Fix it so it does work. Now, get someone else to follow your documentation. Did they successfully recover? If not, where did it fail? Fix that. Now get a different person entirely to follow the fixed version. Iterate until you've got a good set of documentation and a good set of recovery scripts.

    This really comes down to: are you dealing with information that will keep you employed and a roof over your family's head? Then why on earth would you go, "Eh, it'll be fine"?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning
