Document recovery procedure

  • Admingod

    SSCertifiable

    Points: 5891

    I'm trying to document a recovery process for use in the event of a disaster, for example when a database repair attempt is unsuccessful. Do you think it makes no sense to list the steps of the recovery process? I figured that if someone wasn't smart enough to understand database recovery procedures, then they had no business on my servers. Do you agree? Any comments?

  • Mr. Brian Gale

    SSC-Insane

    Points: 23075

    My approach is that I want the document to be idiot-proof and to cover the exact steps to recover from a disaster.

    If I get woken up at 2:00 AM, I am groggy and a lot more prone to errors than at 2:00 PM when I'm on my 3rd cup of coffee. Having a document I can follow step by step makes a mistake much less likely.

    On the other hand, capturing every possible disaster and recovery may be overkill; it depends on your environment. If my test/dev environment died, it would be annoying but not a "wake me up at 2:00 AM" type of problem. It also depends on what the disaster is. Did the database die? Did the instance die? Did the VM die? Did the VM host die? Did the SAN die? Did the server room die? I can only document what I am responsible for, so my document starts at the "restoring the instance from backup" step, and once the instance is up, it explains how to bring the databases back up. Then if a disaster hits and the instance is still good but a database is toast, I can start at the database level. If the VM died, I need IT to bring the VM back from backup (or worst case, from scratch), and once the VM is ready I can go into the instance recovery steps and database recovery steps as needed.
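
    To illustrate the layering, here is a minimal Python sketch of picking the starting point in the document from the layer that failed. The layer names (vm, instance, database) and the section wording are hypothetical placeholders for illustration, not taken from a real runbook.

```python
# Hypothetical runbook sections, ordered from the outermost layer to the innermost.
# Recovery starts at the failed layer and continues through every layer below it.
RUNBOOK_SECTIONS = [
    ("vm", "Ask IT to restore the VM from backup (or rebuild it from scratch)"),
    ("instance", "Restore the instance from backup"),
    ("database", "Bring the databases back up"),
]


def sections_to_follow(failed_layer):
    """Return the runbook sections to work through, in order, for a failed layer."""
    layers = [name for name, _ in RUNBOOK_SECTIONS]
    if failed_layer not in layers:
        raise ValueError(f"no runbook section for layer {failed_layer!r}")
    start = layers.index(failed_layer)
    return [section for _, section in RUNBOOK_SECTIONS[start:]]


if __name__ == "__main__":
    for section in sections_to_follow("instance"):
        print(section)
```

    With these placeholder sections, a database-only failure yields just the last section, while a dead VM yields all three in order.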

    Plus I have a "post disaster checklist" where I make a new full backup of anything that went down and run checkdb on every instance that was down.
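
    A checklist like this is easy to turn into a generated script. The sketch below is only an illustration in Python: it assumes you already have a list of the affected database names and a backup folder (both hypothetical here), and it only builds the T-SQL text for a full backup plus DBCC CHECKDB per database. It does not connect to or run anything against a server.

```python
def quote_name(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text):
    """Build an N'...' string literal, doubling any single quote."""
    return "N'" + text.replace("'", "''") + "'"


def post_disaster_script(databases, backup_dir):
    """Build T-SQL text: one full backup and one CHECKDB per affected database."""
    lines = []
    for db in databases:
        # Hypothetical naming convention for the post-disaster backup file.
        path = f"{backup_dir}\\{db}_post_dr_full.bak"
        lines.append(
            f"BACKUP DATABASE {quote_name(db)} TO DISK = {quote_literal(path)} WITH CHECKSUM;"
        )
        lines.append(
            f"DBCC CHECKDB ({quote_literal(db)}) WITH NO_INFOMSGS, ALL_ERRORMSGS;"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # "Sales", "Inventory" and the folder are made-up example values.
    print(post_disaster_script(["Sales", "Inventory"], "D:\\Backups"))
```

    Review the generated statements before running them; the backup options and file naming would need to match your own standards.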

    My secondary goal is that if I am on vacation and unreachable when a disaster hits, any other DBA should be able to grab the document and figure things out. I'm not going to put absolutely everything in there (turn on laptop, start VPN, RDP to work desktop, start SSMS, etc... those steps should be expected), but I do include anything that I feel is a step in the recovery. During a disaster, I don't want the stress of my boss standing over my shoulder watching me work while I struggle to remember things as I am waking up.

  • Jeff Moden

    SSC Guru

    Points: 996843

    Admingod wrote:

    I'm trying to document a recovery process for use in the event of a disaster, for example when a database repair attempt is unsuccessful. Do you think it makes no sense to list the steps of the recovery process? I figured that if someone wasn't smart enough to understand database recovery procedures, then they had no business on my servers. Do you agree? Any comments?

    If you're trying to document the process, why would you even consider taking a shortcut such as you have mentioned?  If your servers crash hard enough to require DR, then you will need it for yourself as Brian Gale mentioned.

    Also, the purpose of DR documentation is so that someone who doesn't actually know the process can do it because you might be dead.

    So, my answer is... NO!  I don't agree with what you said. 😉

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"


  • Grant Fritchey

    SSC Guru

    Points: 396618

    Piling on.

    Nope. Bad approach.

    Good approach: Assume you're not there. Provide as much detail as you can so a competent, but possibly ignorant, individual could get the job done. Go further. Script as much of the recovery process as you realistically can. Provide that script and documentation about it. How to run it, of course. More importantly, when to run it, why, and what it does.
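
    One way to keep that documentation honest is to store the how, when, why and what next to each script and flag anything left blank. This is a rough Python sketch under that assumption; the ScriptDoc class, its field names and the example script name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScriptDoc:
    """Documentation that travels with one recovery script (hypothetical structure)."""
    script: str       # file name of the recovery script
    how_to_run: str   # how to run it
    when: str         # when it should be run
    why: str          # why it exists
    what: str         # what it does

    def missing_fields(self):
        """Return the names of any fields that were left blank."""
        return [name for name, value in vars(self).items() if not value.strip()]


if __name__ == "__main__":
    doc = ScriptDoc(
        script="restore_sales.sql",  # made-up example name
        how_to_run="Open in SSMS connected to the recovered instance and execute.",
        when="",
        why="Brings the database back after the instance has been restored.",
        what="Restores the most recent full backup.",
    )
    print(doc.missing_fields())
```

    Anything the check reports is a gap to fill in before someone else has to rely on the document.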

    Do all that.

    Then you're still not done. Practice. See if you can recover the database(s) and server(s). Does everything work? If not, don't assume knowledge and say, "Oh, it'll be fine." Fix it so it does work. Now, get someone else to follow your documentation. Did they successfully recover? If not, where did it fail? Fix that. Now get a different person entirely to follow the fixed version. Iterate until you've got a good set of documentation and a good set of recovery scripts.

    This really comes down to: are you dealing with information that will keep you employed and a roof over your family's head? Then why on earth would you go, "Eh, it'll be fine"?

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software
