Planning for Failure

  • Comments posted to this topic are about the item Planning for Failure

  • Well said! Don't let anyone pressure you into coming up with a fast solution, no matter who it is.

  • Also remember that database backups are not just for production servers; Development and QA environments need them as well. Even if the data itself is replaceable, the potential for an unexpected disaster is much higher there. Without a recent backup to restore from, it could take days to stand up a development environment, even if you have all your scripts in source control.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • When in doubt about a system-level change, make a drive image first. I've done this many times, and every once in a while it's a major lifesaver (we applied the patch, and it won't boot anymore). This becomes more true the more archaic your equipment and OS get, such as those finicky systems in the corner left over from over a decade ago that everyone's afraid to touch... but that have to be touched for a (security) upgrade or whatnot.

    Acronis True Image, Clonezilla, Norton Ghost, pick your poison; but until you know it works, try a restore to a near-identical drive/RAID.

    If you're really paranoid, drive image it, pull out the drives, put them on a shelf, put in new drives, restore the image, and _then_ do your upgrade; you know your image works (you just restored it), and you know you have the original drives sitting on a shelf, ready to be popped back in, as well.

    Regardless, restore your backups regularly; if you support PITR, _test it_. Maybe someone slipped in a job that truncates the transaction log and nobody's noticed yet.
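    One way a silently broken log chain could be spotted between restore tests is to compare the LSN ranges of consecutive log backups. The following is only a minimal sketch, assuming SQL Server-style log backups where each backup's first LSN should equal the previous backup's last LSN. The `LogBackup` record and `find_chain_breaks` function are hypothetical names made up for illustration, and the data is supplied in memory rather than read from a real server. It is not a substitute for actually restoring to a point in time.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LogBackup:
    """Hypothetical record describing one transaction log backup."""
    name: str
    first_lsn: int
    last_lsn: int


def find_chain_breaks(backups: List[LogBackup]) -> List[Tuple[str, str]]:
    """Return (previous, next) name pairs where the log chain is not contiguous.

    Assumption: in an unbroken chain, each log backup starts at exactly
    the LSN where the previous one ended.
    """
    ordered = sorted(backups, key=lambda b: b.first_lsn)
    breaks = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.first_lsn != prev.last_lsn:
            breaks.append((prev.name, cur.name))
    return breaks


if __name__ == "__main__":
    chain = [
        LogBackup("log_01", 100, 200),
        LogBackup("log_02", 200, 300),
        LogBackup("log_04", 350, 400),  # something broke the chain before this one
    ]
    for prev_name, next_name in find_chain_breaks(chain):
        print(f"Log chain break between {prev_name} and {next_name}")
```

    A check like this only looks at backup metadata; the real proof is still a successful point-in-time restore on a scratch server.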

  • I haven't often imaged the system, usually because in the event of a move to a new system, I'm dealing with new hardware, or I might be.

    This is one place where virtual systems have a nice advantage.

  • If I have a doubt, I move slower, not faster. I become more conservative, and if I am unsure about the outcome of an action, I've learned that time spent making another backup at that time is time well spent.

    More sound advice. Move too quickly and you may end up costing yourself a lot more time.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • The best way to prepare for the unplannable is to educate yourself and practice high-stress things.

    Volunteer to help out in real disasters, like Haiti. After you've had to deal with that kind of pressure, you'll be better able to remain calm, cool and collected while dealing with something as trivial as a crashed server. If the biggest emergency you've ever dealt with is a pop quiz in school, a crashed server seems big and intimidating. If the biggest emergency you've dealt with involves stopping an arterial bleed, the server issue isn't intimidating any more.

    No matter how much you've planned or prepared, the ability to deal with stress and pressure will make a gigantic difference.

    You also, of course, need to know what you're doing. Don't just memorize a few commands, or keep a library of scripts that you don't understand but know how to plug values into. Understand how the systems work. Know the theory. And then practice using it. You should do timed tests for how long it takes you to get a server back up and running after a system crash. Turn a server off, then use a stopwatch to time from pressing the start button to having it fully operational with point-in-time restores done. Do that five times. If the fifth isn't faster than the first, figure out what you could do to make it faster, and do it another five times.
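    If the recovery steps are scripted, the stopwatch part of that drill could be automated. This is a rough sketch only: `time_drill` and `improved` are hypothetical helper names, and the `recover` callable stands in for whatever actually brings your server back, which is not shown here.

```python
import time
from typing import Callable, List


def time_drill(
    recover: Callable[[], None],
    runs: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Run the recovery procedure `runs` times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = clock()
        recover()  # placeholder for the real restore procedure
        durations.append(clock() - start)
    return durations


def improved(durations: List[float]) -> bool:
    """True when the last run was faster than the first one."""
    return len(durations) >= 2 and durations[-1] < durations[0]


if __name__ == "__main__":
    def fake_recover() -> None:
        # Stand-in for a real recovery; sleeps briefly so there is something to time.
        time.sleep(0.01)

    results = time_drill(fake_recover)
    for number, seconds in enumerate(results, start=1):
        print(f"Run {number}: {seconds:.3f} s")
    if not improved(results):
        print("Fifth run was not faster than the first; rework the procedure and repeat.")
```

    The clock is passed in as a parameter so the timing logic can be checked without waiting on a real restore.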

    Preparation for disaster isn't just about making sure the servers and databases are prepared. It's also, and perhaps most importantly, about making sure YOU are prepared.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
