Many people will read the title above and think I'm crazy. You have to plan for disasters, right? You need detailed plans and procedures to ensure that you can put the infrastructure back together and get the business moving, right? 8 out of 10 businesses fail after a major disaster because of IT, right? Or at least some other very scary statistic like this.
While a lot of businesses may fail after a disaster, it's certainly not because they haven't practiced their plan in detail, or even because they don't have a detailed recovery plan in place. It's more likely other issues, and I'll lay out my argument for why I don't think you need a detailed recovery plan. First, though, I do think you should have something, and I've written a few articles on the subject: Incident Response - The Framework and Responding to an Incident.
Stop and think for a minute about what changes you've made to your infrastructure over the last two months. Patches, configuration changes, software added, new enhancements or bug fixes you've deployed, new servers. There's a huge list of things that you probably change every month. Now take another minute and think about how many of them are documented. How many times, when you changed something, did you update a document somewhere that reflects the change? A diagram, notes, anything? Even a simple text or Excel-based log?
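For what it's worth, a "simple text log" really can be simple. Here is a minimal sketch in Python of what I mean. The function name `log_change`, the tab-separated layout, and the file name in the usage note are my own assumptions for illustration, not any kind of standard.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_change(log_path, who, what):
    """Append one timestamped, tab-separated line describing a change.

    Hypothetical helper: the format (UTC timestamp, person, description)
    is an assumption made for this sketch.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    line = f"{stamp}\t{who}\t{what}\n"
    # Open in append mode so earlier entries are never overwritten.
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(line)
    return line
```

Calling something like `log_change("changes.log", "jsmith", "Applied patches to the database server")` after each change is about as low-friction as documentation gets, and as you'll see, even that tends not to happen.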
Chances are relatively few updates have occurred. Now how many times did you, or the person making the change, even think about documenting it? Probably very few. I've worked in large companies where it was policy that changes had to be documented and submitted to a Change Management group before they could be approved. Let me tell you, while more things were documented there than in many places, lots still slipped through the cracks. In fact, the best documentation I've ever seen was when I managed 4 people and could scream at them for any change not documented. Even better than when I was responsible for deployment and documentation :).
See, people hate documenting everything. Not just developers; admins hate it even more, since most of them can't type that well. Now extend that to a group of any size and it gets more and more difficult to manage. Chances are very high that any plan you do develop will be out of date as soon as it gets published. What's more, it won't get updated regularly and will always be out of date. So if you depend on that plan to get you through a disaster, say one where all your IT guys and girls are gone and you hand the plan to a consultant to get you up and running, you'll surely fail. Possibly worse than if you just let someone run with no knowledge.
The other problem with having too detailed a plan is the distribution problem. Obviously, at least I hope it's obvious, you don't want to keep your DR plan on the network. At least not the only copy. You'll need at least one copy offsite, and you'll probably want a few of them, which means that any updates to the plan need to be distributed to every copy. The more items you include in a plan, the more likely it is that some of them will be missed when you update it. Especially if you are trying to keep up with the pace of change in any shop. That pace will far exceed your ability to keep the plan current.
And once again, if you are planning on a comprehensive, complete plan, you'll be out of luck.
Having a detailed recovery plan, with the order of servers to recover, exact configurations, and detailed steps for setting up each application, is a worthy goal, but one that's impossible to keep up to date. I'm not advocating ignoring all documentation. In fact, the opposite: I'd advocate as much documentation as you can have. But for a disaster, have a general outline and depend on your people. Which means making sure they are dependable, which implies treating them as the valuable PEOPLE they are. Your employees, the ones that keep your company going, are not "resources", or "assets", or "knowledge workers". They're people, and they have the ability and, more importantly, the pride to get things back up and do a great job. Plan for the outline of what to do, expect things to go wrong, and trust your people to think on their feet.
Most of all, when things don't go well, don't blame them or get angry. Instead, repoint them in the direction of what you need and let them run with it.