One of the problems I observe, when doing a 'dry run' of a Disaster Recovery plan, is that people's hearts just aren't in it. It is like listening to the radio and being interrupted by the EBS (Emergency Broadcast System), intoning its mechanical drone "…had this been an actual emergency…you would have been instructed on where to duck and cover." When an emergency is real there is excitement, fear, and focus. When there is a simulated emergency, as with a disaster recovery test, there is warm pizza.
Most people tend to believe that the likelihood of disaster striking is remote, as did I at one time. However, living through a true emergency with no gas, electricity, ice, cash, water, cell phone, or beer for about 3 weeks gave me a different perspective on disasters. I am speaking of Hurricane Ivan, which in 2004 scored a direct hit on my city, Pensacola, FL. The company I worked for at the time did not have a co-located area in a far-distant land, unaffected by 120-mile-per-hour winds.
In hopeless despair, we tracked the progress of the storm on the Weather Channel as it bore down on the 13 unprepared souls who made up our work force, while we desperately tried to tie up the loose ends of our ragged disaster recovery plan. Like many companies, we were reliant on uninterrupted power and network service. Neither lasted more than 5 hours once the winds began to tear us asunder. We were down for nearly 2 full days and found more small holes in our DR plan than I care to recount. However, we survived. We made post-emergency plans. We invested in new technologies offering faster recovery times. We acquired a temporary co-location. We paid lots of money.
However, throwing hardware, tools and money at the problem is only part of the answer. The secret of a solid DR plan lies in thorough and sustained testing and tweaking. Your reaction to a real disaster happens in hours but building a sound disaster recovery plan takes months if not years. And when you do test your plan, you need to find some way to get into the "duck and cover" mindset.
Five years after Hurricane Ivan, I am at a new company, and no further hurricanes have struck with such devastation. I believe I have lost touch with the reactive nature of a true disaster. I have become lazy and DR tests have taken a back seat to other daily concerns. When we do conduct the DR tests, the goal is to make sure everything works, not so much to adhere to a service level agreement on how long it should take to get an application up and running. We go through the motions and are comforted by the fact that any failures that befall the exercise will be documented and rectified.
And then what will happen when another disaster really does strike? Will the automated scripts to restore all user databases really work flawlessly as predicted? Will there be enough time to copy hundreds of gigabytes of backup files as the storm creeps closer and closer? Will virtual machines hold the load for production SQL Servers, assuming we had to transfer all processing from physical systems to virtual systems offsite? Is this the apocalypse? If so, why am I here and not driving about in the last of the V8s searching for fuel and vengeance?
I believe that DR tests need to come with sirens. Before leaving for the DR testing facility, the siren should sound and it should be decreed that this is not a test…this is an actual emergency. Duck and cover! Winds are blowing! If, after the test, one key system has not been restored, and is therefore not operational, because someone forgot to bring the installation media or an untested restore script failed and caused several more hours of manual intervention, then the consequences should be severe. The person responsible should have to drink warm beer and eat cold pizza in a dark room, writing detailed instructions by a faltering flashlight on what will not be forgotten on the next test.
Rodney Landrum (Guest Editor)