
Disaster Recovery Exposure - Part One



I've just read Paul Randal's sqlmag article about the need to test your disaster recovery strategy. In that article, Paul mentions a short presentation I gave on the Thursday night of the February 2011 Internals event in Dallas.

In that presentation I spoke about my own IT-related experiences during the magnitude 7.1 earthquake that struck Christchurch, New Zealand, on 4 September 2010, specifically relating to:

  • Your own personal preparedness for such an event.
  • What your own priorities are going to be.
  • How you are going to deal with the aftermath of such an event.

If you'd like to read about my personal experiences, then you can do so here.

I'll expand on those IT-related themes a little here. Although I was not in Christchurch for the magnitude 6.3 aftershock on 22 February 2011 (yes, that's right, the geotech experts consider it an aftershock of the big one the previous September), I was there for the September quake and the countless aftershocks that followed, and I can draw on the experiences of my friends, family and work colleagues.

First, a little background. The September 2010 quake, although the stronger of the two, struck at 4:35 on a Saturday morning and, fortunately, no lives were lost. The February quake, however, was centered closer to the city, was a lot shallower and struck at lunchtime on a Tuesday, sadly ending the lives of many people (160+ as of the date of this post). It was also massively more destructive: as well as damaging buildings and civil infrastructure, it had a big impact on IT services and infrastructure, which is what the remainder of this post is about.

During the February quake, people evacuated their buildings immediately - leaving personal items such as cell phones, car keys, wallets and handbags on their desks. When the quake was over - if the building was still standing - they were not allowed back into their buildings.

This also meant that any IT equipment in those buildings could not be physically accessed either. In fact, mains power had gone off and any emergency generators had kicked in. However, in some cases power was to remain off for weeks, far too long for generators to keep running.

In addition, a lot of server rooms were in disarray: SANs had been smashed, server racks had fallen over and air-conditioning units had been torn from their mountings.

Now, for some companies, the IT equipment they had on site represented their entire business.

For others, Christchurch was their disaster recovery center (everybody had been expecting a big earthquake in Wellington for years).

And for yet others, certain IT services were run directly out of Christchurch.

But in the weeks after the February earthquake, nobody was allowed back into the center of the city, so in effect an entire central business district was lost. Obviously this hit small businesses hard. Some relocated to other parts of the country, some worked from home and others simply went out of business; they had lost their IT systems, their building(s) and, in some cases, staff.

This alone proves the need for at least one disaster recovery center.

But what about businesses that did have a disaster recovery center in Christchurch? That center was now gone, and any tape backups held there were gone too (or at least inaccessible). This leaves a business vulnerable, because its backup location has effectively disappeared. You'll need to plan for this eventuality.

Then there are the businesses that not only used Christchurch as a DR center, but also ran IT services out of it. Not only had they lost their DR center, they had also lost the infrastructure that hosted those services. They may have had backups, but they now needed extra hardware in a new center to restore them onto.

Perhaps, this proves the need for two disaster recovery centers - geographically separated, preferably on different tectonic plates.

However many DR centers you have, losing one increases your vulnerability and risk to some degree. I guess the more mission critical your business is, the more disaster recovery scenarios you and your team should have worked through.

And there are a few words that are probably overused in our business:

  • Mission Critical - Well, unless somebody is gonna lose their life (or pretty close to it) if your system goes down, you're probably not in this category. If you insist on using this term, then perhaps you might want to qualify it - who is it “mission critical” to?
  • Disaster Recovery - Well, we all know how to restore systems; that's our job. But to most people, disasters involve loss of life, or something that comes pretty close. So does a user deleting important data through an application that lets them do that put you in a disaster recovery situation? In my opinion, NO. Don't get me wrong, you'll probably need to restore, and you might have some downtime, but in reality you're talking about data loss, not a disaster. Mostly when people talk about DR, they just mean data loss, but I guess the term “data loss” doesn't grab people's attention quite like the DR words do.

Just have a think about how you'd respond to the following:

  • Your own personal preparedness for a true disaster recovery situation - I'm not talking about your manager standing there shouting that the company is losing X dollars while your 30 GB database restores and you suddenly realize that you haven't got instant file initialization turned on. I'm talking about how you'll react if you're caught in a natural disaster and you don't know where your nearest and dearest are. Will you really want to go to work? Even if you answer yes, will you be effective in any way?
  • What your own priorities are going to be - Even if you know that everybody is all right, it's likely that people will be relying on you. Maybe you'll need to get clean water, maybe food. Perhaps you'll need to make the house weatherproof. Maybe you'll just have to look after someone. How will these priorities play out against your boss asking you to come in to work?
  • How you are going to deal with the aftermath of such an event - When the event is over, you might not be able to return to work. You might have other things to do. You might have no workplace to go to. Your business may not have planned for this and may be running around in disarray. The moral of the story is: plan for the unexpected. If the worst happens, you'll need a plan. Just as when you restore your company's database, you'll likely work from a checklist (you do have a restore checklist, don't you?).

The best way to get answers to these questions is to brainstorm with your team. Seek advice from people who have been through these events; they'll likely be more than willing to share their experiences. We all have fire drills (of course we do!), so why not drills for natural disasters? It may save your business.

But more importantly, it may save lives.

This blog post is now getting a little long - so I'll continue discussing some of the issues that could affect you in Part II.

Take it easy.




