I was reading a blog post by Steve Jones on Monday and started to write a comment on his blog. As it turned out, that comment turned into this post. This week is SQL Server disaster recovery week over on sqlservercentral.com, so I thought I’d get involved with a post of my own. Steve’s post was titled ‘Grace Under Pressure’, and in it he talked about a power failure in a server room at a large company he used to work at. Steve was in the data centre at the time of the power outage, along with a Senior Executive. The Senior Executive didn’t seem to handle the situation all that well, and definitely didn’t make it an environment in which the system administrators and DBAs could go about their business: the quite stressful business, let’s face it, of recovering production servers that have unexpectedly lost power.
No doubt the root cause analysis of the issue revealed that the power outage was a direct result of the Senior Executive being in the data centre :-)
Joking aside, I have experienced first-hand the types of disasters that Steve talked about in his post:
Power outages to server rooms
Server room flooding
SAN disk issues that caused a filer crash and took several servers with it
I have been very fortunate in my career. Many years ago, my first full-time DBA gig was a great job, and I got to work with a lot of great people. The company was good enough to fund training courses and, more importantly, gave me a great deal of experience working with SQL Server and the opportunity to learn from a great team. I got my first taste of a real disaster during my time there: a server room lost power, and it caused chaos.
Some simple things can help in this situation. Because of the training and guidance I had received from some great people who used to work there, when disaster did strike I was prepared with the following:
I was also fortunate to have a really great manager who ‘kept the wolves from the door’, dealing with the business and with high-ranking people, keeping them informed but, more importantly, keeping them off the backs of the people trying to fix things. He knew they were stressed because the business was suffering; he also knew that if they interfered it would likely take much longer to fix. It’s a vicious circle: things take longer to fix, senior people get more irate, and round and round it goes.
I visited New York last week (more about that in another post), and it was apparent that the hurricane/storm that hit a few weeks back was still having a severe impact on homes and businesses in NYC. Some buildings are still closed, and some are still flooded and being pumped out. The phone lines are still unavailable for some, so you have to pay cash in some bars and restaurants. No doubt the storm is still costing the NYC economy.
IT disasters can happen to anyone. To handle them with grace you need to be practiced and prepared, have a plan, and work as a team to fix the issue as quickly and efficiently as possible, with no infighting, finger pointing or assigning blame.