
You Need Two SLAs for Disaster Recovery

Continuing on with my MCM prep, I was listening to the High Availability/DR prep module today, and once again something surprised me. Typically I have heard all kinds of talk about SLAs, usually in terms of network traffic. For databases, the SLA conversations I have had were about downtime, and they usually go like this.

Me: How much uptime do we need?

Manager: 100%

Me: We can’t really do that in a cost effective manner.

Manager: Why not, the telephone companies are always up?

Me: Well, even the telcos measure their uptime in terms of 9s.

Manager: (blank look)

Me: They talk about 99% reliability, 99.9%, 99.99%, each of those adding a “9” of availability. The high water mark seems to be companies aiming for five nines, or 99.999%.

Usually at this point I need to write this down so they can understand why five nines is 99.999 and not 99.99999.

Manager: Let’s go for five nines.

Me: That’s only about 5 minutes of downtime a year. We can’t apply patches in 5 minutes. A better number is usually 99.9 for us, which gives us a bit more than 8 hours of downtime across the entire year. That’s a good number to aim for.

Manager: We can’t be down for 8 hours!

Me: (blank look)

At this point I usually give up and go in search of someone that will better understand things.
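For anyone who wants to check the arithmetic behind the conversation above, here is a minimal sketch that turns an availability percentage into a yearly downtime budget. The function name `allowed_downtime_minutes` is made up for illustration, and the calculation assumes a 365-day year with every minute counted, scheduled maintenance included.

```python
# Minutes in a non-leap year: 365 days * 24 hours * 60 minutes = 525,600.
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year that a target still permits."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Two, three, four, and five nines.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.1f} minutes/year ({minutes / 60:.2f} hours)")
```

Under those assumptions, 99.9% works out to roughly 8.76 hours a year, and five nines to a little over 5 minutes.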

However, an SLA for downtime/uptime isn’t enough for SQL Server. You also have to think in terms of data loss. If we lose a server, what about transactions in flight? What about things not yet transferred to the mirror server or log-shipped server? What about losing disks with no tail-of-the-log backup?

An SLA for data loss is important as well. And like the conversation above, your business people will say zero data loss. Quiz them to find out what can be recovered and how much lost data actually costs the company. Then compute the cost of your various HA solutions to decide how to handle things.
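One rough way to frame that comparison is to add each solution's yearly price to the expected cost of the data it could lose. The sketch below is only an illustration: the `HaOption` class, the `expected_annual_cost` function, and every number in it are hypothetical placeholders, not real prices or measured failure rates. Plug in the figures your own business gives you.

```python
from dataclasses import dataclass


@dataclass
class HaOption:
    name: str
    annual_cost: float        # hypothetical yearly cost of running the solution
    data_loss_minutes: float  # hypothetical data lost per failure, in minutes


def expected_annual_cost(option: HaOption,
                         failures_per_year: float,
                         loss_cost_per_minute: float) -> float:
    """Solution cost plus the expected yearly cost of lost data."""
    expected_loss = failures_per_year * option.data_loss_minutes * loss_cost_per_minute
    return option.annual_cost + expected_loss


if __name__ == "__main__":
    # All figures below are invented for illustration only.
    options = [
        HaOption("log shipping", annual_cost=5_000, data_loss_minutes=15),
        HaOption("synchronous mirroring", annual_cost=40_000, data_loss_minutes=0),
    ]
    for option in options:
        total = expected_annual_cost(option, failures_per_year=0.5,
                                     loss_cost_per_minute=1_000)
        print(f"{option.name}: {total:,.0f} per year")
```

The point is less the exact totals than having a number to put in front of the business when they ask for zero data loss.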

Filed under: Blog Tagged: disaster recovery, high availability, sql server, syndicated

The Voice of the DBA

Steve Jones is the editor of SQLServerCentral.com and visits a wide variety of data related topics in his daily editorial. Steve has spent years working as a DBA and general purpose Windows administrator, primarily working with SQL Server since it was ported from Sybase in 1990. You can follow Steve on Twitter at twitter.com/way0utwest


Posted by robertmcook on 19 January 2011

You make a great point about the overlooked difference between Availability and Recovery (RTO/RPO). Another overlooked point about Availability is the difference between Scheduled and Unscheduled downtime. If there is an agreed-to Scheduled downtime window, then taking the service offline within that window should not count against your 9's.

Posted by Steve Jones on 20 January 2011

I should have mentioned scheduled downtime. Often that's still counted against my uptime as I should be trying to minimize that as well, but that's a good debate to have.

Posted by SQL Noob on 24 January 2011

i don't even try to argue this anymore. if PHB wants 5 9's then just get a quote from the vendor with all the equipment required and show it to PHB.

most times PHB will seem to have a stroke and ask how much it is for less 9's. when they find out that every 9 taken away is a huge savings they will usually settle for 99.9%
