Continuing on with my MCM prep, I was listening to the High Availability/DR prep module today and I was once again surprised by something. Typically I have heard all kinds of talk about SLAs, usually in terms of network traffic. For databases, I have had SLA conversations about downtime that usually go like this:
Me: How much uptime do we need?
Manager: 100%
Me: We can’t really do that in a cost effective manner.
Manager: Why not, the telephone companies are always up?
Me: Well, even the telcos measure their uptime in terms of 9s.
Manager: (blank look)
Me: They talk about 99% reliability, 99.9%, 99.99%, with each extra 9 counting as another "nine" of availability. The high-water mark seems to be companies aiming for five nines, or 99.999%.
Usually at this point I need to write this down so they can understand why five nines is 99.999 and not 99.99999.
Manager: Let’s go for five nines.
Me: That’s only about 5 minutes of downtime a year. We can’t apply patches in 5 minutes. A better number is usually 99.9 for us, which means across the entire year we get a bit over 8 hours of downtime. That’s a good number to aim for.
Manager: We can’t be down for 8 hours!
Me: (blank look)
At this point I usually give up and go in search of someone that will better understand things.
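To put numbers on the nines in that conversation, here is a small Python sketch of the arithmetic. It assumes a 365-day year and counts every hour of the year against the target; the function name is just something I made up for illustration.

```python
# Downtime allowed per year for a given availability percentage.
# Assumption: a 365-day year (8,760 hours), with no scheduled window excluded.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_percent):
    """Return the hours of downtime per year that an availability target permits."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Two, three, four, and five nines.
for target in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours ({hours * 60:.1f} minutes) a year")
```

Run as written, 99.9% comes out at 8.76 hours a year and 99.999% at roughly 5.3 minutes, which is where the "8 hours" and "5 minutes" figures above come from.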
However, an SLA for downtime/uptime isn’t enough for SQL Server. You also have to think in terms of data loss. If we lose a server, what about transactions in flight? What about things not yet transferred to the mirror server or log-shipped server? What about losing disks with no tail-of-the-log backup?
An SLA for data loss is important as well. And like the conversation above, your business people will say zero data loss. Quiz them to find out what can be recovered and how much lost data actually costs the company. Then compute the cost of your various HA solutions to decide how to handle things.
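One rough way to frame that comparison is to add each option's price to the loss you would still expect with it in place. The Python sketch below does that. Every number, option name, and cost rate in it is a made-up placeholder, not real pricing or a measurement, so plug in whatever your business people tell you.

```python
# A rough cost comparison for HA options.
# Assumption: all figures here are hypothetical placeholders for illustration.

def expected_annual_loss(outages_per_year, downtime_hours, cost_per_hour_down,
                         data_loss_minutes, cost_per_minute_lost):
    """Expected yearly cost of outages: downtime cost plus lost-data cost."""
    per_outage = (downtime_hours * cost_per_hour_down
                  + data_loss_minutes * cost_per_minute_lost)
    return outages_per_year * per_outage


# Hypothetical business inputs.
OUTAGES_PER_YEAR = 1
COST_PER_HOUR_DOWN = 1_000
COST_PER_MINUTE_LOST = 50

# Hypothetical options: yearly cost, downtime per outage, data lost per outage.
options = {
    "nightly backups only": {"yearly_cost": 5_000, "downtime_hours": 8, "data_loss_minutes": 60},
    "log shipping": {"yearly_cost": 15_000, "downtime_hours": 1, "data_loss_minutes": 15},
    "synchronous mirroring": {"yearly_cost": 40_000, "downtime_hours": 0.1, "data_loss_minutes": 0},
}

for name, opt in options.items():
    loss = expected_annual_loss(OUTAGES_PER_YEAR, opt["downtime_hours"], COST_PER_HOUR_DOWN,
                                opt["data_loss_minutes"], COST_PER_MINUTE_LOST)
    total = opt["yearly_cost"] + loss
    print(f"{name}: solution {opt['yearly_cost']}, expected loss {loss:.0f}, total {total:.0f}")
```

The point is not the specific totals but that both SLAs, downtime and data loss, show up as separate terms that the business has to put a price on.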



Posted by robertmcook on 19 January 2011
You make a great point about the overlooked difference between Availability and Recovery (RTO/RPO). Another overlooked point about Availability is the difference between Scheduled and Unscheduled downtime. If there is an agreed-to Scheduled downtime window, then taking the service offline within that window should not count against your 9's.
Posted by Steve Jones on 20 January 2011
I should have mentioned scheduled downtime. Often that's still counted against my uptime as I should be trying to minimize that as well, but that's a good debate to have.
Posted by SQL Noob on 24 January 2011
i don't even try to argue this anymore. if PHB wants 5 9's then just get a quote from the vendor with all the equipment required and show it to PHB.
most times PHB will seem to have a stroke and ask how much it is for less 9's. when they find out that every 9 taken away is a huge savings they will usually settle for 99.9%