Continuing with my MCM prep, I was listening to the High Availability/DR prep module today, and once again something surprised me. I have heard all kinds of talk about SLAs, usually in terms of network traffic. For databases, the SLA conversations I have had were about downtime, and they usually go like this:
Me: How much uptime do we need?
Me: We can’t really do that in a cost effective manner.
Manager: Why not, the telephone companies are always up?
Me: Well, even the telcos measure their uptime in terms of 9s.
Manager: (blank look)
Me: They talk about 99% reliability, 99.9%, 99.99%, each of those being a “9” of availability. The high water mark seems to be companies aiming for five nines, or 99.999%.
Usually at this point I need to write this down so they can understand why five nines is 99.999 and not 99.99999.
Manager: Let’s go for five nines.
Me: That’s only about 5 minutes of downtime a year. We can’t apply patches in 5 minutes. A better number for us is usually 99.9, which means we get a bit over 8 hours of downtime across the entire year. That’s a good number to aim for.
Manager: We can’t be down for 8 hours!
Me: (blank look)
At this point I usually give up and go in search of someone that will better understand things.
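For anyone who wants to check the arithmetic in that conversation, here is a minimal sketch in Python. The function name and the 365-day year are my own assumptions for illustration; nothing here comes from the MCM material.

```python
# Rough downtime budget per year for a given availability percentage.
# Assumption: a 365-day year (leap years ignored), so 525,600 minutes.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the minutes of downtime per year allowed at this availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Two, three, four, and five nines.
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% -> {minutes:.1f} minutes ({minutes / 60:.2f} hours) per year")
```

Five nines works out to a little over 5 minutes a year, and three nines to roughly 8.76 hours, which is where the numbers I quote to managers come from.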
However, an SLA for downtime/uptime isn’t enough for SQL Server. You also have to think in terms of data loss. If we lose a server, what about transactions in flight? What about things not yet transferred to the mirror server or the log-shipped server? What about losing disks and not being able to take a tail-of-the-log backup?
An SLA for data loss is important as well. And like the conversation above, your business people will say zero data loss. Quiz them to find out what can be recovered and how much lost data actually costs the company. Then compute the cost of your various HA solutions to decide how to handle things.