
Is 0% Downtime Possible?

By Steve Jones,

Introduction

The short answer is "no".

And there isn't really a long answer. So is this article over? Well, I suppose I should explain why this isn't possible, so let's begin by looking at other systems.

Who has 0% downtime?

No one. At least not over time. My desktop has 0% downtime over the last 5 days, but 5 days isn't very long. The night light in my 3-year-old's room has had 0% downtime for a couple of months (it burns 24x7). However, 60 days isn't very long either.

What about outside my world? Cars? Well, since my Trooper is in the shop, it's got downtime. Plus, I don't drive it 24x7. Boats? Go in the shop regularly. Many can float 24x7 for years, but they aren't being used, plus the idea is to keep working in the event of some disaster. Lots of boats aren't very fault tolerant of storms.

What about critical systems? Like those in a nuclear power plant. Well, I worked in one for almost 3 years and they too have downtime. Partly planned, occasionally unplanned, but it still exists. And nearly every system in the plant has triple redundancy. It's an amazing place, but things still go wrong.

Medical systems. Not those either. There are failures in all types of equipment, which is why a hospital usually has spare systems for cardiac care, etc. They still experience failures, though I'd bet that thanks to the tremendous efforts of our doctors and nurses, most patients survive.

The group most often looked at as providing 0% downtime is the telcos, or phone carriers. People expect these systems to be always available. Why look at them? Well, they provide a service to a great many people across a large physical distance. Not many other systems do this. Do the telcos provide 0% downtime?

Ahhh...............................no.

Not a resounding NO because they do a pretty good job. Actually an excellent job, considering they use so much analog equipment. In fact, many network equipment companies were shooting for telco levels of high availability. But the fact remains that they cannot duplicate all single points of failure, though they catch most of them, and can get to minutes of downtime a year, year after year, for 99+% of their clients.

Even the electrical utility companies, who use a grid of wires so that no one company provides all the electricity for an area, cannot help their downtime. Who can predict when some drunk will slam into a transformer or a pole carrying wires? Not much different from your junior admin or CTO tripping over a power cord.

Why can't you do it on a database?

There are two types of downtime: planned and unplanned. Of course, sometimes a planned action results in unplanned downtime (like a service pack installation), but both types exist in every application.

So if both types exist, then you cannot build a 0% downtime solution, right? Right. You can't. Not "it's hard", not "it takes lots of $$". You can't.

OK, I know lots of you will disagree. However, I've tried and searched for answers and, while I'm sure I haven't seen/read/tried everything, I have seen lots. Let's go through some scenarios where you can try to achieve 0% downtime.

Well, a database must run on a computer, so you first want to ensure that your machine cannot fail. That starts with power. Well, power supplies fail (both the internal ones in the box and the ones at the other end of the outlet in the wall). You can buy machines with multiple power supplies (my main db server has 3 and only needs 2), even hot swappable ones. This solves the first issue. You can also use a generator/UPS combination. This will in all likelihood handle all your power issues, with one exception: some knucklehead trips over a power cord.

Don't laugh, I've been that knucklehead before.

OK, let's assume that you seal and bind the power cords so I can't trip over them. Power problem solved.
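The "has 3 and only needs 2" power supply arrangement can be put into rough numbers. The sketch below is only an illustration: the function name `k_of_n_availability` and the 0.99 per-supply figure are assumptions made up for the example, and it assumes the supplies fail independently of each other, which a shared power cord does not.

```python
from math import comb


def k_of_n_availability(k: int, n: int, p: float) -> float:
    """Probability that at least k of n units work, each up with probability p.

    Assumes the units fail independently of one another.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # Sum the binomial probabilities for exactly k, k+1, ..., n working units.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    p = 0.99  # hypothetical per-supply availability, not a measured figure
    print(f"one supply, needs one:     {k_of_n_availability(1, 1, p):.6f}")
    print(f"three supplies, needs two: {k_of_n_availability(2, 3, p):.6f}")
```

With those assumed numbers, the 2-of-3 arrangement comes out at roughly 0.9997 instead of 0.99. Much better, but still not 1.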

What about your disks? After all, the data has to be somewhere. Well, most people use RAID. Works fine. Can be expensive, but it works. However, your disk system may need to be accessed by more than one machine (clustering is coming), so you need multiple connections. The big EMC arrays will handle all this on a network, but then you also have to be aware of the power and network connections between the machines. This is a place of potential failure (after all, as you add more components and complexity, the chance of failure increases), but these can be solved with redundant components.
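The point that more components mean more chances of failure, and that redundancy pulls the number back up, can be sketched with some simple arithmetic. This is an illustration under stated assumptions, not a model of any real array: the function names and the 0.999 per-component availability are hypothetical, and every component is assumed to fail independently.

```python
from math import prod
from typing import Iterable


def series_availability(parts: Iterable[float]) -> float:
    """Availability of a chain where every part must be up at the same time."""
    return prod(parts)


def redundant_availability(single: float, copies: int = 2) -> float:
    """Availability of identical, independent copies where any one suffices."""
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    part = 0.999  # hypothetical availability of each component
    for count in (1, 3, 5, 10):
        chain = series_availability([part] * count)
        print(f"{count:2d} parts in a chain: {chain:.6f}")
    print(f"two redundant copies: {redundant_availability(part):.6f}")
```

Each part added to the chain drags the total down, and each redundant copy pushes it closer to 1 without ever reaching it.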

OK, you're mostly covered. Now, what about the operating system? Well, we here in SQL Server Central land tend to use Windows, mainly because Microsoft hasn't ported SQL Server to any other system (yet). OK, we've all heard about issues with Windows. They exist, but I definitely have Windows server machines that have run for months without a reboot. I haven't said years, mainly because of planned downtime, but occasionally because of unplanned downtime too.

OK, if the OS is unstable, then you can always cluster, right? Build an active/passive cluster (there's quite a few

Despite the reliability of my machines (meaning Windows NT/2000 and SQL Server), I have had issues. There have been times when, running seemingly stable code (I think all code is "seemingly stable"), the server has flipped out to the point where SQL Server is eating 99% of the CPU and work isn't being done. At least not new work. I assume that SQL Server is still processing some query that someone asked it to run, but from the point of view of our Operations group, it ain't working.

Which really brings me to the second to last place where 0% downtime is compromised: application software. In all my years in this business, after meeting and working with lots of programmers in many different fields, I have NEVER, NEVER, NEVER, NEVER, NEVER, EVER worked with anyone who produced bug-free code.

Even after testing.

It's not most people's fault, but this industry is still unable to build reliable tools that can exhaustively test and verify an application's stability. Individual modules work well, but at some point every complex application becomes too large to test completely, shortcuts are taken, and errors occur.

And these errors result in downtime.

So what's the last thing to prevent 0% downtime?

You.

And me, of course. And every other human who works with computers. We make mistakes. No matter how many protocols, procedures, rules, double checks, etc. we have, we will make mistakes. And we will cause downtime, whether through bad programming, a mistake in change control, incomplete testing, or, the biggest problem I see, users demanding enhancements.

Even if you built a bullet-proof system, with all the hardware protected from failure, people will want changes. Which will require human intervention. Which will require more testing. Which will require you to upgrade the code. And that upgrade, no matter how automated, will result in downtime. Now the post I read only had to worry about 12 hours of uptime a day. I'd still argue that no application that undergoes development and is upgraded yearly will achieve 100% uptime over more than two years.

However, two years may be enough.

Conclusions

Now I know this isn't complete. I forgot to mention the lovely Service Packs that MS releases, some of which have not had the smoothest installation, especially in clustered environments. But I hope I'll get a good debate going and some other ideas that will allow me to update this article.

Most of this is based on personal experience, but I also looked around the Internet for this article and, after some time, found a few resources. Lots of nonsense includes the word "downtime". Anyway, here are a few links:

  • META Report: Planned or Unplanned, It's All Downtime - Discussion about how to minimize downtime.
  • Case Study from Veritas - Interesting case study from Veritas about Umbro.com. They talk quite a bit about being available every minute of every day, and 365 by 24x7, but then mention that the comprehensive suite of products "minimizes both planned and unplanned downtime".
  • Beyond.com - From Google, this report on Beyond.com says they have the lowest downtime statistics. And they run at 99.9%!!!!!!
  • Utility Reliability Metrics - A look at some things that cause outages at utilities.
  • Canadian Lottery Setup - Another interesting note where the 3rd paragraph says "0% downtime" and the 4th says "nearly instantaneous". It's close to 0%, but nearly instantaneous ain't 0.
  • Hidden Cost of Downtime - Interesting. The first paragraph mentions most industrial assets run at 85-95% uptime.
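To put the percentages in those links into perspective, here is a minimal sketch that converts an uptime percentage into downtime per year. The function name is made up for this example, and it assumes a 365-day year of round-the-clock service, which is not how every site measures availability.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year implied by an uptime percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return (100.0 - availability_percent) / 100.0 * MINUTES_PER_YEAR


if __name__ == "__main__":
    # 85, 95 and 99.9 are the figures quoted in the links above;
    # 99.999 is added only for comparison.
    for pct in (85.0, 95.0, 99.9, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% uptime: {minutes:,.1f} minutes "
              f"({minutes / 60:,.1f} hours) down per year")
```

Under those assumptions, 99.9% works out to about 8.76 hours of downtime a year. Respectable, but a long way from 0.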

I know there will be someone who disagrees, but there is no way to get 0% downtime. However, I'm sure I forgot some things, so please feel free to comment and let me know.

As always I welcome feedback on this article using the "Your Opinion" button below. Please also rate this article.

Steve Jones
©dkRanch.net February 2002