Uptime = Downtime? (Database Weekly, Aug 3, 2009)

Question

Steve Jones - SSC Editor

SSC Guru

August 1, 2009 at 11:09 am

Comments posted to this topic are about the item Uptime = Downtime? (Database Weekly, Aug 3, 2009)

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 1

This is a good subject. The military realized the fallacy of "If it ain't broke, don't fix it." a long, long time ago. Daily, weekly, monthly, quarterly, yearly, and "as required" maintenance procedures for virtually every piece of equipment are available, scheduled, and required to be completed. It's why we win wars... it helps keep stuff from breaking when you can least afford it to.

It's too bad that most people don't consider the same with most software. Good DBAs do by checking their overnight "health check" runs and whether or not their permanent "low hanging fruit" profiler run for RPC:Completed and SQL:BatchCompleted picked up anything out of the ordinary. Good GUI designers will also put parametrics into their systems to measure and log such things as the time it took to render and present a page, etc.

Sadly, not enough people take the time to do all of that. All they really care about is "time to market" and "it's good enough for this release". Rarely do they care about future performance (including uptime) and the like because they don't even care about it in the present. "Wrap it and ship it" has become the mantra for many. "Set it and forget it" is the normal follow-up to that.

"If you want it real bad, you'll likely get it that way." -- Jeff Moden, circa 2003

😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

umailedit SSCrazy Points: 2087 More actions · Answer 2

" It's why we win wars... "

We do?

I always thought in software it is a golden rule not fallacy. 'if it ain't broke don't fix it'. Maybe it is a fallacy for hardware but not software.

Mike Brockington SSC Eights! Points: 874 More actions · Answer 3

The key thing is that the maintenance schedules that all modern armies use are based on solid statistical studies, and a good understanding of the design specifications.

In contrast, Formula 1 teams have discovered that they used to be doing things all wrong. It was standard practice to strip an engine down after every race to check for defects, but recent rule changes requiring an engine to last for more than one race actually made the engines more reliable: those that had lasted one race distance had proved themselves to be free from major defects, and were better left alone. Those that were rebuilt had an increased risk of a fault being introduced.

Throw away your pocket calculators; visit www.calcResult.com

Steve Jones - SSC Editor SSC Guru Points: 734418 More actions · Answer 4

Software shouldn't break down like hardware does, but it always has defects in it. If we continue to delay fixing software, we are often building on a poor foundation, and we cause a similar issue with "large downtime" by not addressing issues we're aware of.

Mike Brockington SSC Eights! Points: 874 More actions · Answer 5

Steve Jones - Editor (8/3/2009)
software ... always has defects in it.

Indeed, but the point I was trying to make was that by that same token, your fixes will also have defects, and sometimes it is 'Better the Devil you know'.

As a rule, I find that there are always plenty of 'improvements' that I can make to the things I look after, without needing to deal with other people's mistakes.

A further point would be that most of us have to abide by ITIL or similar, under which the kind of tinkering that you seem to be suggesting would be likely to get you fired.

Rudyx - the Doctor SSC-Forever Points: 43695 More actions · Answer 6

It seems to be more of a case of avoiding 'planned downtime' in order to increase 'availability' while at the same time increasing the possibility and length of 'unplanned downtime'.

Check out the following:

http://en.wikipedia.org/wiki/High_availability

Lots of good links in the wiki ...
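As a rough illustration of that trade-off, here is a minimal sketch of the usual availability arithmetic (uptime divided by total time). The hour figures are invented purely for illustration, and the `count_planned` flag is a hypothetical name reflecting the fact that some availability figures leave scheduled maintenance out of the calculation.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(planned_hours, unplanned_hours,
                 period_hours=HOURS_PER_YEAR, count_planned=True):
    """Fraction of the period the system was up.

    count_planned=False mimics a report that excludes scheduled
    maintenance windows from the downtime total.
    """
    downtime = unplanned_hours + (planned_hours if count_planned else 0)
    return (period_hours - downtime) / period_hours


# Hypothetical year WITH regular maintenance: 12 planned hours, 2 unplanned.
with_maintenance = availability(planned_hours=12, unplanned_hours=2)

# Hypothetical year WITHOUT maintenance: no planned hours, one 40-hour outage.
without_maintenance = availability(planned_hours=0, unplanned_hours=40)

print(f"with maintenance:    {with_maintenance:.4%}")
print(f"without maintenance: {without_maintenance:.4%}")
```

With these made-up numbers the maintained system comes out ahead even though it "went down" more often, which is the point about trading a little planned downtime against a lot of unplanned downtime.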

Regards
Rudy Komacsar
Senior Database Administrator
"Ave Caesar! - Morituri te salutamus."

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 7

umailedit (8/3/2009)
" It's why we win wars... "
We do?
I always thought in software it is a golden rule not fallacy. 'if it ain't broke don't fix it'. Maybe it is a fallacy for hardware but not software.

Heh... that's why I'm able to make a living. People think that working software won't break. Then, the data reaches a point that every computer has... it's called the "tipping point". That's where working software that has (supposedly) good performance suddenly has very bad performance. It also happens when folks can least afford for it to happen... month end runs, tax time, etc, etc.

Just like a truck, the bearings and suspension have to be able to "carry the load". In software, most people don't anticipate the load. The software runs just fine until one day, the load just gets too big and suddenly that proverbial truck breaks.

Part of software maintenance is to check the performance of the code now and again so that as data grows, you can see if the software is beginning to go non-linear in performance and a few other measurements. For example, if a screen was returning in a half second and it's suddenly returning on a consistent basis in a second, something with the data is likely the problem and it's warning you that you're likely to have a substantial performance problem in the future.

Bottom line is, working code can break... don't let it be a surprise when you can least afford it.
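One way to sketch the "is it going non-linear?" check described above: log the row count and elapsed time of a query every so often, then fit a straight line to log(time) against log(rows). A slope near 1 means time is growing in step with the data; a slope well above 1 is the early warning. The function names, the sample figures, and the 1.2 threshold are all assumptions made for illustration, not anything from a real monitoring tool.

```python
import math


def growth_exponent(samples):
    """Least-squares slope of log(seconds) against log(rows).

    samples is a list of (row_count, elapsed_seconds) pairs with at
    least two distinct row counts.
    """
    xs = [math.log(rows) for rows, _ in samples]
    ys = [math.log(secs) for _, secs in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def is_going_nonlinear(samples, threshold=1.2):
    """True when time grows noticeably faster than the row count."""
    return growth_exponent(samples) > threshold


# Hypothetical logged measurements: (rows in table, seconds to return).
scales_fine = [(10_000, 0.10), (20_000, 0.20), (40_000, 0.40)]
heading_for_trouble = [(10_000, 0.10), (20_000, 0.40), (40_000, 1.60)]

print(is_going_nonlinear(scales_fine))
print(is_going_nonlinear(heading_for_trouble))
```

In the second data set, doubling the rows quadruples the time, so the fitted exponent is about 2. That is the kind of trend worth catching long before month end.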

--Jeff Moden

Mike Brockington SSC Eights! Points: 874 More actions · Answer 8

Heh... that's why I'm able to make a living.

And how many times do you get brought in to sort out a system that had been 'upgraded' or 'fixed' by someone who had just been let go as a result?

Anyway, in the situation that you describe, the software has not 'broken'. A fundamental flaw that should have been uncovered in testing has been found.

TomThomson SSC Guru Points: 104773 More actions · Answer 9

umailedit (8/3/2009)
I always thought in software it is a golden rule not fallacy. 'if it ain't broke don't fix it'. Maybe it is a fallacy for hardware but not software.

Maybe you are right - but I never yet saw any non-trivial software that wasn't broke and stayed that way for a significant amount of time (it ended up broke either because it had real bugs or because it had security problems or because new requirements had come or because the users didn't like it or because the requirement specification was broke in the first place). And don't forget that a security flaw for which there are no exploits may leave the software looking as if it ain't broke, but when someone produces an exploit it will suddenly become obviously very broke indeed, so it's best to fix before that happens - so you have downtime to apply and check patches even when the system ain't broke.

Tom

TomThomson SSC Guru Points: 104773 More actions · Answer 10

mike brockington (8/5/2009)
Heh... that's why I'm able to make a living.
And how many times do you get brought in to sort out a system that had been 'upgraded' or 'fixed' by someone who had just been let go as a result?
Anyway, in the situation that you describe, the software has not 'broken'. A fundamental flaw that should have been uncovered in testing has been found.

Wearing my theoretician hat, I wish people wouldn't keep on bringing out that hoary old chestnut: if you've worked in the real world (or thought seriously about the implications of the undecidability of the halting problem, or of Gödel's incompleteness theorem) you will be aware that exhaustive testing is often either not technically feasible or not commercially viable.

And wearing my experienced engineer hat, I have exactly the same wish. I think that you ought to be aware that requirements statements for complex systems often are based on seriously incorrect assumptions. Although this can sometimes be spotted early (I've found solid reasons to shift data volume estimates by 3 decimal orders of magnitude when reviewing requirements before now), it can't always be spotted before the system is built and in operation - not least because if you are providing something very new you may find that demand grows either a lot faster or a lot slower than you expected, but there's no way to know until you have done it. So you end up either with a system that is broken because it is over-engineered and too expensive, or one which is broken because it is under-engineered and can't cope with the demand.

Tom

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 11

mike brockington (8/5/2009)
Heh... that's why I'm able to make a living.
And how many times do you get brought in to sort out a system that had been 'upgraded' or 'fixed' by someone who had just been let go as a result?

I'm not sure what the relevance of that question is, but the answer is (sadly) almost never. There are two reasons for that... 1) people don't fix what they think isn't broken because they don't know any better until it breaks (even QA testing doesn't catch it because they don't know how to test for scalability in many cases) and 2) no one is held accountable for code that breaks whether it be a real flaw or "just" a scalability flaw. They incorrectly call it "agile programming" or "continuous improvement" and so long as the development schedule was met, people will actually be rewarded for writing crap code instead of being fired.

Anyway, in the situation that you describe, the software has not 'broken'. A fundamental flaw that should have been uncovered in testing has been found.

BWAA_HAAA!!!! Sure... you've seen how people test even on this forum and other forums... 10 - 20 rows max. Even if they find such a flaw it will sometimes be deemed as an "acceptable risk to meet delivery schedule". The really spooky part is, some people think that's actually OK to do.

--Jeff Moden
