Cascading Human Error

  • Comments posted to this topic are about the item Cascading Human Error

  • It's not just Microsoft and Amazon you need to worry about with this sort of thing. I'm sure we've all heard of (or even been affected by) the situations where a virus scanner decides a critical system file is dodgy after an update and bricks the machine. I've personally been involved in a situation where a colleague inadvertently wired up a network so it was looping back on itself, and then wondered why a broadcast storm disabled the whole thing! Simple errors like this have been having major knock-on effects for a long time--the difference being, once all your stuff is in the "cloud" you have no direct control over getting it fixed; you just have to sit on your hands while management get increasingly irate and you have no answers for them. That's one reason (and not the only one) why I don't believe in the Cloud and hope it never takes off as a concept.

  • Your post reminded me of a great quote:

    The factory of the future will have only two employees, a man and a dog. The man will be there to feed the dog. The dog will be there to keep the man from touching the equipment.

    Warren G. Bennis

    The past 30 or 40 years of IT have been a cycle of centralizing, then distributing, computing. We keep trying to balance costs and control against agility and responsiveness. And it crosses all aspects -- hardware, software, programming, and management.

    As an industry, we probably need to spend more time considering risks and failure modes. We can't eliminate the human in the equation, so we need to do as much as possible to prevent errors, localize their damage, and have realistic contingencies for when they occur. Finally, when things really go wrong, there needs to be someone taking responsibility and communicating.

    I think the cloud has a role, it just isn't clear what that role is. We'll have to endure some hardships during the experimentation.

  • brdudley (5/3/2011)


    The past 30 or 40 years of IT have been a cycle of centralizing, then distributing, computing. We keep trying to balance costs and control against agility and responsiveness. And it crosses all aspects -- hardware, software, programming, and management.

    As an industry, we probably need to spend more time considering risks and failure modes. We can't eliminate the human in the equation, so we need to do as much as possible to prevent errors, localize their damage, and have realistic contingencies for when they occur. Finally, when things really go wrong, there needs to be someone taking responsibility and communicating.

    Agreed. The 1970s days of mainframe "IT Empire Building" have returned with a vengeance in the form of server farms, virtual servers, software as a service, single-vendor "solutions", and the "cloud" (time sharing).

    Microsoft is the present day equivalent of IBM in the 1960s & 70s.

    Only the names have been changed to ensure profit margins...

  • Steve,

    I think what you're really referring to is the risk involved during transitions in or changes to a highly-automated and well-functioning system. The risks are exaggerated when scaling is applied. In this sense, scaling effectively operates as the reverse of diversification.

  • Human error. Human malice. The vulnerabilities are the same. A sensible risk management strategem has always been not to place all of one's eggs in one basket, or in one nebula as the case may be.

  • It's the kind of error an operator could make: a wrong choice on a menu, or entering the name of the last network worked on instead of the one needed. In short, it's a human error that's all too likely to occur when anyone is momentarily preoccupied with the price of mangoes or a flare-up with a spouse.

    The public perception of computer systems is that they are mindless robots who interpret commands from operators literally and then follow through with them relentlessly, even when the operator realizes he made a mistake. There is actually a lot of truth to that perception. Regarding human error when inputting parameters or choosing menu options for a critical operational process, this can easily happen when the process is performed manually using a tool like SSMS or through a command prompt.

    However, there are ways to make the process more intelligent and fool-resistant (not necessarily foolproof). For example, if the operator clicks on a server group and chooses an option to start deploying a service pack, the process should query a table to determine whether that service pack has already been applied to each server and when. If the query detects a duplicate or invalid set of parameters, the process should prompt the operator for confirmation, or even block the process from starting until an override is given.
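The confirmation-before-deploy idea above can be sketched in a few lines. This is a hypothetical illustration only: the deployment-history "table" is just an in-memory list, and the function names are invented for the example rather than taken from any real deployment tool.

```python
# Hypothetical sketch: guard a service-pack deployment against duplicate runs.

def already_applied(history, server, service_pack):
    """Check the deployment-history 'table' (here, a list of dicts)."""
    return any(h["server"] == server and h["sp"] == service_pack for h in history)

def deploy(history, servers, service_pack, override=False):
    """Deploy only to servers that still need the pack; block duplicates
    unless the operator explicitly overrides."""
    deployed, blocked = [], []
    for server in servers:
        if already_applied(history, server, service_pack) and not override:
            blocked.append(server)  # prompt or stop instead of re-running
        else:
            history.append({"server": server, "sp": service_pack})
            deployed.append(server)
    return deployed, blocked

history = [{"server": "db01", "sp": "SP2"}]
done, skipped = deploy(history, ["db01", "db02"], "SP2")
# db01 is skipped (already patched); only db02 receives the pack
```

The key design choice is that the duplicate check is data-driven: the process consults recorded state rather than trusting whatever the operator clicked.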

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • What I find interesting is that everyone is jumping on the human error portion of this. Yes, it was an individual that started the tear. However, it was the system that grabbed the corners and started running. From what I gathered from the article, with just the human error there would have been a slowdown, it would have been caught and fixed, and that would have been it. It was the system aggressively trying to make sure it had backups and after it wasn't able to do that tried again...and again...and again. And that system activity is what was the direct cause of most of the outage. Without that the outage would have been far smaller.

    The systems need to be designed in such a way that they can recognize that what they're trying isn't working and give up and leave it to the operators to sort out.
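The "give up and leave it to the operators" behavior described above amounts to a bounded retry budget. A minimal sketch, assuming the replication step is represented by a callable; all names here are illustrative, not any real system's API:

```python
import time

def replicate_with_budget(attempt_fn, max_attempts=3, base_delay=0.01):
    """Try the replication step a bounded number of times with backoff;
    once the budget is spent, stop and escalate to a human rather than
    retrying forever and amplifying the failure."""
    for attempt in range(1, max_attempts + 1):
        if attempt_fn():
            return "ok"
        time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    return "escalate-to-operator"

# A step that always fails, standing in for a storage node that cannot
# find space for its mirror:
result = replicate_with_budget(lambda: False, max_attempts=3, base_delay=0)
```

The point of the fixed budget (and the growing delay between attempts) is exactly the one made above: an automated system that retries without limit turns one operator slip into a storm.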

  • The public perception of computer systems is that they are mindless robots who interpret commands from operators literally and then follow through with them relentlessly, even when the operator realizes he made a mistake. There is actually a lot of truth to that perception.

    This goes back to what I was saying earlier about this being a very old problem indeed--there's a famous quote from Charles Babbage that describes this exact issue:

    "On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

  • paul.knibbs (5/3/2011)


    The public perception of computer systems is that they are mindless robots who interpret commands from operators literally and then follow through with them relentlessly, even when the operator realizes he made a mistake. There is actually a lot of truth to that perception.

    This goes back to what I was saying earlier about this being a very old problem indeed--there's a famous quote from Charles Babbage that describes this exact issue:

    "On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

    IMO any good app has well thought-through rules for input validation and a log that allows rollback in case of a mistake. Expectations for these features are definitely much higher than they used to be, and continue to go up.
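As a rough illustration of the two features mentioned above (input validation plus a log that allows rollback), here is a toy sketch. The class name, the validation rule, and the store itself are invented for the example, not any real product's API:

```python
def validate_name(value):
    """Minimal input-validation rule: non-empty, letters and spaces only."""
    return bool(value) and value.replace(" ", "").isalpha()

class AuditedStore:
    """Toy key-value store with an undo log, so a mistaken write
    can be rolled back instead of being final."""
    def __init__(self):
        self.data = {}
        self.log = []  # (key, previous_value) pairs

    def put(self, key, value):
        if not validate_name(value):
            raise ValueError(f"rejected input: {value!r}")
        self.log.append((key, self.data.get(key)))
        self.data[key] = value

    def undo(self):
        key, previous = self.log.pop()
        if previous is None:
            del self.data[key]
        else:
            self.data[key] = previous

store = AuditedStore()
store.put("cust-42", "Edwards")
store.put("cust-42", "Edwerds")  # the mistake...
store.undo()                     # ...rolled back from the log
```

Validation stops bad input at the door; the log handles the case the validator cannot catch, where the input is well-formed but wrong.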

  • paul.knibbs (5/3/2011)


    The public perception of computer systems is that they are mindless robots who interpret commands from operators literally and then follow through with them relentlessly, even when the operator realizes he made a mistake. There is actually a lot of truth to that perception.

    This goes back to what I was saying earlier about this being a very old problem indeed--there's a famous quote from Charles Babbage that describes this exact issue:

    "On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

    It is possible for a database application (not using AI but just basic SQL ranking, fuzzy indexes, and reference tables) to return the correct result for the following questions:

    "Who is the president of England?"

    "List all customers with the last name Edwerds"

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (5/3/2011)


    It is possible for a database application (not using AI but just basic SQL ranking, fuzzy indexes, and reference tables) to return the correct result for the following questions:

    "Who is the president of England?"

    "List all customers with the last name Edwerds"

    There was a feature called English Query that was supposed to do this, but it didn't work well.

  • I find it fascinating and telling at the same time. Human error is prone to happen. What is done to mitigate that chance of error though?

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • Steve Jones - SSC Editor (5/3/2011)


    Eric M Russell (5/3/2011)


    It is possible for a database application (not using AI but just basic SQL ranking, fuzzy indexes, and reference tables) to return the correct result for the following questions:

    "Who is the president of England?"

    "List all customers with the last name Edwerds"

    There was a feature called English Query that was supposed to do this, but it didn't work well.

    I never used it, but my impression was that English Query was just a tool that parsed English phrases into a valid SQL select statement.

    Knowing that "president" ~= "prime minister" or soundex("Edwerds") = soundex("Edwards") requires cross-reference tables and function based indexes. Google's search engine does this routinely. It won't assume to know exactly what you're talking about but will instead rank what it considers possible matches. For example, if the user enters "Mary Edwerds" AND 36052, it will return "Mary Edwerds" first (if any), followed by "Mary Edwards" in 36052, followed by "Mary Edwards" not in 36052.
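The soundex-based ranking described above can be sketched without a database at all. Below is a simplified implementation of American Soundex for illustration (in T-SQL the built-in SOUNDEX() and DIFFERENCE() functions would do this work); the customer list is invented:

```python
def soundex(name):
    """Simplified American Soundex: first letter plus three digit codes."""
    codes = {}
    for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in letters:
            codes[ch] = digit
    name = name.lower()
    result = name[0].upper()
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "hw":  # h/w do not separate identical codes
            prev = digit
    return (result + "000")[:4]

def rank_matches(query, customers):
    """Rank exact last-name matches first, then soundex matches."""
    exact = [c for c in customers if c.lower() == query.lower()]
    fuzzy = [c for c in customers
             if c.lower() != query.lower() and soundex(c) == soundex(query)]
    return exact + fuzzy

customers = ["Edwards", "Edmunds", "Edwerds"]
ranked = rank_matches("Edwerds", customers)
# "Edwerds" (exact) ranks first, "Edwards" (same soundex code) second
```

This mirrors the ranking behavior described above: the system does not assume it knows what you meant, it returns the literal match first and the phonetic near-misses after it.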

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (5/3/2011)


    Steve Jones - SSC Editor (5/3/2011)


    Eric M Russell (5/3/2011)


    It is possible for a database application (not using AI but just basic SQL ranking, fuzzy indexes, and reference tables) to return the correct result for the following questions:

    "Who is the president of England?"

    "List all customers with the last name Edwerds"

    There was a feature called English Query that was supposed to do this, but it didn't work well.

    I never used it, but my impression was that English Query was just a tool that parsed English phrases into a valid SQL select statement.

    Knowing that "president" ~= "prime minister" or soundex("Edwerds") = soundex("Edwards") requires cross-reference tables and function based indexes. Google's search engine does this routinely. It won't assume to know exactly what you're talking about but will instead rank what it considers possible matches. For example, if the user enters "Mary Edwerds" AND 36052, it will return "Mary Edwerds" first (if any), followed by "Mary Edwards" in 36052, followed by "Mary Edwards" not in 36052.

    I am not sure about Google, but based on the history of previous searches, Bing would assume and offer 'president of [The Bank of] England'.
