
Cascading Human Error
Posted Monday, May 2, 2011 9:06 PM


SSC-Dedicated

Comments posted to this topic are about the item Cascading Human Error






Post #1102057
Posted Tuesday, May 3, 2011 4:03 AM
SSCommitted

It's not just Microsoft and Amazon you need to worry about with this sort of thing. I'm sure we've all heard of (or even been affected by) situations where a virus scanner decides after an update that a critical system file is dodgy and bricks the machine. I've personally been involved in a situation where a colleague inadvertently wired up a network so it looped back on itself, and then wondered why a broadcast storm disabled the whole thing! Simple errors like these have been causing major knock-on effects for a long time. The difference is that once all your stuff is in the "cloud" you have no direct control over getting it fixed; you just have to sit on your hands while management get increasingly irate and you have no answers for them. That's one reason (and not the only one) why I don't believe in the Cloud and hope it never takes off as a concept.
Post #1102199
Posted Tuesday, May 3, 2011 5:05 AM
Ten Centuries

Your post reminded me of a great quote:


The factory of the future will have only two employees, a man and a dog. The man will be there to feed the dog. The dog will be there to keep the man from touching the equipment.
Warren G. Bennis



The past 30 or 40 years of IT has been a cycle of centralizing, then distributing computing. We keep trying to balance costs and control against agility and responsiveness. And it crosses all aspects -- hardware, software, programming, and management.

As an industry, we probably need to spend more time considering risks and failure modes. We can't eliminate the human in the equation, so we need to do as much as possible to prevent errors, localize their damage, and have realistic contingencies for when they occur. Finally, when things really go wrong, there needs to be someone taking responsibility and communicating.

I think the cloud has a role, it just isn't clear what that role is. We'll have to endure some hardships during the experimentation.
Post #1102219
Posted Tuesday, May 3, 2011 6:11 AM
SSC-Enthusiastic

brdudley (5/3/2011)
The past 30 or 40 years of IT has been a cycle of centralizing, then distributing computing. We keep trying to balance costs and control against agility and responsiveness. And it crosses all aspects -- hardware, software, programming, and management.

As an industry, we probably need to spend more time considering risks and failure modes. We can't eliminate the human in the equation, so we need to do as much as possible to prevent errors, localize their damage, and have realistic contingencies for when they occur. Finally, when things really go wrong, there needs to be someone taking responsibility and communicating.



Agreed. The 1970s days of mainframe "IT Empire Building" have returned with a vengeance in the form of server farms, virtual servers, software as a service, single-vendor "solutions", and the "cloud" (time sharing).

Microsoft is the present day equivalent of IBM in the 1960s & 70s.

Only the names have been changed to ensure profit margins...
Post #1102253
Posted Tuesday, May 3, 2011 7:14 AM
Forum Newbie

Steve,
I think what you're really referring to is the risk involved during transitions in or changes to a highly-automated and well-functioning system. The risks are exaggerated when scaling is applied. In this sense, scaling effectively operates as the reverse of diversification.
Post #1102311
Posted Tuesday, May 3, 2011 7:58 AM
Grasshopper

Human error. Human malice. The vulnerabilities are the same. A sensible risk management stratagem has always been not to place all of one's eggs in one basket, or in one nebula as the case may be.
Post #1102360
Posted Tuesday, May 3, 2011 8:09 AM


SSCommitted

It's the kind of error an operator could make as a wrong choice on a menu or the entry of the name of the last network worked on instead of the one needed. In short, it was a human error that's all too likely to occur with anyone momentarily preoccupied with the price of mangoes or a flare up with a spouse.

The public perception of computer systems is that they are mindless robots who interpret commands from operators literally and then follow through with them relentlessly, even when the operator realizes he made a mistake. There is actually a lot of truth to that perception. Human error when inputting parameters or choosing menu options for a critical operational process can easily happen when the process is performed manually using a tool like SSMS or through a command prompt.
However, there are ways to make the process more intelligent and fool-resistant (not necessarily foolproof). For example, if the operator clicks on a server group and chooses an option to start deploying a service pack, the process should query a table to determine whether that service pack has already been applied to each server, and when. If the query finds a duplicate or invalid set of parameters, the process should prompt the operator for confirmation or even block the process from starting until an override is given.
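As a rough illustration of that kind of pre-check, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `PatchRecord` history list stands in for the table the process would query, and `find_already_patched`, `start_deployment`, and the `override` flag are hypothetical names, not part of any real deployment tool.

```python
from dataclasses import dataclass
from datetime import date


class DeploymentBlocked(Exception):
    """Raised when a deployment looks like a duplicate and no override was given."""


@dataclass(frozen=True)
class PatchRecord:
    # Hypothetical row from a patch-history table.
    server: str
    service_pack: str
    applied_on: date


def find_already_patched(servers, service_pack, history):
    """Return the history records showing this service pack on any target server."""
    targets = set(servers)
    return [
        rec for rec in history
        if rec.server in targets and rec.service_pack == service_pack
    ]


def start_deployment(servers, service_pack, history, override=False):
    """Return the servers to deploy to; block duplicates unless overridden."""
    duplicates = find_already_patched(servers, service_pack, history)
    if duplicates and not override:
        details = ", ".join(f"{r.server} ({r.applied_on})" for r in duplicates)
        raise DeploymentBlocked(
            f"{service_pack} already applied to: {details}. "
            "Re-run with override=True to proceed anyway."
        )
    # Placeholder for the real deployment step, which is outside this sketch.
    return list(servers)
```

The point of the design is that the operator's slip is caught by data the system already has, and proceeding anyway takes a deliberate second action rather than a single click.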
Post #1102373
Posted Tuesday, May 3, 2011 8:39 AM
SSCommitted

What I find interesting is that everyone is jumping on the human error portion of this. Yes, it was an individual who started the tear. However, it was the system that grabbed the corners and started running. From what I gathered from the article, with just the human error there would have been a slowdown, it would have been caught and fixed, and that would have been it. Instead, the system aggressively tried to make sure it had backups and, when it wasn't able to do that, tried again... and again... and again. That system activity was the direct cause of most of the outage. Without it, the outage would have been far smaller.

The systems need to be designed in such a way that they can recognize that what they're trying isn't working and give up and leave it to the operators to sort out.
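One common way to get that behavior is to cap the number of retries, space them out, and then hand the problem to a person. The Python sketch below is only an illustration under assumed names (`run_with_retry_limit`, `GaveUp`, and the `sleep` parameter are invented for the example); it does not describe how any particular cloud provider's system works.

```python
import time


class GaveUp(Exception):
    """Raised when the operation keeps failing and a human should take over."""


def run_with_retry_limit(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); retry on failure with growing delays, then give up."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:  # a real system would catch narrower errors
            last_error = exc
            if attempt < max_attempts:
                # Back off exponentially so retries do not pile onto a struggling system.
                sleep(base_delay * 2 ** (attempt - 1))
    # Stop trying and surface the last failure for the operators to sort out.
    raise GaveUp(
        f"stopped after {max_attempts} failed attempts; operator action needed"
    ) from last_error
```

Passing `sleep` in as a parameter is just a convenience so the waiting can be replaced in tests; the important part is that the loop has an end.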
Post #1102396
Posted Tuesday, May 3, 2011 8:47 AM
SSCommitted


The public perception of computer systems is that they are mindless robots who interpret commands from operators literally and then follow through with them relentlessly, even when the operator realizes he made a mistake. There is actually a lot of truth to that perception.


This goes back to what I was saying earlier about this being a very old problem indeed--there's a famous quote from Charles Babbage that describes this exact issue:

"On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
Post #1102406
Posted Tuesday, May 3, 2011 9:06 AM


SSCarpal Tunnel

paul.knibbs (5/3/2011)

The public perception of computer systems is that they are mindless robots who interpret commands from operators literally and then follow through with them relentlessly, even when the operator realizes he made a mistake. There is actually a lot of truth to that perception.


This goes back to what I was saying earlier about this being a very old problem indeed--there's a famous quote from Charles Babbage that describes this exact issue:

"On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

IMO any good app has well-thought-out rules for input validation and a log that allows rollback in case of a mistake. Expectations for these features are definitely much higher than they used to be, and continue to go up.
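A toy Python sketch of those two features together might look like the following. The `Ledger` class and its validation rule are made up for the example; a real application would have its own rules and would usually lean on the database's transaction log rather than an in-memory list.

```python
class Ledger:
    """Toy key/value store that validates input and logs changes for rollback."""

    _MISSING = object()  # marks "key did not exist before this change"

    def __init__(self):
        self.data = {}
        self.undo_log = []  # (key, previous value) pairs, oldest first

    def set(self, key, value):
        # Validation rule for this sketch: non-empty string keys, non-negative ints.
        if not isinstance(key, str) or not key:
            raise ValueError("key must be a non-empty string")
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            raise ValueError("value must be a non-negative integer")
        # Record the old state before changing anything, so it can be restored.
        self.undo_log.append((key, self.data.get(key, self._MISSING)))
        self.data[key] = value

    def rollback(self, steps=1):
        """Undo the most recent `steps` changes, newest first."""
        for _ in range(min(steps, len(self.undo_log))):
            key, previous = self.undo_log.pop()
            if previous is self._MISSING:
                del self.data[key]
            else:
                self.data[key] = previous
```

Rejected input never reaches the log, so a rollback only ever replays changes that were valid when they were made.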
Post #1102428