
Reverse Engineering Disasters
 Posted Thursday, August 28, 2014 8:24 AM
 SSCertifiable Group: General Forum Members Last Login: Today @ 1:04 PM Points: 5,412, Visits: 3,138
John Pluchino (8/28/2014): Truth tables and Boolean algebra expressions can be mathematically specified using variables that represent events, conditions, and characteristics of a system. Then, by stepping through all of the mathematical permutations and combinations that can be logically specified (e.g. "the sun comes up in the East and sets in the West" is one that happens every day), you could "engineer" an exhaustive pattern of disaster events that require a solution. You will (that's right, I did not say "could", I said "will") reveal every possible disaster event, even ones that you might think are impossible (e.g. the sun comes up in the West and sets in the East). Use a random number generator to invoke varying sequences of events over time and you have a convenient way to repetitively walk through scenarios that you can actually use to practice failures and recoveries. Having done that, sleep well knowing that your team is ready for anything that might come their way.

Perfect, except for the following: "could 'engineer' an exhaustive pattern of disaster events" excludes many, many scenarios. Also, the sun coming up in the West as opposed to the East is an effect of an undefined cause, but I suspect that your tongue was firmly in your cheek.

Gaz
-- Stop your grinnin' and drop your linen...they're everywhere!!!
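The truth-table idea above can be sketched mechanically: with n Boolean conditions you get 2^n scenarios, including combinations nobody thought to plan for, and a seeded random generator gives repeatable drill sequences. A minimal sketch in Python, with the condition names purely illustrative:

```python
import itertools
import random

# Hypothetical failure conditions; a real list would come from your own system.
conditions = ["power_lost", "network_down", "backup_corrupt"]

# Exhaustively enumerate every true/false combination (the "truth table").
scenarios = [
    dict(zip(conditions, values))
    for values in itertools.product([False, True], repeat=len(conditions))
]
print(len(scenarios))  # 2**3 = 8 combinations, including "impossible" ones

# A seeded random generator draws practice scenarios repeatably,
# so the same drill sequence can be re-run after a process change.
rng = random.Random(42)
drill = rng.choice(scenarios)
print(drill)
```

Filtering out genuinely impossible combinations (and arguing about which ones those are) is itself a useful planning exercise.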
Post #1608303
 Posted Thursday, August 28, 2014 10:01 AM
 SSCrazy Group: General Forum Members Last Login: Today @ 12:00 PM Points: 2,332, Visits: 1,390
Thanks Steve. Although we think we have things well under control at all times, time has a tendency to erode even the best of plans. Compounding a natural, criminal, or other disaster by not being ready or able to recover introduces a much more severe disaster, far more devastating than just the earthquake, fire, or other loss itself. And the newer ways malware or poorly constructed systems can cause corruption or loss of data are more sophisticated than when our plans may have been made. We must create a plan and then update that plan on a regular basis with potentially new or advanced technologies and strategies. We cannot make a plan once and then think we are good for all time, for again, time erodes the value of all plans, strategies, and schemes.

Good that you remind us of the need to create a solid plan if we have none, and to readdress our plan if we do. You provide a wonderful service to those of us who are busy and could forget to go back and make certain our future is secure because our data is safe.

Not all gray hairs are Dinosaurs!
Post #1608333
 Posted Thursday, August 28, 2014 10:04 AM
 SSCertifiable Group: General Forum Members Last Login: Today @ 3:13 PM Points: 7,156, Visits: 15,255
This might be a good time to start stealing concepts from neighboring disciplines. There are lots of templates and standards around capturing reliability analysis results and potential issues. From Ishikawa ("fishbone") diagrams showing interactions and potential points of failure, to FMECA (Failure Mode, Effects, and Criticality Analysis), to even Six Sigma analysis, there are lots of great starter kits to work from. It's just kind of strange that IT all too often does not take the time to do such things.

Simply taking the time to think through the issues is great, but capturing the results in an easily disseminated format will help a LOT. It's harder to "crucify" someone for a bad outcome if the potential for a bad outcome was documented and signed off before the work started (or went into play).

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..
All right - what was my emergency again?
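The FMECA starter kit mentioned above boils down to a small, shareable table: each failure mode gets severity, occurrence, and detection scores (conventionally 1-10 each), and their product, the risk priority number (RPN), orders the work. A minimal sketch, with the failure modes and scores entirely invented for illustration:

```python
# FMECA-style risk priority numbers: RPN = severity * occurrence * detection,
# each scored 1-10. All modes and scores here are illustrative only.
failure_modes = [
    {"mode": "backup tape unreadable",            "severity": 9,  "occurrence": 4, "detection": 7},
    {"mode": "sprinkler discharge in server room", "severity": 10, "occurrence": 2, "detection": 3},
    {"mode": "replication lag during failover",    "severity": 6,  "occurrence": 5, "detection": 4},
]

for fm in failure_modes:
    fm["rpn"] = fm["severity"] * fm["occurrence"] * fm["detection"]

# Address the highest-risk items first.
for fm in sorted(failure_modes, key=lambda f: f["rpn"], reverse=True):
    print(f'{fm["rpn"]:4d}  {fm["mode"]}')
```

The value is less in the arithmetic than in having the scored, signed-off table to point at later, which is exactly the "documented before the work started" protection described above.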
Post #1608335
 Posted Thursday, August 28, 2014 10:14 AM
 SSC-Dedicated Group: Administrators Last Login: Today @ 12:38 AM Points: 33,267, Visits: 15,436
Miles Neale (8/28/2014): Thanks Steve. Although we think we have things well under control at all times, time has a tendency to erode even the best of plans. Compounding a natural, criminal, or other disaster by not being ready or able to recover introduces a much more severe disaster, far more devastating than just the earthquake, fire, or other loss itself. And the newer ways malware or poorly constructed systems can cause corruption or loss of data are more sophisticated than when our plans may have been made. We must create a plan and then update that plan on a regular basis with potentially new or advanced technologies and strategies. We cannot make a plan once and then think we are good for all time, for again, time erodes the value of all plans, strategies, and schemes. Good that you remind us of the need to create a solid plan if we have none, and to readdress our plan if we do. You provide a wonderful service to those of us who are busy and could forget to go back and make certain our future is secure because our data is safe.

You are welcome, and thanks for the kind words.

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
Post #1608339
 Posted Thursday, August 28, 2014 1:18 PM
 SSC-Enthusiastic Group: General Forum Members Last Login: Yesterday @ 1:01 PM Points: 181, Visits: 354
The Happy Path is dead! Long live the Happy Path!

Now, as soon as the tape backups are returned, we can get back to work.

The more you are prepared, the less you need it.
Post #1608394
 Posted Thursday, August 28, 2014 1:45 PM
 SSCommitted Group: General Forum Members Last Login: Today @ 9:29 PM Points: 1,970, Visits: 5,122
Some good points you make there, Steve - thanks!

But nowadays computers are such robust, stable platforms, with such stable, reliable storage media, used by very competent and highly educated users who never make mistakes, connected by utterly reliable connections, powered by a grid that never ever goes down, processing data that never has errors, driven by code which performs correctly and brilliantly all the time, guided by never-changing business requirements, funded by directors who are willing to open their pockets whenever it's needed, in an environment where deadlines are welcomed as an excuse for a celebration... What could ever make us stray off this happy path?
Post #1608404
 Posted Thursday, August 28, 2014 2:05 PM
 SSC-Enthusiastic Group: General Forum Members Last Login: Yesterday @ 1:01 PM Points: 181, Visits: 354
Many years ago, I was in a meeting with a CIO and his AS/400 expert. The AS/400 guy went on about how the 400 never goes down... Two days later, the AS/400 crashed. On the SQL Server side, I had already set up fully automated backups with testing.

The more you are prepared, the less you need it.
Post #1608413
 Posted Thursday, August 28, 2014 6:56 PM
 SSChasing Mays Group: General Forum Members Last Login: Today @ 4:19 PM Points: 632, Visits: 2,197
Well, when I walked into the server room at my last company for the first time, I looked around. Then I looked up. They had water sprinklers as the fire suppression system. I told them that it was a bad idea to mix servers and water. About four months later they had the sprinklers removed, but they didn't replace the system with a chemical suppression system for six years. I was just stunned at the stupidity of management.

----------------
Jim P.

A little bit of this and a little byte of that can cause bloatware.
Post #1608461
 Posted Friday, August 29, 2014 4:58 PM
 SSCrazy Eights Group: General Forum Members Last Login: Today @ 8:47 AM Points: 8,832, Visits: 9,389
I like the editorial; it presents a quick view of an important issue. But I don't like the referenced article "What would it take..." because it doesn't look at cost effectiveness, nor at how risk averse the stakeholders are or are not.

One of the important things in designing most systems is cost effectiveness. If the value of my data is 5 million dollars per year, I can't afford to spend a hundred thousand dollars per week on protecting it - the net value would then be less than zero per year. So absolute values and costs have to be considered. But that doesn't go far enough; absolutes may not be adequate on their own.

Often the aim is to maximise probable net value, so expected net value has to be considered rather than worrying only about absolutes. When I look at the value of the data and a loss scenario, I need to look at the probability of that scenario as well as the cost of protecting against it, and that is a hugely complex thing to do because loss scenarios may be mutually exclusive, or independent, or somewhere in between, and that all affects the expected value of protecting against combinations of them.

There's also a question of philosophy here: should one take only protective actions with positive expected value, or also individual actions with zero expected value, or even individual actions with small negative expected value? Is it enough to compute what sets of actions will deliver maximum expected net value over time (over how much time?), always assuming that that computation isn't so complex as to be impractical? Probably it isn't - how risk averse are the people who lose or gain by the decision?

If they are very risk averse, some disaster prevention actions with negative expected net worth will be acceptable; if they are very risk accepting, they may want to omit some actions with zero probable net value (because omitting them increases ROCE and, although it increases risk, it doesn't change the expected outcome) and perhaps even omit some actions with positive expected net value.

Gary's comments above address this point to some extent - a disaster that is unlikely to happen and isn't at all severe comes out at 1 on his scale, so recovery wouldn't be implemented, which is probably reasonable behaviour. But it doesn't go far enough. Back in October 1962 we had a period when the reasonable estimate of the probability of losing every data centre and all staff outside of Polynesia and Central Africa was perhaps 0.4, which is maybe a score of 4 on his probability scale, and losing all data completely must surely be severity 5, giving a score of 20, which is well within his "do this" zone. So at that time everyone should have been building data centres in those places and arranging for backup copies to be shipped there regularly, along with training staff and basing them there; but the cost would have been enormous and the benefit tiny (because there would be no market left to do business in if the 0.4 probability event happened), so the expected net worth was zilch, which wouldn't repay the cost, and there was no point in doing it. Maybe that example comes under a Ragnarok exception to Gary's scoring, but I think it's possible to construct less extreme examples that demonstrate the importance of computing net worth (aka cost effectiveness).

Tom
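The expected-net-value argument above can be made concrete with a little arithmetic. A minimal sketch, with every figure invented for illustration: compare the expected annual loss against the mitigation's cost, and note that a probability-times-severity score can point the other way:

```python
# Expected-net-value check for a single mitigation; all figures illustrative.
p_disaster = 0.02          # annual probability of the loss scenario
loss_if_hit = 5_000_000    # value of the data at risk
mitigation_cost = 80_000   # annual cost of protecting against it

# Expected annual loss without the mitigation: probability * magnitude.
expected_loss_unmitigated = p_disaster * loss_if_hit          # 100,000.0

# Expected net value of mitigating: avoided expected loss minus cost.
expected_net_value = expected_loss_unmitigated - mitigation_cost
print(expected_net_value)  # positive, so mitigation pays off on expectation

# A 1-5 probability-times-severity score ignores cost entirely:
# a 4 * 5 = 20 says "do this" no matter what it costs, which is
# exactly the gap the 1962 example exposes.
score = 4 * 5
print(score)
```

Real scenarios are harder because, as noted above, loss scenarios overlap and interact, so the expected values of mitigation combinations are not simply additive.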
Post #1608778
 Posted Saturday, August 30, 2014 1:24 AM
 SSCertifiable Group: General Forum Members Last Login: Today @ 1:04 PM Points: 5,412, Visits: 3,138
Tom, there is nothing to stop a company documenting that it cannot find a cost-effective solution and is prepared to take the risk. It just can't ignore the risk if it follows the process. The process makes someone responsible, which makes it more likely that the risk will be planned for.

Gaz
-- Stop your grinnin' and drop your linen...they're everywhere!!!
Post #1608810
