Reverse Engineering Disasters

  • Comments posted to this topic are about the item Reverse Engineering Disasters

  • I don't normally comment on editorials, but with this I am compelled to. It's spot on.

    With all of the technical publications out there, it's a shock that a paragraph like this is not on the first page of every book!

    A.W.

  • The only place where I have worked that did not keep backups off site used a 2 hour fire safe.

    The office was next door to the fire station.

    -------------------------------
    Posting Data Etiquette - Jeff Moden
    Smart way to ask a question
    There are naive questions, tedious questions, ill-phrased questions, questions put after inadequate self-criticism. But every question is a cry to understand (the world). There is no such thing as a dumb question. ― Carl Sagan
    I would never join a club that would allow me as a member - Groucho Marx

  • One standard way to prioritise defects is to use the following two factors and multiply them and deal with the high numbers:

    Likelihood - 1 (really unlikely) to 5 (almost certain to occur)

    Severity - 1 (cosmetic) to 5 (business critical)

    This means that each defect is rated 1 to 25. Obviously 25s HAVE to be resolved, but anything 5 and under is usually optional. The places where I have seen this used often set thresholds both for mandating and for deferring resolution. Some had an additional rule that all severity 5s must also be fixed.

    Perhaps this should be, or perhaps already is, applied in Disaster Recovery planning.
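
    As a rough sketch of the scoring described above, assuming Python: the `Defect` class, the `triage` helper and the cut-off of 15 are made up for illustration. The post only fixes the two ends of the scale (25s must be resolved, 5 and under are usually optional), so each shop would plug in its own thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Defect:
    name: str
    likelihood: int  # 1 (really unlikely) to 5 (almost certain to occur)
    severity: int    # 1 (cosmetic) to 5 (business critical)

    def __post_init__(self):
        # Reject ratings outside the 1..5 scale used in the post.
        for field_name in ("likelihood", "severity"):
            value = getattr(self, field_name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field_name} must be 1..5, got {value}")

    @property
    def score(self) -> int:
        # Likelihood multiplied by severity gives a rating from 1 to 25.
        return self.likelihood * self.severity


# Hypothetical thresholds, not taken from the post.
MUST_FIX_AT = 15
OPTIONAL_AT_OR_BELOW = 5


def triage(defect: Defect, fix_all_severity_5: bool = True) -> str:
    """Classify a defect as 'must fix', 'review' or 'optional'."""
    if fix_all_severity_5 and defect.severity == 5:
        return "must fix"
    if defect.score >= MUST_FIX_AT:
        return "must fix"
    if defect.score <= OPTIONAL_AT_OR_BELOW:
        return "optional"
    return "review"


def prioritise(defects):
    """Return defects ordered so the highest scores are dealt with first."""
    return sorted(defects, key=lambda d: d.score, reverse=True)
```

    With the extra severity rule switched on, an unlikely but business-critical item (likelihood 1, severity 5) is still classed as "must fix" even though its score is only 5.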

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Spot on: "Murphy's Law" is in my experience very close to being a Law of Nature, when it comes to the area of computer technology.

    People who count on always traveling the "happy path" (nicely put!) are for sure in for an awakening!

  • hjp (8/28/2014)


    ...People who count on always traveling the "happy path" (nicely put!) are for sure in for an awakening!

    It is bad enough when they have this attitude in software development, system rollouts, etc., but it is inexcusable when planning for disaster recovery, where the disaster really is stated in the title.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • That was a good article. It lets me know we're doing it right, but I'm sure there's something we haven't thought of yet. 😉

  • Ed Wagner (8/28/2014)


    That was a good article. It lets me know we're doing it right, but I'm sure there's something we haven't thought of yet. 😉

    Isn't there always?

    Most DR plans I have seen also stipulate that "if A) and B) and C) and D) and E) and F) and G) occur simultaneously then forget it". There comes a time when the likelihood of all scenarios occurring simultaneously means that it probably is armageddon, ragnarok, rapture, etc. (pick your own ;-)) in which case there is little point worrying about a computer system based on a civilisation that has pretty much been wiped out.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Truth Tables and Boolean Algebra expressions can be mathematically specified using variables that represent events, conditions, and characteristics of a system. Then by stepping through all of the mathematical permutations and combinations that can be logically specified (e.g. the sun comes up in the East and sets in the West is one that happens every day) you could "engineer" an exhaustive pattern of disaster events that require a solution. You will (that's right, I did not say "could", I said "will") reveal every possible disaster event, even ones that you might think are impossible (e.g. the sun comes up in the West and sets in the East). Use a random number generator to invoke varying sequences of events over time and you have a convenient way to repeatedly walk through scenarios that you can actually use to practice failures and recoveries. Having done that, sleep well knowing that your team is ready for anything that might come their way.
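
    A minimal sketch of that idea, assuming Python and three invented condition names: `all_scenarios` walks every row of the truth table, and `drill_schedule` uses a random number generator to pick rows to practise. Both function names are hypothetical.

```python
import itertools
import random

# Hypothetical failure conditions; replace with the ones that matter to you.
CONDITIONS = ["primary_db_down", "backup_unreadable", "site_power_lost"]


def all_scenarios(conditions):
    """Yield every row of the truth table as a dict of condition -> bool."""
    for values in itertools.product([False, True], repeat=len(conditions)):
        yield dict(zip(conditions, values))


def drill_schedule(conditions, count, seed=None):
    """Pick `count` scenarios at random (repeats allowed) for practice runs.

    The all-False row is skipped because nothing has failed in it.
    """
    rng = random.Random(seed)
    scenarios = [s for s in all_scenarios(conditions) if any(s.values())]
    return [rng.choice(scenarios) for _ in range(count)]
```

    The table has 2^n rows for n conditions, so it grows quickly, and it only covers the conditions someone thought to list in the first place.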

  • John Pluchino (8/28/2014)


    Truth Tables and Boolean Algebra expressions can be mathematically specified using variables that represent events, conditions, and characteristics of a system. Then by stepping through all of the mathematical permutations and combinations that can be logically specified (e.g. the sun comes up in the East and sets in the West is one that happens every day) you could "engineer" an exhaustive pattern of disaster events that require a solution. You will (that's right, I did not say "could", I said "will") reveal every possible disaster event, even ones that you might think are impossible (e.g. the sun comes up in the West and sets in the East). Use a random number generator to invoke varying sequences of events over time and you have a convenient way to repeatedly walk through scenarios that you can actually use to practice failures and recoveries. Having done that, sleep well knowing that your team is ready for anything that might come their way.

    Perfect except the following:

    could "engineer" an exhaustive pattern of disaster events

    excludes many, many scenarios.

    Also, the sun coming up from the West as opposed to the East is an effect of an undefined cause but I suspect that your tongue was firmly in your cheek 🙂

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Thanks Steve. Although we think we have things well under control at all times, time has a tendency to erode even the best of plans. Compounding a natural, criminal, or other disaster by not being ready or able to recover, and thus introducing a much more severe disaster, is far more devastating than just an earthquake, fire, or other loss.

    And the newer ways malware or poorly constructed systems can cause corruption or loss of data are more sophisticated than when our plans may have been made. We must create a plan and then update that plan on a regular basis with potentially new or advanced technologies and strategies. We cannot make a plan once and then think we are good for all time, for again time erodes the value of all plans, strategies, and schemes.

    Good that you remind us of the need to create a solid plan if we have none, and to readdress our plan if we do. You provide a wonderful service to those of us who are busy and could forget to go back and make certain our future is secure because our data is safe.

    🙂

    Not all gray hairs are Dinosaurs!

  • This might be a good time to start stealing concepts from neighboring disciplines. There are lots of templates and standards around capturing reliability analysis results and potential issues. From Ishikawa ("fishbone") diagrams showing interactions and potential operations for failure, to FMECA (Failure Mode, Effects, and Criticality Analysis), to even Sigma analysis (as in Six Sigma), there are lots of great starter kits to work from. It's just kind of strange that IT all too often does not take the time to do such things.

    Simply taking the time to think through the issues is great, but capturing the results in an easily disseminated format will help a LOT. It's harder to "crucify" someone for a bad outcome if the potential for a bad outcome was documented and signed off before the work started (or went into play).
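
    One way to capture such results in a shareable form is a small FMEA-style worksheet. The sketch below assumes Python and the Risk Priority Number commonly used in FMEA (severity x occurrence x detection, each rated 1 to 10). The column names, including `signed_off_by`, are illustrative and not taken from any particular standard template.

```python
import csv
import io

# Illustrative column names, not from any specific FMEA/FMECA template.
FIELDS = ["component", "failure_mode", "effect", "severity",
          "occurrence", "detection", "rpn", "signed_off_by"]


def rpn(severity, occurrence, detection):
    """Risk Priority Number: severity x occurrence x detection, each 1-10."""
    for value in (severity, occurrence, detection):
        if not 1 <= value <= 10:
            raise ValueError("ratings must be between 1 and 10")
    return severity * occurrence * detection


def worksheet_csv(rows):
    """Render failure-mode rows as CSV text, highest RPN first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    # Add the computed RPN to a copy of each row, leaving the input untouched.
    scored = [
        dict(row, rpn=rpn(row["severity"], row["occurrence"], row["detection"]))
        for row in rows
    ]
    for row in sorted(scored, key=lambda r: r["rpn"], reverse=True):
        writer.writerow(row)  # missing optional columns are written as empty
    return buffer.getvalue()
```

    The resulting CSV opens in any spreadsheet, and an empty `signed_off_by` cell makes it obvious which risks nobody has accepted yet.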

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Miles Neale (8/28/2014)


    Thanks Steve. Although we think we have things well under control at all times, time has a tendency to erode even the best of plans. Compounding a natural, criminal, or other disaster by not being ready or able to recover, and thus introducing a much more severe disaster, is far more devastating than just an earthquake, fire, or other loss.

    And the newer ways malware or poorly constructed systems can cause corruption or loss of data are more sophisticated than when our plans may have been made. We must create a plan and then update that plan on a regular basis with potentially new or advanced technologies and strategies. We cannot make a plan once and then think we are good for all time, for again time erodes the value of all plans, strategies, and schemes.

    Good that you remind us of the need to create a solid plan if we have none, and to readdress our plan if we do. You provide a wonderful service to those of us who are busy and could forget to go back and make certain our future is secure because our data is safe.

    🙂

    You are welcome and thanks for the kind words.

  • The Happy Path is dead! Long live the Happy Path!

    Now, as soon as the tape backups are returned, we can get back to work.

    The more you are prepared, the less you need it.

  • Some good points you make there Steve, thanks!

    But nowadays computers are such robust, stable platforms, with such stable, reliable storage media, used by very competent and highly educated users who never make mistakes, connected by utterly reliable connections, powered by a grid that never ever goes down, processing data that never has errors, driven by code which performs correctly and brilliantly all the time, guided by never-changing business requirements, funded by directors who are willing to open their pockets whenever it's needed, in an environment where deadlines are welcomed as an excuse for a celebration....

    What could ever make us stray off this happy path?

    😎
