• Google has the right idea. Failure WILL happen, and the task is to design around it. Their approach is resilience.

    Unfortunately, this reality has not fully found its way into the IT world. Failure should be considered an unfortunate byproduct of system operation and handled accordingly.

    ...

    -- FORTRAN manual for Xerox Computers --