• mike brockington (4/2/2009)


    that is why training and familiarization, disaster drills, etc. come in handy

    Thinking about it further, I think you have stumbled on the biggest difference - both Fire and Ambulance services are primarily there for emergency situations; prevention is largely a different department and/or an activity carried out during quiet periods.

    For most IT people, the reverse is true: our performance is generally assessed on how _little_ time we spend on emergency care, as prevention is preferred.

    This leads into the classic dilemma over training: should we put time and effort into learning diagnostic techniques, and purchasing diagnostic tools, or should we concentrate on ensuring that disasters never happen? While many large organisations will be prepared to have a dedicated disaster team, is Google's approach of massive redundancy not an even better idea?

    Exactly, that's where I ended up in that long-winded, stream-of-consciousness reply to you 🙂 I think Google has it right in some ways, and that approach covers hardware errors or even a majority of software errors, but there are still troubleshooting scenarios that come up. I think in IT we do need more training (not to the extent of the fire service; that would be a wasteful use of IT budgets) and more tool familiarity, and we should look at redundancy more. Google still has its failures, though: massive redundancy is great until you introduce a glitch across a massively redundant array of machines (yes, I am very much oversimplifying that).

    Anyway, I think your response and this conversation bring up some interesting points.

    __________________________________________________

    Mike Walsh
    SQL Server DBA
    Blog - www.straightpathsql.com/blog |Twitter