On Learning from Experience and Misfortune

  • Comments posted to this topic are about the item On Learning from Experience and Misfortune

    Best wishes,
    Phil Factor

  • If you aren't learning from your mistakes, you're not learning. One morning on the way to work about 30 years ago I had an auto accident, and I should have gone home from there. Instead I went in to work. And blew out the payroll database. I learned that I should have had explicit database assignments to clear out the test database before copying the live payroll system over to perform quality checks before processing and exporting data.

    Lesson learned. I was able to recover quickly from backups and the downtime was minimal. And I fixed my testing methodologies.

    Lesson not learned: I was on vacation and left explicit instructions to call me with any problems, as there was no backup DBA. A "Contractor" applied an update to the live payroll system without my knowing he planned to, and he blew up the database. He spent over four hours undoing his change and recovering data. They did not call me. Had they called me, I could have talked them through doing a restore and had them back up in less than half an hour.

    I never did get to flog that contractor. *sigh*

    Everybody has made stupid mistakes, hopefully more in the early part of their career and with the number rapidly declining as time goes on. The post mortem is very important for figuring out what to do to prevent a recurrence of the event.

    -----
    Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson

  • Phil, thank you for your article. When I was a Management Information Systems student recently, we read case studies of IT mistakes and disasters in various organizations. It was helpful to learn from the technical mistakes of others. It also made us more sober, knowing that anyone can make mistakes.

    Thank you for pointing out that in the disaster cases we hear of, there are often many things to point out that were done right. It is easy to just look at the negative.

    Why do you think we are so reticent about our disasters in IT? Embarrassment? Wanting to keep up an image that we don't make mistakes?

  • I don't write for SQLServerCentral as often as I would like, but I have started to write up the mistakes I made when carrying out the work that my articles describe.

    There is an element of ego in writing and no one likes to dwell on their mistakes or even admit that they made any. My experience is that when I write an article admitting to some mistakes I get thanked publicly and privately.

  • No one wants to admit mistakes.

    I realized last night at bedtime that I had introduced a flaw in the logic of a SQL statement for an important project. But I also realized that I had full backups of said database, like a good accidental DBA should have.

    😀

  • You are so right. Last year, in one of the SQLskills DR and HA classes, Paul Randal shared some DR experiences he has been involved with, and it was mind-boggling. One story was about a government agency that lost an estimated billion US dollars in tax revenue because a technician pulled the wrong SAN component during a routine upgrade. As I recall the story, they lost their backups (they had the production databases AND backups on the same SAN). No one lost their job, nor did it make the news. I guess he's got all kinds of good stories like that one.

  • Much smaller scale than the US gov't losing a billion dollars, but I know someone working for a city whose IT department has the vendors do all the DBA tasks and maintenance rather than have a full-time DBA. One such database was their red light camera ticketing system. The vendor wasn't monitoring DBCC checks and didn't notice that some corruption had crept in. Meanwhile, the SAN was developing problems and randomly dropped drives. One such drive held the database.

    The database blew out, and the backups were unrecoverable.

    The vendor is still getting paid. Interestingly, the city in question stopped their red light program at about the same time. I have no idea if it was coincidence.

    -----
    Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson

  • Often issue management stops at the point of resolution. This misses the opportunity to analyse why the issue was unforeseen and what, if anything, can be put in place to stop a recurrence.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!
