A Double Dose of Trouble

  • Comments posted to this topic are about the item A Double Dose of Trouble

  • I'm doing a major implementation of cross-platform changes this evening - tempting fate?

  • Let's hope not.

  • For goodness' sake, it is just a date in the Gregorian calendar. It is a different date in the Hindu, Chinese and many other calendars! But then two planes crashed on Friday 13th 1972, and the Costa Concordia capsized five years ago today. Also Halloween (31) is 13 reversed, and they are both primes. Best touch some wood just in case ...

  • So, if a major hack attack, or an odd unexplained IT disaster, does happen to occur on Friday the 13th, then it may not be purely a coincidence, for the same reason that it wasn't a coincidence that airplanes crashed into the World Trade Center on September 11 (9/11). Here in the US, 911 is the number to call to report a crime in progress or an emergency, so the date of the planned terrorist attack was intended to be ironic. Back in the '90s, some computer viruses were programmed to sit in the background and drop their payload on Friday the 13th.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Hi all,
    My worst story as a DBA about a double dose of trouble is this:
    The day began as usual: check the backups, check the space on the SAN, check the open issues on the ERP, etc. Since it was not a month-end close, there were only a few additional tasks on my pile. And then it happened: big troubles have a bad habit of first appearing as a minor error. A user reported that they couldn't access the ERP. I believed it was just another forgotten password or blocked account. I realized that things were going badly when I was unable to access the DB. First dose: the database was down. Hmm, unusual, but not a cause for panic. At that company we had an offsite datacenter that maintained all the hardware. This included upgrades, maintenance, physical security, etc. Since I had no report of a hardware problem, I assumed it was just an "issue" in the DB, perhaps caused by too many open cursors. Since the last time I had rebooted the DB and APP servers was a year earlier, I followed the standard procedure to do so.
    After the few minutes it took to complete the task, I checked the health of the DB and all was OK. The APP was working fine, I had regained full control, and I believed that everything would be fine. Bad assumption. The same error appeared one hour later. And then I started to worry. When I checked the logs in Red Hat, there was a spooky error: a hard drive malfunction. After I checked the drive and it threw no errors, I started to believe that maybe it was caused by a missing upgrade or a patch or something like that. Second dose: to make a long story short, I spent the whole day and part of the next just to figure out that there was a problem with the optical fiber from the SAN to the hub. Since the server had a dual fiber channel with a redundant fiber attached, it was necessary to shut down the faulty channel so that the redundant link would come up.
    As you mention in the post, there was a chain of mistakes. My datacenter never got an alert from the infrastructure (at least, that is what they claim), and I had failed to test the redundancy of the fiber channel. There were some other mistakes, but I believe these two sum up the big ones.
    Lesson learned: Murphy, always Murphy.
    Miguel.

  • It pays to be working for the government when you screw up (it's just 'nobody's fault'):

    http://www.denverpost.com/2017/06/12/gold-king-mine-inspector-general-clears-epa/

    Horrendous environmental disaster (would have landed private individuals in jail)

    ...

    -- FORTRAN manual for Xerox Computers --

  • Hi,
    In my previous post, I forgot to add: the DB server in my story was an Oracle server. And I might add, I'm a true believer in Oracle. Now I'm taking my first steps in SQL, and I'm still learning, so please be patient with me.

    Miguel.
