Some of my greatest mistakes

  • Great article. My experience is that you learn far more, and far more quickly, in a crisis than at any other time. Amazing how stress and constant management badgering lead to an upping of the game.

    I've made so many different mistakes over the years, some of them real clangers, that I can't really pinpoint a greatest. I'm sure there are more to come; maybe one of those will take the title.

  • One of my devs had exactly this problem a few weeks back. My advice was:
    * always alias tables;
    * always prefix column references with the table alias;
    * it's a lot clearer to join tables using an explicit JOIN clause;
    * don't use "where <id> in ( <subselect> )" unless there's a compelling reason to do so (see the sketch below).
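
    To make the trap concrete, here's a hypothetical sketch; the table and column names are invented for illustration. If OrderArchive has no column named OrderID, the subquery silently resolves OrderID to the outer table's column, the IN matches every row, and the DELETE empties the table:

    -- DANGER: if OrderArchive has no OrderID column, this deletes every row
    DELETE FROM Orders
    WHERE OrderID IN (SELECT OrderID FROM OrderArchive);

    -- Aliasing every table and prefixing every column turns the same
    -- mistake into a compile-time error, and an explicit JOIN is clearer:
    DELETE o
    FROM Orders AS o
    INNER JOIN OrderArchive AS a
        ON a.ArchivedOrderID = o.OrderID;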

  • My biggest mistake was 10 years ago, when an infinite loop caused me to email an employee about 46,000 alerts.


    -- infinite loop example
    DECLARE @t TABLE (tkey int NOT NULL);
    INSERT INTO @t VALUES (1), (2), (3);

    DECLARE @i int;

    SELECT @i = tkey FROM @t;

    WHILE @i IS NOT NULL
    BEGIN
        PRINT @i;

        DELETE FROM @t WHERE tkey = @i;

        -- BUG: when @t is empty this assignment processes no rows,
        -- so @i keeps its previous value instead of becoming NULL
        -- and the loop never ends
        SELECT @i = tkey FROM @t;

        -- the fix: a scalar subquery returns NULL for an empty table,
        -- so the loop terminates
        -- SET @i = (SELECT TOP(1) tkey FROM @t);
    END

  • Another good practice is to wrap any data affecting statements in a transaction but comment out the COMMIT TRAN. You can see how many rows are affected and issue a rollback if the number looks wrong.

  • Thanks for sharing.  I always enjoy learning from these types of experiences.  One thing I always do, no matter how trivial the script I'm running against a production database, is wrap the entire script in a transaction, as follows:

    BEGIN TRAN;
    DELETE FROM dbo.SomeTable WHERE id = 5;
    ROLLBACK;

    This allows me to see how many rows are affected while performing a safe operation.  If I see "1,000,000 rows affected" when I'm expecting 1 row, then obviously something is wrong (and I've probably just blown out my log file).  If it works as expected, I simply swap the ROLLBACK for a COMMIT, or exclude the BEGIN TRAN and ROLLBACK lines from the batch, which will then auto-commit.  This has saved me from many unexpected and potentially epic fails.

    Please keep in mind that this won't completely roll back all changes in all scenarios, but in general it's a good practice; one such scenario is sketched below.
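
    One hypothetical illustration of that caveat (the demo table name is invented): identity values are not rolled back in SQL Server, so a ROLLBACK still leaves gaps in the sequence.

    CREATE TABLE dbo.RollbackDemo (id int IDENTITY(1,1) PRIMARY KEY, val varchar(10));

    BEGIN TRAN;
    INSERT INTO dbo.RollbackDemo (val) VALUES ('a');
    ROLLBACK;

    INSERT INTO dbo.RollbackDemo (val) VALUES ('b');
    SELECT id, val FROM dbo.RollbackDemo;  -- returns id = 2: the identity increment survived the rollback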

  • I love pieces like this.  I think DBAs need to know that experts make big mistakes too.  We have a webshow, DBAs@Midnight, and we used to go to PASS and interview some of the biggest names.  One of the questions we always asked everyone was "What's the biggest mistake you've ever made in production?"  We wanted regular DBAs to hear that even the biggest of the big have made mistakes.  BTW, my favorite answer came from Itzik.
    These pieces are always a delight.  If you pin me down sometime at a conf, I'll tell you how I brought down a cluster for 80 sites.

    Watch my free SQL Server Tutorials at:
    http://MidnightDBA.com
    Blog Author of:
    DBA Rant – http://www.MidnightDBA.com/DBARant

    Minion Maintenance is FREE:

  • "One of the questions we always asked everyone was "What's the biggest mistake you've ever made in production?""

    >>>>  This is my favourite question to ask interviewees.  I mention an example mistake from my past as part of the question.
    Denial that you have made any mistakes in production most likely means you: 1) have extremely limited production experience,
    2) have very small task lists with no deadlines, [this is a variation on #1]    or 3) are completely unacquainted with truth and cannot face reality. 
    How we take ownership of and respond to our mistakes is a test of character as well as technical skill. 
    Ideally, we admit our mistake(s), have some documentation of what we did, and have some idea on how to repair the damage.
    Depending on the situation, we might need to create a backup of the disaster, before attempting corrective action.
    Also, depending on the impact and scope of the mistake, communication with multiple levels of management and the business might be required.
    At a minimum, tell your manager and your immediate team.  Hopefully your company culture has the wherewithal to effectively deal with mistakes.
    (Firing is not effective.  Mistakes, by definition, are not done with malice aforethought.)
    For  extremely small mistakes, you might be able to fix it before the whole company knows about it, but no guarantee.
    The ripple effect can be amplified into a  tsunami.

    Proceed with caution!!!  Assumptions are required, but one or more of them are what led to the mistake in the first place. 
    Verbalize, document and acknowledge your assumptions.
    You might find that you're assuming your last step was the only mistake, when the larger mistake occurred three or nine steps earlier.
    Stop.  Assess.  Communicate.  Discuss.  Create a restore point if possible.  Assess.  Take corrective action.
    If this post sounds like I have made mistakes, then it accurately reflects reality.  I have had a little success in not repeating some of the more painful mistakes, though.

    The humility of admitting it is possible for us to make mistakes, regardless of our experience and knowledge, is a virtue to be practised every day.

  • john.wb - Thursday, May 18, 2017 6:44 AM

    Another good practice is to wrap any data affecting statements in a transaction but comment out the COMMIT TRAN. You can see how many rows are affected and issue a rollback if the number looks wrong.

    Not sure I agree that this is a good practice.  
    This was the "good practice" at a previous place I worked.  This caused a significant number of deadlocks while the users looked over the results of the query before doing the commit.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John - Thursday, May 18, 2017 12:09 PM

    john.wb - Thursday, May 18, 2017 6:44 AM

    Another good practice is to wrap any data affecting statements in a transaction but comment out the COMMIT TRAN. You can see how many rows are affected and issue a rollback if the number looks wrong.

    Not sure I agree that this is a good practice.  
    This was the "good practice" at a previous place I worked.  This caused a significant number of deadlocks while the users looked over the results of the query before doing the commit.

    Ad hoc queries on a live server aren't an amazingly good idea, especially untested ones.
    Better would be to test your script on the dev instance first with an approach of "do it", "check it", "roll back". Then check again with rollback on the live instance just in case something has changed, then run on live with a commit when you're absolutely happy.
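
    If the worry is a person pausing mid-transaction (the deadlock scenario above), one option is to automate the sanity check so the transaction never waits on a human. A minimal sketch, assuming you know the expected row count; the table name and count here are invented:

    BEGIN TRAN;

    DELETE FROM dbo.SomeTable WHERE id = 5;

    -- commit only if exactly the expected number of rows was touched
    IF @@ROWCOUNT = 1
        COMMIT;
    ELSE
    BEGIN
        PRINT 'Unexpected row count - rolling back';
        ROLLBACK;
    END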

  • david.wright-948385 - Thursday, May 18, 2017 1:05 PM

    Michael L John - Thursday, May 18, 2017 12:09 PM

    john.wb - Thursday, May 18, 2017 6:44 AM

    Another good practice is to wrap any data affecting statements in a transaction but comment out the COMMIT TRAN. You can see how many rows are affected and issue a rollback if the number looks wrong.

    Not sure I agree that this is a good practice.  
    This was the "good practice" at a previous place I worked.  This caused a significant number of deadlocks while the users looked over the results of the query before doing the commit.

    Ad hoc queries on a live server aren't an amazingly good idea, especially untested ones.
    Better would be to test your script on the dev instance first with an approach of "do it", "check it", "roll back". Then check again with rollback on the live instance just in case something has changed, then run on live with a commit when you're absolutely happy.

    Sure, in a perfect world. 
    These were one-off queries that corrected production data broken by bugs in the software, which is far too big a subject to cover in this thread!

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John - Thursday, May 18, 2017 12:09 PM

    john.wb - Thursday, May 18, 2017 6:44 AM

    Another good practice is to wrap any data affecting statements in a transaction but comment out the COMMIT TRAN. You can see how many rows are affected and issue a rollback if the number looks wrong.

    Not sure I agree that this is a good practice.  
    This was the "good practice" at a previous place I worked.  This caused a significant number of deadlocks while the users looked over the results of the query before doing the commit.

    Not a fan of this idea either.

    I once stopped a nightly batch because I left a transaction open. Since then, I always make sure I've left no transactions open before I stop working (a quick check is sketched below).
    A better option is to use the transaction with a ROLLBACK and, once the process looks correct, change the ROLLBACK to a COMMIT and run the process again.
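
    On the open-transaction point, a quick sanity check before walking away; both commands are standard SQL Server:

    -- number of transactions open in the current session
    SELECT @@TRANCOUNT AS open_transactions;

    -- reports the oldest active transaction in the current database, if any
    DBCC OPENTRAN;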

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Nelson Petersen - Thursday, May 18, 2017 9:39 AM

    How we take ownership of and respond to our mistakes is a test of character as well as technical skill.  

    I agree, and it is also a salutary lesson in how the chain of command deals with your mistakes.
    I worked on a system where downtime cost huge sums of money every minute.  The deployment to production was going to be a big bang and had to take place at 2am.  I had a one-hour drive to get to work to do the deployment.
    The general rule was that if you did an early-morning deployment you'd get breakfast and be allowed home after 8 hours.  In this case, due to the complexity of the system, I stayed until 3pm before driving home.
    What I didn't see as I left the car park was the lead service DBA chasing after my car to stop me leaving.  The system had gone wrong and the indications were that it was a DB problem.
    Knowing that I lived an hour away, they carried on with their diagnostics before calling me at home.  Unfortunately they were a bit early and got my autistic son on the phone.  By "they" I mean the lead DBA, his boss and several managers up the chain.  The conversation went something like this:
    "Hi, I'm xxxx, I work with your Dad.  Could I speak to him, please?"
    "No.  He's at work."
    "Could you ask him to ring me when he gets back?"

    Fifteen minutes later I've walked through the door, the phone goes, and my son picks it up.
    "Hi, it's xxxx again.  Is your Dad home yet?"
    "Yes, but I'm in the middle of a death match.  Go away."

    The phone goes again, and once more my son picks it up.
    "Hi, it's xxxx again.  It's very important that I speak to your father."
    "Well you can't, he's on the bog."

    All this takes place on the office speakerphone, in front of a substantial portion of the chain of command, all passing bricks over significant downtime.
    Eventually they get hold of me, I dial in, fix the problem, the system is confirmed as up and running, and I slink off to bed.

    The next day an early meeting is called for absolutely everyone involved in the deployment and management of the project.  Everyone looks harried, and as the meeting progresses it gets nasty, really nasty.  Of course I know why it went wrong; it's my fault, and I know that no matter what, it'll all come down to me.  My heart is beating so hard that I can barely hear through the pounding in my ears, and I have that horrible metallic taste in my mouth that fear summons.  So I interrupt the boss's boss's boss and say "I can cut this all short.  It's my fault.  This is what I did, why it went wrong, and what I did to fix it."
    The room goes deathly quiet, although there are people looking like they've been given an escape route from hell.  The boss's boss's boss says "David, stay behind.  The rest of you, leave... NOW".  My boss asks if he should stay and is told "Yes, under the circumstances you should hear this".
    I cannot afford to lose this job; I've got 3 kids and my wife isn't going back to work until the youngest starts school.  I know what is coming... only what unfolds is slightly different.
    The boss's boss's boss says "Thank you for owning up so quickly; that is one line of enquiry that has been addressed and closed.  However, I feel you were rather premature and incorrect in assuming responsibility for the failure.  Yours was not the only mistake, and neither was it the worst or most significant.  You can leave this office with your head held high.  Learn from the mistake and don't make it again.  There are a lot of hard questions that will be answered, but they are not yours to answer."
    I had a lot of time for that particular senior manager.  He expected mistakes, but he expected people to own their mistake and deal with it honestly.  He had no time for the slopey-shoulders brigade.

  • Nice one David.

    These issues come up time and again: the need for an environment that supports honesty while still maintaining accountability, and the need to identify and learn from mistakes while maintaining staff's confidence in themselves and the cohesion and productivity of the team.  It's a difficult balance to strike, but it's a key part of what a manager brings to the table, especially in a technical environment.

  • In 1994, on my first consulting gig at a small Manhattan bank running a quad-core Intel cube with SCO Unix and Informix, I brought my ksh preferences file to the bank on a floppy disk.  On my first Friday, at 5:15 PM, I went to the server room and found a sliding, spring-fed door covering the 3.5" floppy slot on the cube; the door kept snapping shut after I opened it, so I couldn't insert my floppy.  Luckily, there was a lock button with an illustration that looked just like the spring-loaded door, there to keep it from slamming shut, if you pushed the button.

    I pushed the button.  It was the power switch.  The drives and fans spun down with the sound of doom.  I knew immediately what I'd done; I ran SCO at home.  I waited 30 seconds and pushed the power button again.

    I walked to the far side of the floor in the Citibank building, where the two best developers were, and while I was confessing to the lead what I had done, his second was slapping his 19-inch monitor, his HummingBird X/Term client frozen with code open in vi.  We were the only three people in the bank still trying to work with the system.  Everything came back fine.  We still laugh about it now.  I wasn't laughing then.  Trial by fire.
