Extensive blocking from implicit transactions

  • Hello folks,

    We had an issue with blocking last week - LOTS of blocking - which led to SQL Running out of worker threads and me getting a phone call in the middle of the night. The problem is, I've been asked to supply the root cause for the blocking.

    I captured all the transactions running at the time with sp_blitzwho. This showed multiple head blockers. They were all delete statements against the same small table with wait type: LCK_M_U.  Earlier in the evening there were deadlocks involving hundreds of these queries (and threadpool). There doesn't appear to be a connection with the spid victims of the deadlock and the blocked transactions that I can see.

    All the blockers, all the transactions are implicit transactions which I know are prone to causing blocks. This is compounded by the fact that the database has Read Committed Snapshot set to False.

    Is there a way to find out the root cause of the blocking retrospectively? My employer is looking for the smoking gun but I'm not sure what to tell them.

    Thanks,

    Pete

     

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • As you point out, implicit conversions on table columns (example, table column is VARCHAR(), lookup or join value is NVARCHAR()) is really bad especially for blocking.  That, of course, is also going to help a whole lot with your deadlocking problem.

    That might not be the only issue though.

    An "illegal DELETE" might be at fault.  This is where you have a DELETE that uses a JOIN but the table being deleted from isn't explicitly included in the FROM Clause.  This can (and it's not predictable) cause a massive Halloweening problem that (for example) can literally take hours to resolve when it "goes bad" where in normally only take a minute or two or even a second or two to normally run.

    The other problem is in the order of the deletes and what your using as the lookup column to do the deletes.  Yes... you can tell it to do things like delete all (for example) rows that have an inactive status that have a last modified date before a given date and it will usually work just fine.  A better way (IMHO), is to find all the rows that meet that condition and create a temp table of Clustered Index keys and then use that to identify which rows in the real table to delete.  That will also allow easy "chunking" of the rows if necessary.  The big key (no pun intended) is to delete the rows in order by the Clustering Key.

    As for your question of "Is there a way to find out the root cause of the blocking retrospectively?", the answer is probably not without setting up some form of monitoring or alerting but I could be wrong.  The default trace probably won't contain that info just because of a rapid turn over (it's too small, IMHO, and can't change its size) and I don't know if Extended Events or the "Server Health" has that kind of information in it because I've simply never used either.  Hopefully, someone knows the answer to that question and this post will serve as a "bump" as well.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply