Wondering why a procedure acquired a table lock for the first time in 6 years

  • Please forgive the vague nature of this post, but I've been up all night dealing with issues, so I'm likely to leave out some details.

    Basically, our application has a procedure that runs every night around 9:15 PM and takes about 4 minutes. It updates 35-40K records in a key table of our application (one procedure call per record). This procedure has run like clockwork for nearly 6 years, every night, with no code changes in that time.

    Yesterday our battery backup on our SAN failed during a power cutover so we lost the storage (which triggered a reboot of all our SQL Servers in the AG). After power was restored all servers came up fine and operations were normal throughout the rest of yesterday. Last night this job ran for around 30 minutes and, most importantly, acquired an exclusive lock on the key table I mentioned above which it appears not to have released until all 35-40K calls were completed.

    My fellow DBA and I are racking our brains as to what would cause this procedure to acquire an exclusive lock and run that long when it never has before. We pulled the plan and it matched our other environments, so something in particular pushed this procedure over the edge; we're just not sure what it is.

    Can anyone help? I can provide whatever details you feel I left out, but I wanted to get this post out while the incident was fresh in my mind. Thanks!

    Matt

  • Stale statistics

    Plan regression due to data volume changes

    Lack of lock memory, resulting in a lock escalation

    etc

    Could be a number of things, hard to say without any details (code, plan, etc)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (1/7/2015)


    Stale statistics

    Plan regression due to data volume changes

    Lack of lock memory, resulting in a lock escalation

    etc

    Could be a number of things, hard to say without any details (code, plan, etc)

    Thanks for the quick reply!

    The statistics were updated directly before the run, the data volume is largely unchanged, and the plan is identical to the one from before the restart. Can you elaborate on the "lack of lock memory" theory? What would be an easy way to check that? Memory stats on the server looked healthy, but we did not dig into them in any way.

  • After the fact, you wouldn't. 🙁

    In short, if a single session is holding a lot of locks on a single object (>4000) or lock memory is taking up a large portion of the available memory (and I forget what 'large' is), then row or page locks get escalated to table locks.

    I think you're just going to have to dig in and monitor. Waits, locks, latches, that kind of thing. I'd start with what waits the query is incurring, and compare the actual execution plans carefully. Not just what operators are there, but row counts, number of executions, parallelism, etc.
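    As a starting point for that kind of monitoring, here is a sketch of two checks (assuming you have VIEW SERVER STATE permission; the DMVs and counter name are standard SQL Server, but thresholds worth worrying about depend on your environment):

    ```sql
    -- How many row/page locks is each session currently holding per object?
    -- A single session climbing into the thousands on one object is an
    -- escalation candidate.
    SELECT request_session_id,
           resource_database_id,
           resource_associated_entity_id,
           COUNT(*) AS lock_count
    FROM   sys.dm_tran_locks
    WHERE  resource_type IN ('KEY', 'PAGE', 'RID')
    GROUP  BY request_session_id,
              resource_database_id,
              resource_associated_entity_id
    ORDER  BY lock_count DESC;

    -- Rough view of how much memory the lock manager is consuming
    SELECT cntr_value AS lock_memory_kb
    FROM   sys.dm_os_performance_counters
    WHERE  counter_name LIKE 'Lock Memory (KB)%';
    ```

    Both queries only show the state at the moment you run them, which is why this has to be captured while the job is running, not after the fact.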

  • GilaMonster (1/7/2015)


    After the fact, you wouldn't. 🙁

    In short, if a single session is holding a lot of locks on a single object (>4000) or lock memory is taking up a large portion of the available memory (and I forget what 'large' is), then row or page locks get escalated to table locks.

    I think you're just going to have to dig in and monitor. Waits, locks, latches, that kind of thing. I'd start with what waits the query is incurring, and compare the actual execution plans carefully. Not just what operators are there, but row counts, number of executions, parallelism, etc.

    There were indeed more than 4000 locks based on my analysis, so I wonder if another "batch job" procedure was running at the same time and provided the memory pressure needed to get this escalated to a table lock. I'm digging through our monitoring logs but I've asked our app team to look into what application jobs were running simultaneously that do not usually run at this time. Many, many thanks for the quick replies on this.

    Matt

  • Not 4000 locks total. 4000 locks on the same object from the same session. If that did happen, then that would have triggered a lock escalation to table.
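    To confirm after the fact whether an escalation actually fired, and to catch the next one as it happens, something along these lines may help (a sketch, not a definitive setup — `dbo.KeyTable` and the session/file names are placeholders for your own objects):

    ```sql
    -- Cumulative escalation attempts/successes per index since the last
    -- service restart (counters reset on restart, so check soon after
    -- the incident).
    SELECT OBJECT_NAME(ios.object_id) AS table_name,
           ios.index_id,
           ios.index_lock_promotion_attempt_count,
           ios.index_lock_promotion_count
    FROM   sys.dm_db_index_operational_stats
               (DB_ID(), OBJECT_ID('dbo.KeyTable'), NULL, NULL) AS ios;

    -- Lightweight Extended Events session that records each lock
    -- escalation along with the statement that triggered it.
    CREATE EVENT SESSION lock_escalation_watch ON SERVER
    ADD EVENT sqlserver.lock_escalation
        (ACTION (sqlserver.sql_text, sqlserver.session_id))
    ADD TARGET package0.event_file (SET filename = N'lock_escalation_watch');
    ALTER EVENT SESSION lock_escalation_watch ON SERVER STATE = START;
    ```

    The event session is cheap enough to leave running across the next nightly execution of the job.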

