Deadlocks

  • There are many examples of the SELECT/UPDATE deadlock. Here are a few:

    SELECT CJ.JobID,

    CJ.NeedsVerification,

    JC.Name,

    JC.IsOutsideMatrix,

    JC.Quantity,

    JC.DriverDebit,

    JC.PassengerDebit,

    JC.AccountDebit

    FROM CompletedJob CJ WITH (INDEX (NeedsVerification_AccountInvoiceNum_AccJobTypeFK ))

    JOIN JobCharge JC

    ON CJ.JobID = JC.JobFK

    WHERE JC.Quantity IS NOT NULL

    AND CJ.Status = 'Complete'

    AND CJ.NeedsVerification = 1

    UPDATE CompletedJob

    SET Status = @P1,

    NeedsVerification = @P2,

    AccJobTypeFK = @P3,

    VehicleName = @P4,

    DriverName = @P5,

    PayeeName = @P6,

    DriverStatementFK = @P7,

    AccountInvoiceFK = @P8,

    DriverStatementNum = @P9,

    AccountInvoiceNum = @P10

    WHERE JobID = @P11

    (These two statements are issued by different processes). Checking the execution plan, the SELECT first uses the forced index on NeedsVerification etc followed by a bookmark lookup to get back to the main CompletedJob table. The three columns in the index are all updated by the UPDATE statement. So to summarise: the SELECT locks the index then the CompletedJob table, whereas the UPDATE locks the table then the index.

    Another similar example has the same SELECT statement deadlocking with: UPDATE CompletedJob

    SET NeedsVerification = @P1,

    DriverStatementFK = @P2,

    DriverStatementNum = @P3,

    AccountInvoiceFK = @P4,

    AccountInvoiceNum = @P5

    WHERE JobID = @P6

    A third example is SELECT AO.NominalPickupTime,

    DJ.VehicleName,

    DJ.DriverName

    FROM DespatchJob DJ

    JOIN Movement MM

    ON DJ.JobID = MM.DespatchJobFK

    JOIN ActiveOrder AO

    ON MM.JobID = AO.JobID

    WHERE DJ.DespatchMethod = @P1

    AND AO.NominalPickupTime <= @P2

    AND DJ.ActiveTaxiFK IS NULL

    UPDATE DespatchJob

    SET Status = @P1,

    ActiveTaxiFK = @P2,

    VehicleName = @P3,

    DriverName = @P4,

    RegistrationNumber = @P5

    WHERE JobID = @P6According to the execution plan, the SELECT statement does an IndexSeek on ActiveOrder.NominalPickupTime, then a NestedLoops join on table Movement, then an IndexSeek on DespatchJob.ActiveTaxiFK, then a NestedLoops join and a bookmark lookup on DespatchJob. Or to summarise: the SELECT locks the index DespatchJob.ActiveTaxiFK followed by the table DespatchJob, whereas the UPDATE locks the table followed by the index.

    I realise this is all very tortuous. But these are genuine examples of deadlocks which occur daily on a single customer site. There are many other instances but life is too short to list them all.

    The common factor is that the order of index reads in the SELECT conflicts with the order of index writes in the UPDATE. Result: unavoidable deadlock.

  • Could you perhaps post the schema of the two Job tables and the definitions of the indexes that these example queries use?

    What are the execution times of these selects and updates? How many rows in the tables, how many returned by the select, how many updated by the updates? How frequent are these statements?

    Any triggers on the tables?

    Would you mind running a couple of those selects with Showplan_all on, running to text and zipping and attaching the exec plan to your post?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • OK, I have attached a ZIP containing the DDL for all tables and indexes, plus execution plans and statistics for the two SELECT statements. These were measured this afternoon on the Customer's live system (SQL Server 2000 on Windows Server 2003, Dell PowerEdge 2900, 2 x Xeon Quad-Core 2.33GHz, 4MB Ram).

    Each UPDATE hits exactly one row (because the column JobID is the primary key in each case). I don't have stats for these - it is a live system after all. There is a single update trigger on each table for replication.

    The SELECT on CompletedJob may return hundreds of rows - however it is looking for a transient state in the lifecycle of a Job so sometimes it will hit fewer. This query is run perhaps 10 or 20 times per hour.

    The SELECT on DespatchJob will hit, at most, a handful of rows - fewer than 5, often none at all. This is run every two seconds.

    The UPDATEs are run perhaps every 5 to 10 seconds on average.

    The CompletedJob and Movement tables have about 400,000 rows. The DespatchJob and ActiveOrder tables have about 1,000 rows. Total database size is running at 5.6 GB on this site.

  • David,

    Your execution plan shows why you are deadlocking:

    1451 | |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([Taxi1].[dbo].[CompletedJob] AS [CJ]))665Bookmark LookupBookmark LookupBOOKMARK:([Bmk1000]), OBJECT:([Taxi1].[dbo].[CompletedJob] AS [CJ])[CJ].[Status], [CJ].[JobID], [CJ].[NeedsVerification]70.5142140.218748267.7565637E-51182.0717533[CJ].[Status], [CJ].[JobID], [CJ].[NeedsVerification]NULLPLAN_ROW01.0

    It looks to me like you are forcing an incorrect index here. What happens is that your select graps a Shared Lock on the Nonclustered index, but it can't satisfy the query completely, so it has to make a lookup to the clustered index. However at the same time, your update has taken a exclusive lock on the clustered index. This will in turn require a Exclusive Lock on each of the non-clustered indexes to update them as well. Since you can't take a shared lock for reading off the clustered index for the select, and the update can't take an exclusive lock on the non-clustered index because the select has a shared lock, you end up deadlocked.

    You need to create an covering index for the select statement, and remove your index hint to allow the optimizer to select to covering index to resolve your deadlocking here.

    EDIT:

    For the above, the following index should cover the query with seeks:

    CREATE INDEX [NeedsVerification_AccountInvoiceNum_AccJobTypeFK1] ON [CompletedJob](Status, JobID, NeedsVerification)

    And the following should help with the other query:

    CREATE INDEX IX_DespatchJob_Test ON DespatchJob (JobID, DespatchMethod, ActiveTaxiFK, VehicleName, DriverName)

    CREATE INDEX IX_Movement_Test2 ON Movement (JobID, DespatchJobFK)

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]

  • GilaMonster (7/9/2008)

    That, IMHO, is the height of laziness. "It's broken, but don't worry about fixing it."

    It is very unfortunate, but I have been locked in a battle with the Microsoft Sharepoint team for almost 9 months, and recently gave up, about adding 2 indexes to their Queue database to resolve persistent deadlocking. The SQL Engineers agreed with the indexes I recommended, but the Sharepoint Teams answer has consistently been:

    Microsoft Support

    "Office SharePoint Server 2007 Search does deadlock while crawling. On average it is acceptable to see 1 deadlock for every 200-500 documents crawled or 1 deadlock every 2-3 minutes. It is recommended to run the SQL script in KB (http://support.microsoft.com/kb/943345) to reduce fragmentation of the SharePoint Databases. Follow the recommendation in KB 943345 to determine how often to defragment the SharePoint databases. "

    It got to where I had to provide business impact, which there really isn't much since the application handles the deadlock and continues to resubmit the transaction until it is successful. Our impact is me getting phone calls after hours because the Sharepoint admins need help with something else, and the Level 1 support guy wants to look at why there are deadlocks persistently in the logs. I guess that isn't important enough to fix the problem.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]

  • Jonathan,

    you are right to some extent. The index I am using - designed to pick out a few hundred rows from hundreds of thousands - is the correct index; usually the Query optimizer will pick this index without me needing to force it with a hint. Occasionally - for whatever reason - it chooses some other less efficient plan, which is why I have resorted to the hint.

    Bear in mind that we are running the identical software application with the identical database schema across hundreds of Customer sites, large and small, busy and quiet. The application contains probably thousands of SELECT statements and hundreds of UPDATEs, INSERTs and DELETEs. Experience has taught us that we need to examine the database performance in detail, particularly locks, deadlocks and execution plans, and it's noticeable how these can vary from site to site.

    You state that using a non-covering index forces bookmark look-ups, and that these are the root of the problem. That is precisely correct, and that is precisely what I am complaining about.

    Surely it cannot be necessary to design all indexes to cover their base tables so that no bookmark lookups take place ? What is that going to do to the database performance ? Where in the documentation does it state that this is a required technique to avoid deadlocks ?

    Is this a "known" problem ? There seems to be a certain amount of denial on this thread.

    I guess from your second post that you have been bitten as badly as I have - in fact worse, since at least I can change the schema I use !

    I guess, when all's said and done, I shall have to live with this until we decide to move to '2008. But I did want to put on record the fact that this astonishing design flaw is present in SqlServer, and that the techniques for avoiding it - whether avoiding bookmark lookups by only using covering indexes, or using snapshot isolation, or reverting to Oracle - are simply workarounds.

  • David Griffiths (7/14/2008)

    Surely it cannot be necessary to design all indexes to cover their base tables so that no bookmark lookups take place ? What is that going to do to the database performance ? Where in the documentation does it state that this is a required technique to avoid deadlocks ?

    Is this a "known" problem ? There seems to be a certain amount of denial on this thread.

    I guess from your second post that you have been bitten as badly as I have - in fact worse, since at least I can change the schema I use !

    I guess, when all's said and done, I shall have to live with this until we decide to move to '2008. But I did want to put on record the fact that this astonishing design flaw is present in SqlServer, and that the techniques for avoiding it - whether avoiding bookmark lookups by only using covering indexes, or using snapshot isolation, or reverting to Oracle - are simply workarounds.

    Bart Duncan covers this topic pretty good in his blogging:

    http://blogs.msdn.com/bartd/attachment/747119.ashx

    http://blogs.msdn.com/bartd/archive/2006/09/13/Deadlock-Troubleshooting_2C00_-Part-2.aspx

    Even in SQL 2008, you are going to face this problem, because you have a flaw in your indexing design that will lend itself to deadlock scenarios. I have a table in a vendor database that has 36 indexes on it to prevent deadlocking from occuring due to their code design and table structure. It works just fine. The cost of IO isn't all that great, but it isn't all that bad either. If that is what I have to do to prevent deadlocks then that is what I do. If it were an internal table or application, we would have been doing some refactoring long ago, but I can only do so much with somethign that we purchased. Keep in mind, that this table is an extreme scenario. In most cases you are only talking about a couple of extra indexes to cover your queries better. You will see faster performance by avoiding the lookup as an added plus, so I don't see what the objection would be.

    Oracle is a completely different monster when it comes to Locking and deadlock scenarios. It is true that you can't get a Deadlock from Selects like this, but you can instead get a ORA 1555 Snapshot to old when the undo/redo buffer space runs out. There are trade offs in both environments. This isn't what I would call a design flaw in SQL Server, it is actually function just as they planned for it to, and when you get down into the roots of the storage internals, and concurrency, it actually kind of begins to make some sense.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]

  • David Griffiths (7/14/2008)


    You state that using a non-covering index forces bookmark look-ups, and that these are the root of the problem. That is precisely correct, and that is precisely what I am complaining about.

    Surely it cannot be necessary to design all indexes to cover their base tables so that no bookmark lookups take place ? What is that going to do to the database performance ?

    Improve it dramatically.

    Bookmark lookups are expensive operations. Very expensive operations since they are done one row at a time. If you force a bookmark lookup for 200 riows, that means 200 seeks to the base table (cluster or heap). See Linchi Shea's article that was linked in this week's database weekly

    It is not necessary to cover all queries, but every performance tuning books will tell you to cover the frequently running or very important ones.

    To quote Kimberly Tripp "Covering indexes are as close to a silver bullet as there is in performance tuning"

    I'm not saying it will resolve your issue, but there's a good chance that it will.

    How long do your selects currently take to run?

    Is this a "known" problem ? There seems to be a certain amount of denial on this thread.

    There's no denial. I'm trying to help you fix your problems. If you'd prefer to say they are unfixable and a fault of SQL, that's fine too.

    Up until now, I've considered this kind of deadlock theoretically possibel, but not likely to occur in a real system, due to the frequency of queries necessary.

    But I did want to put on record the fact that this astonishing design flaw is present in SqlServer, and that the techniques for avoiding it - whether avoiding bookmark lookups by only using covering indexes, or using snapshot isolation, or reverting to Oracle - are simply workarounds.

    Have you logged it on connect? If so, please post the link so I can add comments and a vote.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jonathan,

    thanks for the links. Bart Duncan's explanation of the problem is excellent. I wish I'd seen it before !

    You say 'This isn't what I would call a design flaw.' I suppose it depends on the meaning of 'design flaw' - I would say that if some behaviour is unexpected, undesirable and undocumented, it's a design flaw. (That's what my customers would tell me too). On the other hand I can perfectly understand why Microsoft designed it that way - I'm certainly not trying to imply I could have done a better job of designing SQL Server ! And obviously I don't really think Oracle is better (otherwise we would be using it instead of SS).

    Gail,

    I'm sorry, I wasn't trying to pick a fight ! You have been nothing but helpful and constructive - my crack about 'denial' was aimed at earlier posters.

    I understand that covering indexes will often give performance benefits. However, I didn't understand that they were a requirement to prevent deadlocks until I discovered this behaviour for myself. And I genuinely was astonished when I found out.

    In fact, even using covering indexes wouldn't enable me to eliminate deadlocks. There are situations when a SELECT which joins two tables conflicts with UPDATEs to those tables. Depending on the execution plan of the SELECT, this may or may not cause a deadlock. How can I fix that situation ? Force a fixed execution plan - possibly an inefficient plan - with hints ? For every SELECT with a join ?

    In the end, the decision we have taken is to go with a schema which provides good performance, to avoid execution plan hints as far as possible, and to live with a low level of 'benign' deadlocks.

    No I haven't posted this on Connect. It obviously is a known problem, and it probably can't be fixed without rewriting SQL Server.

  • David Griffiths (7/15/2008)


    I understand that covering indexes will often give performance benefits. However, I didn't understand that they were a requirement to prevent deadlocks until I discovered this behaviour for myself.

    They are not. However they speed up queries and reduce required locks, both of which will reduce the chance of deadlocks.

    If you have a single lock on a table (whether NC or clustered index), you cannot cause a deadlock on that table. That requires 2 locks at minimum.

    With covering indexes, the only thing you need a lock on is the NC index. The update will have to wait for that, but since the select doesn't need a lock on the cluster at all, no deadlock.

    Unless you're running 2 or more updates within a transaction, that should fix things

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Good thread everyone. I learned a lot.

    Thanks,

  • GilaMonster (7/14/2008)

    Up until now, I've considered this kind of deadlock theoretically possibel, but not likely to occur in a real system, due to the frequency of queries necessary.

    But I did want to put on record the fact that this astonishing design flaw is present in SqlServer, and that the techniques for avoiding it - whether avoiding bookmark lookups by only using covering indexes, or using snapshot isolation, or reverting to Oracle - are simply workarounds.

    Have you logged it on connect? If so, please post the link so I can add comments and a vote.

    Gail,

    The key thing here is the timing of the events. In this kind of deadlock scenario, a few nanoseconds can make the difference between a successful execution of both queries or a deadlock scenario. I actually have spent days trying to recreate this kind of deadlock with no luck, even using replay traces that contained multiple deadlocks. I swore I had a bug in SQL until I found Bart Duncans blog.

    As for the connect idea, I would be interested in seeing what Microsofts Response would be. I would think that it will be something to the point that SQL Server uses Pessimistic Locking which is the most common manner of implementing the ANSI standard for transaction isolation, and that READ_COMMITTED_SNAPSHOT can be enabled at the database level in SQL 2005 to remove the blocking caused by SELECT statements.

    Kimberly Tripp wrote an excellent WhitePaper on Row Versioning-based Transaction Isolation that I would recommend as a read for this topic:

    http://download.microsoft.com/download/f/1/c/f1cf7b8d-7fb9-4b71-a658-e748e67f9eba/RowVersioningBasedIsolation.doc

    She covers in depth the comparison of this isolation level against Oracle's locking and row versioning methods. It is a very interesting read, and Kimberly always does a good job explaining the complex concepts where necessary.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]

  • Jonathan Kehayias (7/15/2008)


    The key thing here is the timing of the events. In this kind of deadlock scenario, a few nanoseconds can make the difference between a successful execution of both queries or a deadlock scenario.

    Agreed. That's why I'm surprised to see it happening more than once in a blue moon. Several times a day is rather ... unexpected.

    As for the connect idea, I would be interested in seeing what Microsofts Response would be.

    Likewise. They would probably say to use snapshot isolation. Though, it could be that a small change in the order of the locks taken by the update would prevent this from happening at all.

    At the time the clustered index update occurs, it is known which of the NC indexes need to be updated, so update locks could be taken on those first. The change would also have to be done for inserts and deletes as well.

    (Note, I do not, in any way, speak for the dev team. It's possible there's something within the engine that makes that proposition impossible)

    Kimberly Tripp wrote an excellent WhitePaper on Row Versioning-based Transaction Isolation that I would recommend as a read for this topic:

    I read it a while back. Excellent as always.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jonathan Kehayias (7/15/2008)


    Kimberly Tripp wrote an excellent WhitePaper on Row Versioning-based Transaction Isolation that I would recommend as a read for this topic:

    She covers in depth the comparison of this isolation level against Oracle's locking and row versioning methods. It is a very interesting read, and Kimberly always does a good job explaining the complex concepts where necessary.

    I can't get this to load in my browser. Is there a problem in the way it's cited?

    I've greatly enjoyed this thread, and a belated congratulations, Gail, on being named an MVP!

  • steve smith (7/15/2008)


    I can't get this to load in my browser. Is there a problem in the way it's cited?

    Try doing a save-as

    http://download.microsoft.com/download/f/1/c/f1cf7b8d-7fb9-4b71-a658-e748e67f9eba/RowVersioningBasedIsolation.doc

    I've greatly enjoyed this thread, and a belated congratulations, Gail, on being named an MVP!

    Thanks. It's turning out to be a very good year.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 31 through 45 (of 47 total)

You must be logged in to reply to this topic. Login to reply