trouble wit a poor performing query

  • I've got a poor performing query and not sure the best approach to fixing it.  Here's the statement:


    UPDATE @t
     SET ToLocationID = it.LocationID
    FROM dbo.InventoryTransactions it, @t b
    WHERE it.TransactionGroupID = b.TransactionGroupGUID
    AND it.TransactionID <> b.TransactionID

    @t is a memory table that has some indexes built in.  In our worst case scenario, it has 5,000 rows.
    DECLARE @t TABLE (TransactionID int INDEX IDX_TransactionID, TransactionTypeCodeID int, LocationID int, FromLocationID int, ToLocationID int, TransactionGroupGUID uniqueidentifier INDEX IDX_TransactionGroupGUID,
    ProtocolPatientID int, ToDrop bit) 

    The permanent table InventoryTransactions is not too bad (~200,000 rows) and TransactionID is the PK with a clustered index.

    When we run this query, it takes 5-6 minutes for this one statement.  What am I missing?

  • Can you post execution plan (as a .sqlplan file) please?

    How does the performance change when you use a temp table instead of the table variable?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I agree, we need more details and a temp table would be much better.  But you should always UPDATE an alias when doing JOIN(s) in an UPDATE.  And, if possible, make the TransactionID in the table variable a primary key as well.

    UPDATE b
    SET ToLocationID = it.LocationID
    FROM dbo.InventoryTransactions it, @t b
    WHERE it.TransactionGroupID = b.TransactionGroupGUID
    AND it.TransactionID <> b.TransactionID

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • srmc - Tuesday, February 7, 2017 2:44 PM

    I've got a poor performing query and not sure the best approach to fixing it.  Here's the statement:


    UPDATE @t
     SET ToLocationID = it.LocationID
    FROM dbo.InventoryTransactions it, @t b
    WHERE it.TransactionGroupID = b.TransactionGroupGUID
    AND it.TransactionID <> b.TransactionID

    @t is a memory table that has some indexes built in.  In our worst case scenario, it has 5,000 rows.
    DECLARE @t TABLE (TransactionID int INDEX IDX_TransactionID, TransactionTypeCodeID int, LocationID int, FromLocationID int, ToLocationID int, TransactionGroupGUID uniqueidentifier INDEX IDX_TransactionGroupGUID,
    ProtocolPatientID int, ToDrop bit) 

    The permanent table InventoryTransactions is not too bad (~200,000 rows) and TransactionID is the PK with a clustered index.

    When we run this query, it takes 5-6 minutes for this one statement.  What am I missing?

    Unless TransactionGroupID is unique in at least one of the tables, you've created an accidental many-to-many join.  You may have also run across an exceptional problem with UPDATEs in T-SQL by saying UPDATE @t instead of using the alias for @t.  Change the UPDATE clause to UPDATE b and see if that helps (you may have to force a recompile to get rid of a bad cached plan) .  If not, you'll need to have a look at the actual execution plan and see if you have a fat arrow or two where the rowcount greatly exceeds on table or the other, which would be the proof of the accidental many-to-many join (a dual Triangular Join, in this case, which is just several rows short of being a full Cartesian Product).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the tips.  Using the temp tables instead of memory tables and updating the alias helped a lot.  It took it from 5-6 minutes to a second.

      I'm still working through more issues in the bigger procedure (e.g. scalar function use) but this helps.

  • Be sure to cluster the temp table as well, on ( TransactionGroupGUID, b.TransactionID )

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Jeff Moden - Tuesday, February 7, 2017 5:00 PM

    You may have also run across an exceptional problem with UPDATEs in T-SQL by saying UPDATE @t instead of using the alias for @t.  Change the UPDATE clause to UPDATE b and see if that helps 

    Can you expand on this, or reference an article?  What makes this a problem?  My search only found how to use aliases (which I already know), but not why they should be used.  I generally use them for clarity and to save typing.

  • gvoshol 73146 - Thursday, February 9, 2017 5:21 AM

    Jeff Moden - Tuesday, February 7, 2017 5:00 PM

    You may have also run across an exceptional problem with UPDATEs in T-SQL by saying UPDATE @t instead of using the alias for @t.  Change the UPDATE clause to UPDATE b and see if that helps 

    Can you expand on this, or reference an article?  What makes this a problem?  My search only found how to use aliases (which I already know), but not why they should be used.  I generally use them for clarity and to save typing.

    I don't know of any articles on the subject.  I can only tell you from personal experience and through several folks I've helped on these very forums.

    I've been using an alias for updates since way back when in SQL 2000.  It wasn't in Books Online (BOL) back then and didn't actually show up in BOL until sometime after 2008.  BOL also showed that UPDATE and DELETE actually had  2 FROM clauses, the first being on the target of the UPDATE and the other being what we expect from a FROM clause now.  I remember that someone pointed out on these very forums that the use of an alias on the first FROM (normally implied instead of written out) was a little bit faster than using the table name again (already included in the 2nd FROM clause for joined updates).  They were right back then.  It was a little faster.  That was back in the heyday of 32 bit machines and SQL Server 2000.  It doesn't seem to add any oomph on 64 bit machines but it's a habit that I've gotten into and kept for readability and simplicity sake. 

    It also inherently prevents a deadly-to-performance form of UPDATE that was also not so well documented until Azure came out .  It's officially documented as an "implicit join" for Azure (search for "implicit join" in the article at the following url: https://msdn.microsoft.com/en-us/library/ms177523.aspx ) and I have no idea what the performance implications there are but, on standalone and VM boxes (ie non-Azure), it can raise all sorts of hell.  Here's the basic format of an "implicit join" for UPDATEs.

    UPDATE TargetTable
        SET SomeColumn = src.SomeColumn
       FROM SourceTable src
      WHERE TargetTable.SomeOtherColumn = src.SomeOtherColumn
    ;

    Another form is to have no aliases at all (like the example in the link I provided).

     UPDATE TargetTable
        SET SomeColumn = SourceTable.SomeColumn
       FROM SourceTable
      WHERE TargetTable.SomeOtherColumn = SourceTable.SomeOtherColumn
    ;

    What makes them an "implicit join" is that the target of the UPDATE is NOT included in the FROM clause and it can be deadly for standalone/VM instances.

    I first ran into this problem more than a decade ago in SQL Server 2000 (32 bit). The update worked fine (just a couple of seconds) on a Development box.  When it was deployed to the production box, it slammed 4 of the 8 CPUs (hyperlnked) into the wall for more than 2 hours.  And, it was very repeatable (we waited only 5-10 minutes on subsequent attempts before cancelling).  It took us a while to realize that form of update was no where to be found in Books Online, not to mention the fact that it worked fine with a similar amount of data on the Dev box.

    We changed it to a more traditional and fully documented "explicit join" (we didn't want to make too many changes at once and so didn't convert to 2 part naming using aliases, at first).

     UPDATE TargetTable
        SET SomeColumn = SourceTable.SomeColumn
       FROM SourceTable
       JOIN TargetTable ON TargetTable.SomeOtherColumn = SourceTable.SomeOtherColumn
    ;

    That update worked so fast we could hardly get our finger off the go-button before it completed.  Then, we converted it to what we both liked to see... proper use of aliases.

    UPDATE dbo.TargetTable
        SET SomeColumn = src.SomeColumn
       FROM dbo.SourceTable src
       JOIN TargetTable tgt ON tgt.SomeOtherColumn = src.SomeOtherColumn
    ;

    Again, the code ran like lightning.  To make sure we weren't being fooled by some form of caching, we ran the original code again and it still stuffed the CPUs.

    Finally, I suggested that the undocumented use of an alias in the UPDATE clause as a part of our standards would make it so that people had to include the target of the UPDATE in the FROM clause as part of an "explicit join" so that no one would make a similar mistake in the future and here's the form that we standardized on with never a similar problem occurring ever again.

     UPDATE tgt
        SET (tgt.)SomeColumn = src.SomeColumn  --(tgt.) is in parenthesis because it's optional
       FROM dbo.SourceTable src
       JOIN TargetTable tgt ON tgt.SomeOtherColumn = src.SomeOtherColumn
    ;

    There's no way I can put my finger on the posts but, since that time, I've run into many folks having the same performance issues with an "implicit join" in their UPDATE statements.  When I showed them how to convert it into the fully documented (except for the alias I used) "explicit join" form of the UPDATE, they all reported back with amazement at how such a simple thing could make such a difference.  To the best of my knowledge, they were all using 2008 or less.  I don't know if they fixed the problem in 2012 and up but, except for Azure, it's (implicit joined UPDATEs) still not documented in BOL as a legal form of UPDATE.

    It's difficult to write an article about because, it seems, that the conditions must be exactly right for the problem to occur but does seem to occur more often on servers with a fair number of CPUs, a lot of data, and just the right kind of indexing.  Of course, it also only occurs 17.5 hours after the second blood moon of the year or on the autumnal equinox and only if you've been able to stand an uncooked egg on its pointy end on the server. 😉  As a result, I've not been able to create test data that duplicates what occurs naturally and it would vary from box to box anyway.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff.

    I use either of the last 2 examples in your post.  I was concerned that the last one, with an alias following UPDATE, was or was not preferred over the 2nd to last.

    I avoid the implicit join in most of my queries, not just UPDATE, simply because it doesn't seem "right" to me.  I know it works; I just think using JOIN and ON provides better clarity.

  • I think the problem may be deeper than that.  Remember, once you alias a table in a query, you cannot refer to that table by its original name, but only by the alias.  With that in mind, let's look at this statement again:

    UPDATE @t
    SET ToLocationID = it.LocationID
    FROM dbo.InventoryTransactions it, @t b
    WHERE it.TransactionGroupID = b.TransactionGroupGUID
    AND it.TransactionID <> b.TransactionID

    I believe the "UPDATE @t" does not refer to the same table as "@t b".  Instead, every row in @t is being UPDATEd for every join match in "it" and "b" -- maybe!? -- and that is why the UPDATE takes so long.  I think SQL would treat this as basically an implicit type of cross join.

    But, when you use the proper alias, so that SQL "knows"/"understands" that you intend to update the joined "@t b" row, rather than every row in the table, the code naturally runs much faster, since every row is UPDATEd only once.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply