Performance of the SQL MERGE vs. INSERT/UPDATE

  • guy.stephens (10/27/2013)


    Much as we love the Merge statement in my workplace, we seem to be running into deadlocking problems that we haven't had when using the UPDATE/INSERT .. LEFT JOIN approach.

    We haven't investigated too deeply but are tending back to UPDATE/INSERT on large datasets. Would be glad to hear from anyone else who has experienced this.

    I haven't run into this problem, but consider looking into your transaction isolation level. Chances are that if you are experiencing lock issues using the merge statement then you will also have problems when reading/writing from that table in other applications.
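    One commonly cited mitigation is to run the MERGE under SERIALIZABLE semantics with a HOLDLOCK hint, so the match phase and the write phase are atomic for the affected key range. A minimal sketch (table and column names here are illustrative, not from the thread):

    ```sql
    -- HOLDLOCK (equivalent to SERIALIZABLE) makes MERGE hold its range locks
    -- from the match phase through the write phase, closing the upsert race
    -- window that can otherwise cause duplicate-key errors or deadlocks.
    MERGE dbo.TargetTable WITH (HOLDLOCK) AS t
    USING dbo.SourceTable AS s
        ON t.id = s.id
    WHEN MATCHED THEN
        UPDATE SET t.val = s.val
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (id, val) VALUES (s.id, s.val);
    ```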

  • I've been using Binary_Checksum to compare my source and target values before updating. This helps when you want to compare the two and handle NULL comparisons at the same time.

    Example:

    MERGE SomeTable AS t
    USING (
        SELECT alpha, bravo, charlie
        FROM AnotherTable
    ) AS s
        ON t.alpha = s.alpha
    WHEN MATCHED
        AND BINARY_CHECKSUM(t.bravo, t.charlie) <> BINARY_CHECKSUM(s.bravo, s.charlie)
    THEN UPDATE
    ...

    I didn't write the rest of the merge example, but does anyone have an opinion on using the Binary_CheckSum?

  • ChrisM@home (10/29/2013)


    Thanks Dwain for another informative and well-written article. It wouldn't surprise me if the discussion provoked a significant rewrite with more use cases, based on posts so far.

    Thanks Chris. I agree some amendments might be in order based on the additional use cases. It appears that I have oversimplified the situation.

    I will have to look into that. Still waiting on Steve to correct the formatting issues in the article.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • King Conch (10/29/2013)


    ...but does anyone have an opinion on using the Binary_CheckSum?

    Yes. I'd advise extreme caution. Run the following and see why.

    SELECT BINARY_CHECKSUM(CAST( 2 AS BIGINT))
         , BINARY_CHECKSUM(CAST(-3 AS BIGINT));
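    A checksum-free way to get the same NULL-safe change detection is EXISTS with EXCEPT, which treats NULLs as equal. A sketch, reusing the column names from the earlier hypothetical example:

    ```sql
    -- EXCEPT compares rows with NULL-safe equality, so the EXISTS is true only
    -- when at least one compared column value actually differs. No collisions,
    -- unlike BINARY_CHECKSUM.
    MERGE SomeTable AS t
    USING (
        SELECT alpha, bravo, charlie
        FROM AnotherTable
    ) AS s
        ON t.alpha = s.alpha
    WHEN MATCHED AND EXISTS (
        SELECT t.bravo, t.charlie
        EXCEPT
        SELECT s.bravo, s.charlie
    )
    THEN UPDATE SET bravo = s.bravo, charlie = s.charlie;
    ```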

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I prefer using a MERGE statement combined with OUTPUT, which enables me to perform additional inserts.

    Below is an example of such a statement; my task is to make an update on a target table.

    At the same time, the changes made must be reflected in a history table.

    I compared the performance vs. the classic approach, and in most cases it performs faster.

    INSERT INTO dbo.history_table
        (ItemId, Oid, old_value, new_value, TransactionCode)
    SELECT
        ItemId, 18, old_value, new_value, '1234567'
    FROM
    (
        MERGE INTO dbo.mother_table AS target_table
        USING
        (
            SELECT itemNo, 1057
            FROM dbo.some_table
            WHERE something = 3
        ) AS source_table (ItemNo, new_stat)
            ON target_table.some_field = source_table.ItemNo
        WHEN MATCHED AND target_table.stat <> new_stat
            THEN UPDATE SET stat = new_stat
        OUTPUT inserted.some_field, deleted.stat, inserted.stat
    ) AS spr (ItemId, old_value, new_value);

  • marko.celarc (2/5/2015)


    I prefer using merge statement combined with output which enables me to perfrorm additional inserts.

    Below is an example of such statement; my task is to make an update on a target table.

    At the same time changes made must reflect in a history table.

    I compared the performance vs. classic approach and it, in most cases performs faster.

    ...

    What you're doing here is composable DML, and I wouldn't guarantee you'd get the same performance characteristics as what I was showing for the classic upsert vs. the MERGE. I agree that for what you are doing here, it makes the code quite nice.

    Even with my findings, I tend to use MERGE a lot for its convenience. Typically the performance impact only appears when a large number of rows are affected by the MERGE.



  • Very nice and detailed article. Thank you.

  • tmitchelar (10/29/2013)


    guy.stephens (10/27/2013)


    Much as we love the Merge statement in my workplace, we seem to be running into deadlocking problems that we haven't had when using the UPDATE/INSERT .. LEFT JOIN approach.

    We haven't investigated too deeply but are tending back to UPDATE/INSERT on large datasets. Would be glad to hear from anyone else who has experienced this.

    I haven't run into this problem, but consider looking into your transaction isolation level. Chances are that if you are experiencing lock issues using the merge statement then you will also have problems when reading/writing from that table in other applications.

    No, this is an issue with MERGE itself. It has been documented and closed as "By Design": https://connect.microsoft.com/SQLServer/feedback/details/723696/basic-merge-upsert-causing-deadlocks
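    For reference, the UPDATE/INSERT pattern posters fall back to also needs locking hints to be safe under concurrent upserts. A minimal sketch (table and column names are illustrative, not from the thread):

    ```sql
    -- Classic two-statement upsert. UPDLOCK + SERIALIZABLE (HOLDLOCK) on the
    -- first statement keeps a concurrent session from inserting the same key
    -- between the UPDATE and the INSERT.
    BEGIN TRAN;

    UPDATE t WITH (UPDLOCK, SERIALIZABLE)
    SET    t.bravo = s.bravo
    FROM   dbo.SomeTable AS t
    JOIN   dbo.AnotherTable AS s ON s.alpha = t.alpha;

    INSERT INTO dbo.SomeTable (alpha, bravo)
    SELECT s.alpha, s.bravo
    FROM   dbo.AnotherTable AS s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.SomeTable AS t WHERE t.alpha = s.alpha);

    COMMIT;
    ```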

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • @SSC Thanks for reposting this article. I missed it when it came out over a year ago!

    @ Jeff Moden - I know this was long ago, but back then you wrote:

    Jeff Moden (10/28/2013)


    I've never taken the time to quantify all the open Connect items on MERGE so thanks for the link to Aaron's good article on the subject.

    Even before seeing Aaron's list of faults, I had decided that MERGE wasn't worth the possible headaches especially with what it does in triggers. It just doesn't take that long to write a 3 part "merge" or a 2 part "upsert" and since I like to test each part, it's not likely that I'll ever use MERGE in SQL Server. That's not a recommendation to you folks. That's just my personal opinion of MERGE.

    and I was wondering if you still held that opinion. I'm asking, because I recently learned about a technique using the OUTPUT clause with MERGE to perform a set-based copy forward solution where the Parent table had an IDENTITY int primary key. (I was replacing a RBAR "solution" :crazy: a developer had written).

    Other set-based copy forward solutions I've seen use "WITH (TABLOCK, HOLDLOCK)" and SET IDENTITY_INSERT ON, but the MERGE solution doesn't require either. I haven't had time to do performance comparisons, but even if it turned out to be more expensive, it seems like the MERGE solution should have better concurrency. So in this case, MERGE seems like the greatest thing since bottled beer, but I greatly respect your experience and opinions. My questions to you are:

    Do you suggest avoiding MERGE like the plague, or merely using it with caution?

    Assuming database re-design is not an option ;), do you have a favorite set-based copy forward solution? If so, does it use TABLOCK and/or IDENTITY_INSERT?

  • MERGE is buggy (still), still subject to race/concurrency issues and you should try it with triggers on the target table sometime ...

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks for the reply Kevin.

    "Buggy" and "issues" are vague terms, but I get the picture. Also, I'm OK just taking your word for it on the triggers! My goal isn't to defend MERGE, I'm interested in your (or the Community's) recommendation for a set-based copy forward technique.

    Not sure how standard that term is, so here's a typical scenario illustrating what I mean by "copy forward":

    A Parent table has an IDENTITY int as a primary key (PK) and one or more child tables have foreign keys (FK) referencing it. A block of new parent records needs to be created with duplicate data except for some attribute (calendar year, for example) which is loaded with a new value. All of the related child records also need be duplicated except, of course, their FKs which need to match the new parent record PKs. The challenge is to perform a block insert on the Parent in a way that captures and ties new PKs to original PKs and also has acceptable concurrency.

    IMHO, the best solution is to simply add a "CopiedFromID" column to the parent and insert the old PK into it when copying forward. The original child records can then be selected using CopiedFromID and their new FK loaded from the new Parent ID.

    We often don't have the luxury of adding a new column though, so assuming adding CopiedFromID is not an option, what set-based approach can be taken to map old parent PKs to the newly inserted ones?

    One set-based approach I've seen does the following:

    opens a table lock on the parent table,

    copies parent table IDs into a temp table, which creates new IDs in the temp table (using IDENT_CURRENT(ParentTable) for a seed),

    sets IDENTITY_INSERT ON,

    inserts new Parent records with the temp table's new ID, using a join on the temp table's old ID,

    turns IDENTITY_INSERT OFF,

    copies forward child table records using the temp table,

    releases the table lock upon completion.

    This locks the parent table for the duration of the inserts, which hurts concurrency. Plus we don't need to force new IDs. We just need to map the old IDs to whatever new ones SQL Server creates. If we use the OUTPUT clause of an INSERT, we can capture new IDs, but that's worthless because we don't have a map to the old IDs. That's what's so "magical" about using MERGE with an OUTPUT clause. It gives us the old-to-new mapping in the same statement that does the insert!

    Here's what that approach does:

    Merge to insert parent records and load old IDs mapped to new IDs into a temp table

    Copies forward child table records using temp table

    Is there an approach as elegant as this that doesn't use the "buggy" MERGE with "issues"?

    Heck, if MERGE's only problem is that it chokes on insert triggers, it might still make sense to disable them and replicate their behaviour in the copy forward procedure, even though that would mean using TABLOCK to prevent other processes from inserting while the trigger is disabled.

    I'm not trying to be argumentative. It just feels like I'm missing something obvious (besides adding that CopiedFromID column, πŸ˜‰ ) and I'm interested in learning if others have run into this business need and if they've solved it without RBAR, IDENTITY_INSERT, or TABLOCK.

  • I think there was a link to an Aaron Bertrand post about MERGE issues. Do a web search for "sql server merge concurrency problem" and you will get a number of relevant links. Here's one:

    http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx

    If you don't HOLDLOCK you WILL get bad data under load.

    But whatever works for you is good by me. πŸ™‚

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Here is Aaron's blog post about MERGE: http://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/


  • Editorial comment: 'Simplistic' describes the logical fallacy of oversimplification. I think you mean 'simple' rather than 'simplistic' in "To illustrate our case, let’s set up some very simplistic source and target tables..."

  • Thanks Kevin and Sean for the informative links. After reading them I'm actually less concerned about using MERGE for copy forward operations.

    I apologize for not posting sample code for the copy forward technique I'm referring to. That might have made a difference in your replies. I saw it in a forum and assumed it was common knowledge. After looking the code over, I think you might agree that it's not vulnerable to the pitfalls described in those articles. The code is posted below (without the Try/Catch wrapper), but here are the main points:

    The procedure is not merging. It will never update. It is using MERGE to INSERT only. (There is no "WHEN MATCHED" and it is "matching" on 1=0)

    The MERGE is not inserting any primary keys. The PK on the target is an identity column. SQL Server is coughing up the value, not MERGE. This should have no more concurrency problems than an INSERT would.

    Because there are no DELETES or UPDATES, insert triggers will process inserted rows accurately. (The trigger problem I saw in the article seemed to be caused by mixed operations.)

    The ONLY reason MERGE is being used instead of INSERT is because its OUTPUT clause can be used to expose the old PK at the same time as the new PK. (If somebody knows how to do that with a simple INSERT, please, share!)

    The following code is not my original work. I wish I could credit the original author but I can't remember where I found this as a posted solution to what seems to be a fairly common business need. As I stated earlier, IMHO the best solution is to add a column to the parent table, but when that's not an option I feel that this is the best solution. I will be very impressed if someone posts a better one:

    --Begin TRY block here
    DECLARE @IDXLATE TABLE (Old_ID INT, New_ID INT);

    MERGE Parent AS Dest
    USING (
        SELECT *
        FROM Parent
        WHERE filtercolumn = filtervalue --Condition to select records to copy forward
    ) AS Srce
        ON 0 = 1 --Never update, always insert
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (datacolumn1 --data columns to copy forward
               ,datacolumn2
               ,datacolumn3
               ,updatedcolumn) --column that will change in the new record
        VALUES (Srce.datacolumn1
               ,Srce.datacolumn2
               ,Srce.datacolumn3
               ,@newvalue) --new value for the changed column (could be a new calendar year, for example)
    OUTPUT INSERTED.ID, Srce.ID INTO @IDXLATE (New_ID, Old_ID); --ID is the IDENTITY int PK column; @IDXLATE gets the old-to-new ID mapping

    --Note that in one T-SQL statement we've performed our insert and also captured the map of old ID to new ID!

    --Detail tables
    INSERT INTO dbo.Child1
        (ParentFK
        ,datacolumns)
    SELECT s.New_ID
          ,C1.datacolumns
    FROM dbo.Child1 AS C1
    INNER JOIN @IDXLATE AS s ON C1.ParentFK = s.Old_ID;

    INSERT INTO dbo.Child2
        (ParentFK
        ,datacolumns)
    SELECT s.New_ID
          ,C2.datacolumns
    FROM dbo.Child2 AS C2
    INNER JOIN @IDXLATE AS s ON C2.ParentFK = s.Old_ID;

    --etc.
    --End TRY block
