Performance of the SQL MERGE vs. INSERT/UPDATE

  • Andy DBA (2/6/2015)


    Thanks Kevin and Sean for the informative links. After reading them I'm actually less concerned about using MERGE for copy forward operations.

    I apologize for not posting sample code for the copy forward technique I'm referring to. That might have made a difference in your replies. I saw it in a forum and assumed it was common knowledge. After looking the code over, I think you might agree that it's not vulnerable to the pitfalls described in those articles. The code is posted below (without the Try/Catch wrapper), but here are the main points:

    The procedure is not merging. It will never update. It is using MERGE to INSERT only. (There is no "WHEN MATCHED" and it is "matching" on 1=0)

    The MERGE is not inserting any primary keys. The PK on the target is an identity column. SQL Server is coughing up the value, not MERGE. This should have no more concurrency problems than an INSERT would.

    Because there are no DELETES or UPDATES, insert triggers will process inserted rows accurately. (The trigger problem I saw in the article seemed to be caused by mixed operations.)

    The ONLY reason MERGE is being used instead of INSERT is because it's OUTPUT clause can be used to expose the old PK at the same time as the new PK. (If somebody knows how to do that with a simple INSERT, please, share!)

    The following code is not my original work. I wish I could credit the original author but I can't remember where I found this as a posted solution to what seems to be a fairly common business need. As I stated earlier, IMHO the best solution is to add a column to the parent table, but when that's not an option I feel that this is the best solution. I will be very impressed if someone posts a better one:

    --Begin TRY block here

    DECLARE @IDXLATE TABLE (Old_ID INT, New_ID INT);

    MERGE Parent AS Dest

    USING (

    SELECT *

    FROM Parent

    WHERE filtercolumn = filtervalue --Condition to select records to copy forward

    ) AS Srce

    ON 0 = 1 --Never update, always insert

    WHEN NOT MATCHED BYTARGET THEN

    INSERT (datacolumn1 --data column to copy forward

    ,datacolumn2

    ,datacolumn3

    ,updatedcolumn) --column that will change in new record

    VALUES (datacolumn1,

    ,datacolumn2

    ,datacolumn3

    ,@newvalue) --data for column changed in new record (not shown above could be a new calendar year, for example)

    OUTPUT INSERTED.ID, Srce.ID INTO @IDXLATE(New_ID,Old_ID); --ID is IDENTITY int PK column, @IDXLATE will get Old to New ID mapping

    --Note that in one T-SQL statement we've performed our new insert and also captured map of Old ID to New ID!

    --Detail Tables

    INSERT INTO dbo.Child1

    (ParentFK

    ,datacolumns)

    SELECT

    New_ID

    ,datacolumns

    FROM dbo.Child1 C1

    INNER JOIN @IDXLATE s ON C1.ParentFK = s.Old_ID

    INSERT INTO dbo.Child2

    (ParentFK

    ,datacolumns)

    SELECT

    New_ID

    ,datacolumns

    FROM dbo.Child1 C2

    INNER JOIN @IDXLATE s ON C2.ParentFK = s.Old_ID

    --etc

    --End TRY block

    INSERT, UPDATE, and DELETE also have the OUTPUT clause. The OUTPUT Clause goes back to (I think) SQL 2000 and maybe older.

    https://msdn.microsoft.com/en-us/library/ms177564.aspx

  • venoym (2/9/2015)


    Andy DBA (2/6/2015)


    Thanks Kevin and Sean for the informative links. After reading them I'm actually less concerned about using MERGE for copy forward operations.

    I apologize for not posting sample code for the copy forward technique I'm referring to. That might have made a difference in your replies. I saw it in a forum and assumed it was common knowledge. After looking the code over, I think you might agree that it's not vulnerable to the pitfalls described in those articles. The code is posted below (without the Try/Catch wrapper), but here are the main points:

    The procedure is not merging. It will never update. It is using MERGE to INSERT only. (There is no "WHEN MATCHED" and it is "matching" on 1=0)

    The MERGE is not inserting any primary keys. The PK on the target is an identity column. SQL Server is coughing up the value, not MERGE. This should have no more concurrency problems than an INSERT would.

    Because there are no DELETES or UPDATES, insert triggers will process inserted rows accurately. (The trigger problem I saw in the article seemed to be caused by mixed operations.)

    The ONLY reason MERGE is being used instead of INSERT is because it's OUTPUT clause can be used to expose the old PK at the same time as the new PK. (If somebody knows how to do that with a simple INSERT, please, share!)

    The following code is not my original work. I wish I could credit the original author but I can't remember where I found this as a posted solution to what seems to be a fairly common business need. As I stated earlier, IMHO the best solution is to add a column to the parent table, but when that's not an option I feel that this is the best solution. I will be very impressed if someone posts a better one:

    --Begin TRY block here

    DECLARE @IDXLATE TABLE (Old_ID INT, New_ID INT);

    MERGE Parent AS Dest

    USING (

    SELECT *

    FROM Parent

    WHERE filtercolumn = filtervalue --Condition to select records to copy forward

    ) AS Srce

    ON 0 = 1 --Never update, always insert

    WHEN NOT MATCHED BYTARGET THEN

    INSERT (datacolumn1 --data column to copy forward

    ,datacolumn2

    ,datacolumn3

    ,updatedcolumn) --column that will change in new record

    VALUES (datacolumn1,

    ,datacolumn2

    ,datacolumn3

    ,@newvalue) --data for column changed in new record (not shown above could be a new calendar year, for example)

    OUTPUT INSERTED.ID, Srce.ID INTO @IDXLATE(New_ID,Old_ID); --ID is IDENTITY int PK column, @IDXLATE will get Old to New ID mapping

    --Note that in one T-SQL statement we've performed our new insert and also captured map of Old ID to New ID!

    --Detail Tables

    INSERT INTO dbo.Child1

    (ParentFK

    ,datacolumns)

    SELECT

    New_ID

    ,datacolumns

    FROM dbo.Child1 C1

    INNER JOIN @IDXLATE s ON C1.ParentFK = s.Old_ID

    INSERT INTO dbo.Child2

    (ParentFK

    ,datacolumns)

    SELECT

    New_ID

    ,datacolumns

    FROM dbo.Child1 C2

    INNER JOIN @IDXLATE s ON C2.ParentFK = s.Old_ID

    --etc

    --End TRY block

    INSERT, UPDATE, and DELETE also have the OUTPUT clause. The OUTPUT Clause goes back to (I think) SQL 2000 and maybe older.

    https://msdn.microsoft.com/en-us/library/ms177564.aspx

    Why are you bothering with a merge here at all? Why not just remove the merge part of this entirely as it doesn't do anything for you.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • venoym - The OUTPUT clause was first introduced in MS SQL Server 2005.

    venoym and Sean - Please re-read the only bold font sentence in my last post and the sentence following it. I think you're missing the point. MERGE is needed, but not for merging. It's OUTPUT clause lets you retreive data from the INSERT's source along with data from the INSERT's destination. That's the ONLY thing it's being used for. The point of my last post is that this set-based solution is not vulnerable to MERGE's pitfalls when it is used for a true merge because it is only doing an insert and it is letting the db engine generate the primary keys. To use a term coined on this site, this usage of MERGE might be considered "SQL Spackle" because one would not normally think of using it this way.

    I think I went over the business requirement that this technique is addresssing in previous posts, but please let me know if it's not clear. I admit this whole exercise may seem a bit contrived because IMHO the best solution is to simply add a column to the parent table (and do a simple INSERT instead of MERGE), but there are situations where that may not be an option. In my particular case, I was replacing a RBAR "solution" that did not need an extra column and I had to provide a "like for like" replacement (in addition to my recommendation to add the column 😉 ).

  • Andy DBA (2/9/2015)


    venoym - The OUTPUT clause was first introduced in MS SQL Server 2005.

    venoym and Sean - Please re-read the only bold font sentence in my last post and the sentence following it. I think you're missing the point. MERGE is needed, but not for merging. It's OUTPUT clause lets you retreive data from the INSERT's source along with data from the INSERT's destination. That's the ONLY thing it's being used for. The point of my last post is that this set-based solution is not vulnerable to MERGE's pitfalls when it is used for a true merge because it is only doing an insert and it is letting the db engine generate the primary keys. To use a term coined on this site, this usage of MERGE might be considered "SQL Spackle" because one would not normally think of using it this way.

    I think I went over the business requirement that this technique is addresssing in previous posts, but please let me know if it's not clear. I admit this whole exercise may seem a bit contrived because IMHO the best solution is to simply add a column to the parent table (and do a simple INSERT instead of MERGE), but there are situations where that may not be an option. In my particular case, I was replacing a RBAR "solution" that did not need an extra column and I had to provide a "like for like" replacement (in addition to my recommendation to add the column 😉 ).

    But OUTPUT is available for an INSERT also. That is why I don't understand the point of using a MERGE when you are explicitly not matching. Just change it to an INSERT. https://msdn.microsoft.com/en-us/library/ms177564.aspx

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Perhaps a more detailed explanation will make things more clear.

    I'm selecting records from a table (PARENT) and inserting them into the same table but with a new PK (and one or more other changed columns eg. calendar year). I also want to select records from a related child table (CHILD) and insert them into the same child table, but with a new FK that points to the new PK inserted above. When I'm done I'll have two sets of data. For argument's sake let's say old 2014 parent records and their children, and new 2015 parent records and their children.

    To do so, I need a mapping table so I can SELECT old child records by matching the old PK to their FK and then INSERT them with a new FK set to the new PK value. But PARENT has an IDENTITY PK and the DB engine creates new PKs during the INSERT, I don't know what the new PK will be beforehand and don't know what record the newly inserted PARENT record was copied from.

    Make sense so far?

    The RBAR approach is to insert to PARENT one agonizing row at a time and use SCOPE_IDENTITY to retrieve the new PK. You have the old PK from the one row you just copied, so you can use it to SELECT CHILD records and INSERT them using the SCOPE_IDENTITY value you just pulled for their new FK. This is "wonderful" because you don't have to use Identity Insert or Tablock holdlock, etc. but it's RBAR!

    If I do a set based approach and I INSERT all the PARENT records at once, OUTPUT from that INSERT only gives me the columns that were inserted, including new system generated PK values. That is of no use in this case, because I don't know what original PKs the ones came from so I can't select the CHILD records tied to the original PARENT records.

    But if I use MERGE to do the INSERT, its OUTPUT will give me BOTH the old and new PKs together and I can put them into a temp table. I can then join to that temp table to select original CHILD records and INSERT new CHILD records with new FKs pointing to the new parent PKs. That one MERGE T-SQL statement not only inserts all of the PARENT records, it also gives me a mapping table I can use to load the CHILD records.

    Finally, because the DB engine is creating new PKs during the MERGE's insert, the insert is not vulnerable to PK violation problems that can be caused by race conditions in MERGE operations where PKs are explicitly being set.

    Make sense?

  • marko.celarc (2/5/2015)


    insert into dbo.history_table

    (ItemId, Oid, old_value, new_value, TransactionCode)

    select

    ItemId, 18, old_value, new_Value, '1234567'

    from

    (

    merge into

    dbo.mother_table AS target_table

    using

    (select

    itemNo,

    1057

    from

    dbo.some_table

    where

    something = 3) as source_table (ItemNo, new_stat)

    on

    (target_table.some_field = source_table.ItemNo)

    when matched and target_table.stat <> new_stat then update set stat = new_stat

    output inserted.some_field, deleted.stat, inserted.stat)

    as spr(ItemId, old_value, new_Value);

    I don't see why you need a MERGE within your composable DMS. The UPDATE seems to work exactly the same, assuming I understand what it is you're trying to do.

    CREATE TABLE #history_table

    (

    ItemID INT

    ,Oid INT

    ,old_value INT

    ,new_value INT

    ,TransactionCode VARCHAR(10)

    );

    CREATE TABLE #mother_table

    (

    some_field INT IDENTITY

    ,stat INT

    );

    CREATE TABLE #some_table

    (

    ItemNo INT IDENTITY

    ,something INT

    );

    INSERT INTO #some_table (something)

    SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3;

    INSERT INTO #mother_table (stat)

    SELECT 1111 UNION ALL SELECT 2222 UNION ALL SELECT 3333;

    SELECT *

    FROM #some_table;

    SELECT *

    FROM #mother_table;

    insert into #history_table

    (ItemId, Oid, old_value, new_value, TransactionCode)

    select

    ItemId, 18, old_value, new_Value, '1234567'

    from

    (

    --merge into

    --#mother_table AS target_table

    --using

    --(select

    --itemNo,

    --1057

    --from

    --#some_table

    --where

    --something = 3) as source_table (ItemNo, new_stat)

    --on

    --(target_table.some_field = source_table.ItemNo)

    --when matched and target_table.stat <> new_stat then update set stat = new_stat

    UPDATE target_table

    SET stat = new_stat

    output inserted.some_field, deleted.stat, inserted.stat

    FROM #mother_table target_table

    JOIN

    (select

    itemNo,

    1057

    from

    #some_table

    where

    something = 3) as source_table (ItemNo, new_stat)

    ON target_table.some_field = source_table.ItemNo

    ) as spr(ItemId, old_value, new_Value);

    SELECT *

    FROM #history_table;

    SELECT *

    FROM #some_table;

    SELECT *

    FROM #mother_table;

    GO

    DROP TABLE #mother_table;

    DROP TABLE #some_table;

    DROP TABLE #history_table;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • We use merge extensively as part of our delivery process.  We need to have re-runnable scripts that affect "process" related data (reference type data).  If the value isn't there then insert, if the data is there AND it's changed then update it.  This keeps the deployment script clean and able to execute multiple times while still affecting the data once.

    merge   dbo.some_table with (holdlock) tgt
    using   aQueryOrCte src
            on src.KeyValue = tgt.KeyValue
    when matched
            and tgt.Column1 != src.Column1
             or tgt.Column2 != src.Column2
    then
    update
       set Column1 = src.Column1
        ,   Column2 = src.Column2
    when not matched
    then
    insert
        (   KeyValue
        ,   Column1
        ,   Column2
        )
    values
        (   src.KeyValue
        ,   src.Column1
        ,   src.Column2
        );

    It works like a champ!  I execute the scripts from a PoSh script running in Octopus, pipe the output to a file and report the results back to the caller.

    --Paul Hunter

  • I thought it was interesting that for the update test, the MERGE seemed to perform somewhat fewer reads and dramatically fewer writes. Even though the difference in elapsed time wasn't large in your test, it's possible the savings would increase for larger workloads. I would be surprised if MERGE was better at updates than UPDATE, but now you have me wondering.

  • SQL Server, in particular SQL Server 2014, has performance issues with left join, especially when compared to inner joins. I do not know how merge was implemented by Microsoft, but it can't be that much different than update/insert-left join within a transaction.  if your target table has hundreds of millions of rows and the source table is only a very small fraction of the size of the target table, it pays to use CTE for the target table with an inner-join to limit the scope of the target table.  When I did that, the performance of merge was significantly better than the performance of update/insert-left join.  Example:


    ;with TGTROWS as
    (
         select TGT.*
         from SRC inner join
              TGT on
                  ...join clause...
    )
    merge TGTROWS
    using SRC on
        ...join clause...
    when matched
    update xxx
    when not matched
    insert xxx;

Viewing 9 posts - 46 through 53 (of 53 total)

You must be logged in to reply to this topic. Login to reply