Performance of the SQL MERGE vs. INSERT/UPDATE

  • Gatekeeper (10/28/2013)


    Good write-up. I used to love MERGE when it was first introduced. I was bit on a few bugs but worked through them. However, Andy wrote a nice write-up describing open issues with MERGE. I've known/ran into a few of them on his list which has further pushed me away from implementing any new MERGE code.

    Broken MERGE Items[/url]

    I've never taken the time to quantify all the open Connect items on MERGE so thanks for the link to Aaron's good article on the subject.

    Even before seeing Aaron's list of faults, I had decided that MERGE wasn't worth the possible headaches especially with what it does in triggers. It just doesn't take that long to write a 3 part "merge" or a 2 part "upsert" and since I like to test each part, it's not likely that I'll ever use MERGE in SQL Server. That's not a recommendation to you folks. That's just my personal opinion of MERGE.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Phil Parkin (10/27/2013)


    Thanks for the concise article Dwain - an interesting read as usual.

    Hey Phil! Good to hear from you and thanks for the compliment.

    Haven't seen you out and about on the forums much of late. Glad to see you are still with us.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • twin.devil (10/28/2013)


    dwain.c (10/27/2013)


    guy.stephens (10/27/2013)


    Much as we love the Merge statement in my workplace, we seem to be running into deadlocking problems that we haven't had when using the UPDATE/INSERT .. LEFT JOIN approach.

    We haven't investigated too deeply but are tending back to UPDATE/INSERT on large datasets. Would be glad to hear from anyone else who has experienced this.

    I personally have not experienced this issue, however with a 28% performance hit it wouldn't surprise me.

    I would say that if you are really expecting to INSERT/UPDATE 100s of 1000s of rows, it might be a good idea to stick to the traditional.

    I guess the key words here are "it depends."

    +1

    As for the article it is a nice one and good for someone to understand the working of MERGE statement, and especially for the decide whether to with MERGE or Not.

    Thanks for the +1. I would say that it is important to remember it is just a guideline that may not be applicable to all cases and situations. Test, test and test again, to be sure where your case or situation falls within the narrow guidelines that I ran across.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • andrew.fenna (10/28/2013)


    This can be wrote as:-

    -- MERGE

    MERGE #Target t

    USING (@ID, @Value

    ) s (ID, Value)

    ON s.ID = t.ID

    WHEN MATCHED

    THEN UPDATE

    SET Value = s.Value

    WHEN NOT MATCHED

    THEN INSERT

    ( ID, Value )

    VALUES

    ( s.ID

    , s.Value

    );

    Andrew - I think it would be problematic to draw any conclusions off of a MERGE constructed this way as at most 1 row is in the source. Hard to garner any performance statistics without a lot of rows being affected.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Mark-101232 (10/28/2013)


    Interesting article!

    Paul White posted some useful information about MERGE performance

    http://www.sqlservercentral.com/Forums/FindPost1466528.aspx

    Mark - Thanks for the highly relevant thread. Paul's analysis is clearly informative, as his analyses usually are.

    I'm surprised I missed it since it came out only a few months ago, considering how much I troll the forums (does that make me a forum troll? :-P), but I think it may have come out when I was tied up with a project in Papua New Guinea.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • tim_edwards (10/28/2013)


    Great article!

    Anyone else getting a "The resource could not be found" error when trying to download the Test Harness files?

    Just in case you or anyone are having this issue, I've attached the .sql files again here.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • asdf-146317 (10/28/2013)


    Usually merge is modifying a tiny portion of the target rows. I do not think your test case is a valid real-world example since it would only happen one time. Subsequent merges on changes would be a more accurate test.

    I ran your test harness, only moving the location of the source and target tables outside of tempdb, and was not able to reproduce your results. Merge performed better in a normally logged database.

    In response to yours and an earlier post, I'll try running my test harness using permanent user tables. I will also try it on SQL 2008, whereas the original tests were run on SQL 2012.

    In the meantime, would you mind posting your results in a comparable format to what I did, and also tell us what your testing machine specs were? It might be interesting to see if there is some commonality to the cases where performance is a wash.

    This is something I'll try to do tonight (in about 12 hours or so), so please bear with me.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Had some deadlocks and massive performance hits with large datasets that involved varchar(MAX) fields..

    Added a plan guide with OPTION (MERGE JOIN) and the performance went back to good..

    Plan is to change them over to IUD's in the near future, but I still like merge in many situations.

  • A question about your results...

    What were the hardware specs of the system you ran the tests on? What kind of CPU? How much RAM? Size, speed and number of HDDs in the disk sub-system/RAID?

    Some of my customers have database servers that are processor and memory heavy, but are under-powered in the disk sub-system. I'd like to have an idea how my programmer switching to use MERGE would affect them. A lot of those customers unfortunately don't have a DBAs that I can coordinate tests with or allow remote access into their systems in order for me to perform benchmarks tests either.

    EDIT:

    Nevermind. I just saw the line at the bottom of the article that shows the hardware specs used. Thanks for such an informative article!

  • Nice article Dwain, thanks.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Very good article and nicely explained analysis.

  • Koen and sqlnaive - Thank you sirs for the feedback.

    Now on to rerunning my test harness with a couple of variations.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • It seems that there is quite a bit more variation than I thought. I have 2 machines available for testing:

    1. The Lenovo reported at the end of the article, which has installed SQL 2012 and SQL 2008R2 (both 64 bit versions).

    2. A Dell Inspiron Core i5 2.4 GHz with 8GB of memory running Windows 7 (64 bit) and SQL 2008R2 (64 bit).

    Focusing on comparing MERGE to the UPDATE/INSERT case and ignoring the others.

    Using the original Lenovo machine (4GB memory)

    And non-temporary tables on SQL 2012:

    CPU Reads Writes Duration

    MERGE 4618 4515337 5226 6772

    INSERT/UPDATE 3541 3075083 7019 3956

    MERGE 4633 4515221 5214 5117

    INSERT/UPDATE 3994 3075081 7007 4726

    MERGE 4587 4515222 4949 6627

    INSERT/UPDATE 3760 3075081 7010 4119

    MERGE 4664 4515221 5212 5384

    INSERT/UPDATE 3650 3075081 7009 4040

    MERGE 4493 4515220 5213 5025

    INSERT/UPDATE 3573 3075081 7008 3851

    Timings show that MERGE consumed 24% more CPU and 40% more elapsed time.

    Using temporary tables on SQL 2008R2.

    CPU Reads Writes Duration

    MERGE 5757 4525802 5193 10041

    INSERT/UPDATE 4009 2922120 4404 4401

    MERGE 5757 4525760 5173 6206

    INSERT/UPDATE 4009 2922120 4405 4430

    MERGE 5647 4525752 5202 6218

    INSERT/UPDATE 4103 2922124 4404 4523

    MERGE 5601 4525757 5136 6292

    INSERT/UPDATE 4040 2922120 4404 4320

    MERGE 5803 4525755 5162 6149

    INSERT/UPDATE 3853 2922121 4404 4607

    Timings show that MERGE consumed 43% more CPU and 57% more elapsed time.

    Using non-temporary tables on SQL 2008R2.

    CPU Reads Writes Duration

    MERGE 6271 4527210 2855 6995

    INSERT/UPDATE 4867 2928630 4408 5497

    MERGE 6271 4529746 2610 6961

    INSERT/UPDATE 4899 2931221 4396 5601

    MERGE 6302 4529750 3157 7421

    INSERT/UPDATE 4914 2931220 4406 5601

    MERGE 6303 4529750 2029 6792

    INSERT/UPDATE 5023 3087491 5814 5603

    MERGE 6334 4529750 2603 6782

    INSERT/UPDATE 4992 3087517 7016 7647

    Timings show that MERGE consumed 27% more CPU and 17% more elapsed time.

    Using the Dell machine (8GB memory)

    Using temporary tables on SQL 2008R2.

    CPU Reads Writes Duration

    MERGE 4664 4028115 4620 5708

    INSERT/UPDATE 3994 2922546 4008 5499

    MERGE 4929 4028112 4857 6022

    INSERT/UPDATE 3978 2922545 3990 5318

    MERGE 4805 4029027 3800 6371

    INSERT/UPDATE 4555 3080953 3991 6696

    MERGE 4836 4028115 4769 5790

    INSERT/UPDATE 4587 3080958 3966 6579

    MERGE 5101 4028114 4686 6411

    INSERT/UPDATE 4197 2922545 3993 6085

    Timings show that MERGE consumed 13% more CPU and elapsed time was about the same.

    Using non-temporary tables on SQL 2008R2.

    CPU Reads Writes Duration

    MERGE 6739 4029813 2955 14291

    INSERT/UPDATE 5944 2926033 5648 23672

    MERGE 4820 4029885 2902 9253

    INSERT/UPDATE 4150 2926035 4393 15135

    MERGE 4836 4029881 2633 11900

    INSERT/UPDATE 4212 2926024 4398 12546

    MERGE 4821 4029882 2669 5795

    INSERT/UPDATE 4119 2926024 4398 14905

    MERGE 4961 4029886 2613 5733

    INSERT/UPDATE 4274 2926021 4404 5869

    Timings show that MERGE consumed 15% more CPU but elapsed time was about 35% improved (although with a significant amount of variation).

    Sorry if I mislead anybody but it appears MERGE may not be so bad if you aren't too memory constrained. It seems what I said is true particularly when using MERGE, which is to do some testing under your particular environmental conditions before drawing any serious conclusions.

    Edit: Added the attachment, which is Test Harness #1 revised to use non-temporary tables.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • adnan.korkmaz (10/28/2013)


    Nice comparison.

    On the other hand, it could be better to see physical table performance comparison instead of / in addition to temporary tables.

    There's unlikely to be much difference.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Thanks Dwain for another informative and well-written article. It wouldn't surprise me if the discussion provoked a significant rewrite with more use cases, based on posts so far.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Viewing 15 posts - 16 through 30 (of 53 total)

You must be logged in to reply to this topic. Login to reply