Big Performance Problem with Merge Statement

  • I'll second everyone else's comments: fascinating stuff, Paul. As I was saying, my "problem" is resolved, but I was really interested to understand why SQL was behaving that way. So, thank you, very much. You make it very clear. πŸ™‚

    I will be particularly interested to play around with using NOT EXISTS logic where we currently use WHERE IS NOT NULL.

  • David Moutray (6/24/2013)


    I will be particularly interested to play around with using NOT EXISTS logic where we currently use WHERE IS NOT NULL.

    If you feel like reading up on it a bit before diving in, these two articles are my favourites:

    http://www.sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join

    http://explainextended.com/2009/09/15/not-in-vs-not-exists-vs-left-join-is-null-sql-server/
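    If you want a quick pattern to experiment with before reading, here is a minimal sketch of the two anti-semi-join forms those articles compare (table and column names are made up purely for illustration):

        -- Rows in the source with no matching row in the target (hypothetical names)

        -- Form 1: NOT EXISTS
        SELECT s.ID, s.Payload
        FROM dbo.SourceTable AS s
        WHERE NOT EXISTS
        (
            SELECT 1
            FROM dbo.TargetTable AS t
            WHERE t.ID = s.ID
        );

        -- Form 2: LEFT JOIN ... IS NULL
        SELECT s.ID, s.Payload
        FROM dbo.SourceTable AS s
        LEFT JOIN dbo.TargetTable AS t
            ON t.ID = s.ID
        WHERE t.ID IS NULL;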

  • Couple of other good ones on the same subject...

    http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

    http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Oh yes, of course. Thanks!

  • spaghettidba (6/24/2013)


    Paul, your answers could be collected in a book and it would be an absolute best-seller!

    Awesome, as usual.

    Absolutely. Paul, you don't often post on ssc these days, but when you do it's a real blockbuster of a learning opportunity for us mere mortals. Thank you!

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY (I) and (II) - Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden

  • ChrisM@Work (6/25/2013)


    Absolutely. Paul, you don't often post on ssc these days...

    It's nothing personal, I just sort of drifted away when I found there weren't often "my sort of questions" around. If anyone notices a question I might be useful on, feel free to ping me via the thread or twitter etc.

  • I totally agree with The Dixie Flatline. Indexes, indexes, indexes. The one caveat I have found working on ETL processes is that having indexes on the tables being loaded (usually staging tables) can kill performance, especially if inserting rows in the middle of a clustered index. It may be necessary to drop some or all indexes on the target during the MERGE op, or even create a new index on the source table before the MERGE op and then drop it after, to get optimal performance.

    As also mentioned, there are certain very specific scenarios where a non-optimal query plan may be chosen, but you may still be able to use query hints to force a specific plan. Technically you could do the cleanup op (delete on no match) separately, as implied above, so that the upsert op doesn't have to worry about those records. It would be interesting to see the total performance across the two separate commands.
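    To make the idea concrete, here is a rough sketch of the drop-index / MERGE / re-create pattern, with the cleanup delete done as a separate statement. All object names are hypothetical, and the right index choices depend entirely on your schema and data volumes:

        -- Hypothetical names throughout; adjust to your own schema.
        -- 1. Drop (or disable) nonclustered indexes on the staging target before the load.
        DROP INDEX IX_StagingTarget_BusinessKey ON dbo.StagingTarget;

        -- 2. Do the upsert portion of the MERGE.
        MERGE dbo.StagingTarget AS t
        USING dbo.SourceData AS s
            ON t.BusinessKey = s.BusinessKey
        WHEN MATCHED THEN
            UPDATE SET t.Payload = s.Payload
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (BusinessKey, Payload) VALUES (s.BusinessKey, s.Payload);

        -- 3. Optional separate cleanup instead of WHEN NOT MATCHED BY SOURCE THEN DELETE.
        DELETE t
        FROM dbo.StagingTarget AS t
        WHERE NOT EXISTS (SELECT 1 FROM dbo.SourceData AS s WHERE s.BusinessKey = t.BusinessKey);

        -- 4. Re-create the index once the load is finished.
        CREATE INDEX IX_StagingTarget_BusinessKey ON dbo.StagingTarget (BusinessKey);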

    J Pratt

  • Hi, I know this post is very old, but could you let me know how you managed to do a MERGE insert/update within seconds?

    My source table has only 5 million records and the target is empty.

    The MERGE insert/update is taking forever.

  • I've seen that happen (inserting multi-millions of rows into an empty table taking hours) when using a clustered index on the target and the records are inserted in an order that does not follow the clustered index. The solution was to drop the clustered index first, insert all the records, then apply the clustered index. The reason is that if you insert, for example, the millionth record somewhere in the middle of the clustered index, roughly half of the 999,999 physical rows already there have to be reshuffled on disk to make room for the new record. And it just keeps getting worse the more records you add.
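    A rough sketch of that load pattern (all object names are hypothetical):

        -- Hypothetical example of the load pattern described above.
        -- 1. Drop the clustered index (here a clustered PK constraint) before the bulk insert.
        ALTER TABLE dbo.BigTarget DROP CONSTRAINT PK_BigTarget;

        -- 2. Insert the rows; the table is now a heap, so insert order no longer matters.
        INSERT INTO dbo.BigTarget (ID, Payload)
        SELECT ID, Payload
        FROM dbo.BigSource;

        -- 3. Re-create the clustered index once, after the load.
        ALTER TABLE dbo.BigTarget
            ADD CONSTRAINT PK_BigTarget PRIMARY KEY CLUSTERED (ID);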

    J Pratt

  • Oh! I sort of created one because it was suggested for good MERGE performance in https://technet.microsoft.com/en-us/library/cc879317%28v=sql.105%29.aspx.

    i.e. to have a clustered index on the target.

    I could drop it and try, but I don't think I should need to create it again.

    I had created it for the MERGE insert/update in the first place.

  • A clustered index gives great, if not the best, performance IF you use it. If you do not use it in the match, or if you are inserting large numbers of records into the middle of a clustered index, it can reduce or totally kill performance.

    For merges, it is most important to match on an indexed column or columns (not necessarily clustered). When you are just inserting or updating, you only need to find the record that matches. For deletes when no match on source, you need to attempt a match and then identify the target record(s) with no matching source record in order to delete them. Being able to achieve an index seek in the query plan is optimal. An index scan is okay with small numbers of rows.

    If, for example, you have a single-column unique clustered index (e.g. ID/PK column) that is used to uniquely identify each record in the target AND the source, using that will definitely help performance since both the source AND target data will be sorted that way on disk and each row will easily be identified in both data sets. In most such cases, new records will also be at the end of the clustered index and therefore not cause the performance hit caused when inserting a row in the middle of a clustered index.

    For other scenarios, you need to assess the best way to achieve good performance. In the case where the target is empty, technically all you are doing is inserts. If the clustered index on the target does not match the clustered index and/or sort order of the source data, your performance is going to tank over time. For that specific situation, I recommend either A) dropping the target's clustered index and reapplying it after the merge is done, or B) sorting the source data to match the target's clustered index sort order (still a performance hit, and it goes from ugly to impractical as the row count gets really large).

    There's one more option for the situation where the target is empty and can be confirmed as empty: just do straight inserts, bulk copy, or another mass insert type of operation.
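    To illustrate that last option, here is a rough sketch that falls back to MERGE only when the target already has rows (all object and column names are hypothetical):

        -- Hypothetical sketch: skip MERGE entirely when the target is known to be empty.
        IF NOT EXISTS (SELECT 1 FROM dbo.TargetTable)
        BEGIN
            -- Ordering by the target's clustered key keeps the insert append-only.
            INSERT INTO dbo.TargetTable WITH (TABLOCK) (ID, Payload)
            SELECT ID, Payload
            FROM dbo.SourceTable
            ORDER BY ID;
        END
        ELSE
        BEGIN
            MERGE dbo.TargetTable AS t
            USING dbo.SourceTable AS s
                ON t.ID = s.ID   -- match on the indexed (clustered) key
            WHEN MATCHED THEN
                UPDATE SET t.Payload = s.Payload
            WHEN NOT MATCHED BY TARGET THEN
                INSERT (ID, Payload) VALUES (s.ID, s.Payload);
        END;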

    J Pratt

  • OK, so the follow-up question to this would be:

    How would MERGE perform inside an INSERT/UPDATE trigger?

    The inserted table would be the source (hence minimal rows), with the target table in the 40-50 million row region.

    I have read articles on how MERGE is the spawn of the devil, and others on how it is the saving grace that DB designers have needed for so long ... and I am a little confused, as the 'proof' either way appears to depend on different criteria. Is this just a case of: when conditions are right, use it; otherwise ignore it?

    ________________________________________________

    We passed upon the stair - and I was that man who sold the world

  • Performance inside a trigger will be much like performance outside a trigger, with the additional caveat that the virtual inserted and deleted tables are like table variables in that they have no statistics. So, generating an optimal execution plan is somewhat "hit or miss". You also can't assume that trigger performance will be fine as long as you only insert a few rows at a time. That, however, is not specific to MERGE statements within a trigger. It applies to triggers in general.

    There are times when you really need triggers, though, and there are occasions when you need the MERGE statement, too. If you need to do some combination of inserts, updates and deletes at the exact same time, then there is really no other way to do it. Take a look at this article by Alex Kuznetsov for an example: https://www.simple-talk.com/sql/sql-training/yet-another-sql-strategy-for-versioned-data/. The general principle that I try to follow is that, like the infamous CURSOR, triggers and the MERGE statement are specialty tools intended for certain specific situations. Use them when you really need them. If you don't really need to use a trigger or a MERGE in a particular situation, then don't use them. I don't know what problem you are trying to solve, but is there a reason you can't use a combination of INSERT, UPDATE and DELETE statements inside a declared transaction?
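    For illustration, here is a minimal sketch of that non-MERGE alternative (all object and column names are hypothetical):

        -- All object and column names are hypothetical.
        BEGIN TRANSACTION;

            -- Update rows that exist in both tables.
            UPDATE t
            SET    t.Payload = s.Payload
            FROM   dbo.TargetTable AS t
            JOIN   dbo.SourceTable AS s
                ON s.ID = t.ID;

            -- Insert rows that exist only in the source.
            INSERT INTO dbo.TargetTable (ID, Payload)
            SELECT s.ID, s.Payload
            FROM   dbo.SourceTable AS s
            WHERE  NOT EXISTS (SELECT 1 FROM dbo.TargetTable AS t WHERE t.ID = s.ID);

            -- Delete rows that no longer exist in the source.
            DELETE t
            FROM   dbo.TargetTable AS t
            WHERE  NOT EXISTS (SELECT 1 FROM dbo.SourceTable AS s WHERE s.ID = t.ID);

        COMMIT TRANSACTION;

    Each statement stays simple enough for the optimizer to handle well, and you can index for each one individually; the obvious trade-off is that the source is scanned more than once.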

    If you do need to use a MERGE (and triggers) I suggest the following guidelines:

    1. Keep the MERGE statement short and simple. Complex logic in a MERGE statement will baffle the query optimizer.

    2. As Tom LaRock says (http://thomaslarock.com/2015/06/101-things-i-wish-you-knew-about-sql-server/), Triggers are awful, awful things. Keep your trigger logic simple, short and sweet, too.

    3. As much as practical, limit the number of records your trigger processes. It doesn't guarantee that your trigger will perform well, but it might help.

    If you have a specific problem you'd like to pose, by all means, please post it as a separate question. We'll take a stab at it! πŸ™‚

    I hope this helps! πŸ™‚
