A Hazard of Using the SQL Merge Statement

  • vliet (4/3/2013)


    We tried both the CTE approach and the AND clause on the WHEN NOT MATCHED BY SOURCE part to merge rows spanning only a certain period. The performance of the CTE approach was much better, especially when the number of rows within that period was relatively small compared to the total number of rows in the target table. You will not notice the difference in these tiny example row sets but for a data warehouse the target table may easily contain several million rows and performance does matter.

    And I must admit we did forget that CTE the first time, deleting all rows in the target table outside the selected period. Luckily we had a separate development server so we could restore these rows from our production environment. Don't think you will never make this mistake, of coarse this behavior is by design but the article is not about bad behavior of the MERGE statement but about a caveat you should notice before using this statement. Thanks!

    I'd love to see some actual performance results, even though I didn't try to justify my approach on a performance basis. Just too busy with work at the moment to build me a test harness.

    I may come back to this after I get back from my visit to Papua New Guinea next week (if I don't forget or someone else hasn't beaten me to it).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Jim Sebastiano (4/3/2013)


    ChrisM@Work (4/3/2013)


    I think Dwain knows a thing or two about table constructors [/url]already 😛

    I guess old habits die hard!

    Jim - I do get very excited about new things but as Chris pointed out table value constructors I have seen.

    I got ridiculously excited when I learned about composable DML, until my playtime with it concluded that it was quite limiting.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Steven, tspade and all the others (Phil expecially for multiple visits :-)) that have chimed into this lively discussion, my thanks for both your time reading the article and contributing.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Steve Jones - SSC Editor (4/3/2013)


    Interesting, and thanks for the piece. I definitely learned something here.

    So you're saying you can teach an old dog new tricks? 😛

    Thanks Steve for editing that initial data set up to make it right (yes I noticed).

    But you could have warned me of the reaction my alarmist title would bring! Didn't mean for that to happen, not that I'd want to change it now given how lively it's made the discussion.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I guess I understand the concern some folks have with the title but I personally don't see a thing wrong with it. If you're not aware of what is actually a feature of the command, you can, as you said, really get burned by it.

    I've not done a deep dive on the article but I think it's a good one that explains one of those things that can go bump in the night.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/3/2013)


    I guess I understand the concern some folks have with the title but I personally don't see a thing wrong with it. If you're not aware of what is actually a feature of the command, you can, as you said, really get burned by it.

    I've not done a deep dive on the article but I think it's a good one that explains one of those things that can go bump in the night.

    Personally, I hate things that go bump in the night. My cats are waking me up all the time.

    Hopefully I didn't keep you up too late on this, but I'm thinking it was probably a pretty fast read.

    Glad you noticed it and thanks for your opinion.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Mike McIver

    "David looses 90% for the snipe at Microsoft ... long live Bill Gates! David needs to attend the Redmond Reeducation Camp.

    "

    Long Live Bill Gates? - Tee hee!

    I love Microsoft products (i have to as my whole career and business are based on MS) but somehow I cannot imagine that Bill will be remembered as fondly as Steve.

    Damn, now I have opened Pandora's box!

    (I also love controversy)

    David Bridge
    David Bridge Technology Limited
    www.davidbridgetechnology.com

  • An interesting article, but IMO it would be even better if you stated what you wanted: "...to replace the entire block of records for ID=2" in the introduction, or at least prior to running the MERGE statement. Instead, you simply say that the "plan is to merge our source table (#Test2) into our target table (#Test1)." Well, that is exactly what happens when the MERGE is run. Only afterwards do you say "Hold on, that wasn’t exactly what we wanted".

  • michaelhitchin (4/5/2013)


    An interesting article, but IMO it would be even better if you stated what you wanted: "...to replace the entire block of records for ID=2" in the introduction, or at least prior to running the MERGE statement. Instead, you simply say that the "plan is to merge our source table (#Test2) into our target table (#Test1)." Well, that is exactly what happens when the MERGE is run. Only afterwards do you say "Hold on, that wasn’t exactly what we wanted".

    Michael - I understand you're suggestion. The reason I wrote it the way I did is that my intention was to sound like a newbie that was sort of exploring both the MERGE statement and the requirements before settling on the final answer.

    Of course, we know that never happens so maybe you're right. 🙂


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Hi

    I'm a little late to the party. 🙂 and I'm a little confused - I have used merge in similar instances and it has behaved as was originally intended in the article. I ran the script that was kindly provided and the data was updated correctly without the need to delete or use a CTE? I am using SQL 2008R2. Could this be the reason?

    TIA

    Rx

  • Reeni-244464 (5/9/2013)


    Hi

    I'm a little late to the party. 🙂 and I'm a little confused - I have used merge in similar instances and it has behaved as was originally intended in the article. I ran the script that was kindly provided and the data was updated correctly without the need to delete or use a CTE? I am using SQL 2008R2. Could this be the reason?

    TIA

    Rx

    All scripts in the article were developed and tested on SQL 2008 R2, so the behavior demonstrated should hold there.

    Maybe there is something in your data that you are not considering? Part of the match (USING) criteria perhaps?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Comment deleted, and reposted as a new topic: Merge documentation misleading?

  • I do appreciate the potential performance benefits of filtering the target early. However if clarity is of a higher priority, another approach is to give up using just one MERGE statement, and doing the delete with

    DELETE t FROM #Test1 AS t

    WHERE EXISTS (SELECT * FROM #Test2 AS s WHERE s.ID = t.ID) AND

    NOT EXISTS (SELECT * FROM #Test2 s WHERE s.ID = t.ID AND s.RowNo = t.RowNo);

    so the whole transaction becomes:

    BEGIN TRANSACTION T1;

    MERGE #Test1 t -- Target

    USING #Test2 s -- Source

    ON t.ID = s.ID AND t.RowNo = s.RowNo

    WHEN MATCHED

    THEN

    UPDATE SET Value = s.Value

    WHEN NOT MATCHED BY TARGET

    THEN

    INSERT (ID, RowNo, Value)

    VALUES (s.ID, s.RowNo, s.Value);

    DELETE t FROM #Test1 AS t

    WHERE EXISTS (SELECT * FROM #Test2 AS s WHERE s.ID = t.ID) AND

    NOT EXISTS (SELECT * FROM #Test2 s WHERE s.ID = t.ID AND s.RowNo = t.RowNo);

    SELECT

    *

    FROM #Test1

    ORDER BY ID, RowNo;

    ROLLBACK TRANSACTION T1;

  • Nice article, thanks.

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]

Viewing 14 posts - 46 through 58 (of 58 total)

You must be logged in to reply to this topic. Login to reply