SQLServerCentral is supported by Red Gate Software Ltd.
 
Stairway to T-SQL DML Level 12: Using the MERGE Statement
Posted Wednesday, January 16, 2013 11:21 AM
SSC-Addicted

Great article - we've used MERGE for some time and it reduces the CRUD burden considerably.

However, it should not be used if you are using transactional replication. We've found (and others have reported the same if you Google it) that replication does not handle the MERGE statement correctly and will issue a DELETE/INSERT pair instead of an UPDATE to the subscribers. This can then collide with any foreign key constraints you have set up, breaking the replication. Unfortunately this has meant we've had to remove the use of MERGE from a lot of our routines.
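For readers hitting this, the usual workaround is an explicit two-statement upsert in place of MERGE, so replication sees a true UPDATE. A minimal sketch, with hypothetical table names (dbo.Customer as the replicated target, dbo.CustomerStage as the source):

```sql
-- Explicit UPDATE then INSERT instead of MERGE, so transactional
-- replication delivers an UPDATE rather than a DELETE/INSERT pair.
BEGIN TRANSACTION;

-- Update rows that already exist in the target.
UPDATE t
SET    t.Name  = s.Name,
       t.Email = s.Email
FROM   dbo.Customer AS t
JOIN   dbo.CustomerStage AS s
       ON s.CustomerID = t.CustomerID;

-- Insert rows that don't exist yet.
INSERT INTO dbo.Customer (CustomerID, Name, Email)
SELECT s.CustomerID, s.Name, s.Email
FROM   dbo.CustomerStage AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.Customer AS t
                   WHERE  t.CustomerID = s.CustomerID);

COMMIT;
```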
Post #1408012
Posted Wednesday, January 16, 2013 11:47 AM


SSChampion

paul.barbin (1/16/2013)
Curious to hear from others on whether they use the MERGE command in production and what its performance is like.

We began using it in our unit test scripts to build data for developers immediately following the database build (we're using SSDT 2012). We wanted to use MERGE so the scripts could run even if developers already had data in their database.

So, rather than having an insert script of 10,000 records, we use the MERGE command with the 10,000 rows supplied via a VALUES clause. This is EXTREMELY slow!
It takes 30 minutes to populate a small database (5GB) vs 2-3 minutes using TRUNCATE/INSERT.

Paul


I use Merge in production all over the place.

For example, I have a series of databases that power websites with thousands of hits per day. The data for them gets created/updated/deleted in a CRM and then has to be loaded into the databases for the sites. There are pretty complex business rules on the load processes, and the servers are thousands of miles apart from each other.

The prior DBA had it all set up as truncate-then-insert, but I modified it all to use Merge. It used to be that we'd have "blank" periods a couple of times per day, and employees got used to (a) fielding questions about that from customers, and (b) making changes and not having them show up until (sometimes) late the next day.

I modified it all to use Merge, and set it to run hourly. No "blank periods", and minimal waits for data to go live. Everyone loves it.

However, since Merge has to lock the tables that are being reloaded, and usually needs a full table-lock because of the scope of the action, it would have failed if I didn't combine it with Read Committed Snapshot Isolation. That allows queries from the sites to keep running and not be blocked by the large Merge actions.

Tested without that, and it was bad. (Was in a test environment, so no impact on anyone. But if Merge had gone live without that testing, it would have been worse than the prior "the tables are temporarily empty" issues.)
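The isolation change Gus describes is a database-level setting. A sketch, with a placeholder database name (switching it needs exclusive access, hence the ROLLBACK IMMEDIATE option to clear open sessions):

```sql
-- Enable Read Committed Snapshot Isolation so readers see the last
-- committed version of rows instead of blocking behind the MERGE's locks.
ALTER DATABASE MyWebDb
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
```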

So, it can be pretty useful, but test against expected concurrent load.

Also tried more traditional two-step update/insert actions, but Merge performed better than those in my case. I've seen it take longer, however, so, again, test vs more traditional methods.

Edit: Truncate+Insert will be faster. It doesn't require any join between the source and destination, and that alone makes it faster. If speed is the only concern, then truncate+insert will almost certainly win. However, that option also means you can't have FKs referencing the table, along with all the other restrictions that go with TRUNCATE. But if speed is really all that matters, use it.
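For comparison, the kill-and-fill pattern being weighed against Merge here is roughly this (table names are illustrative):

```sql
-- Kill-and-fill: minimally logged and fast, but fails outright if any
-- foreign key references dbo.Product, and the table is briefly empty.
TRUNCATE TABLE dbo.Product;

INSERT INTO dbo.Product (ProductID, Name, Price)
SELECT ProductID, Name, Price
FROM   staging.Product;
```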


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1408024
Posted Wednesday, January 16, 2013 12:58 PM
Forum Newbie

@Brian, that delete/insert behavior in transactional replication is a known bug (or at least it used to be). We had that problem many moons ago, and we weren't even using the MERGE command. MS said that if the key field in an index is updated, transactional replication treats it as a DELETE/INSERT pair. We had to have special stored procs on the subscriber to handle the error for us. When the delete proc was called, we checked whether the delete would fail due to FK constraints; if so, we determined that it was a "false" delete and ignored the delete command. The following command would be the insert, and we'd check whether the record already existed; if so, we'd just update it with the new values. That was way back in 2000 and 2005. No idea whether that's corrected in the newer versions.

@GSquared, thanks. I was just curious; I haven't used MERGE outside of this one case. In this particular case speed isn't crucial, but the time difference is SO great that it starts to become an issue. What we wanted was the ability to run the script regardless of the state of the db, and MERGE gives that ability where "kill and fill" doesn't. Good to know that in other applications it is as fast or faster.

Thanks.
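The rerunnable seed pattern Paul describes looks roughly like this (table and column names are illustrative, not from the article): a MERGE against an inline VALUES source, so the script works whether or not rows already exist.

```sql
-- Idempotent seed script: safe to run repeatedly against a database
-- that may already contain some or all of these rows.
MERGE dbo.Color AS t
USING (VALUES (1, 'Red'),
              (2, 'Green'),
              (3, 'Blue')) AS s (ColorID, Name)
   ON t.ColorID = s.ColorID
WHEN MATCHED AND t.Name <> s.Name THEN
    UPDATE SET Name = s.Name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ColorID, Name) VALUES (s.ColorID, s.Name);
```

With thousands of rows in the VALUES list, the join against the target is what Paul reports as slow compared to a straight truncate/insert.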
Post #1408048
Posted Wednesday, January 16, 2013 1:17 PM


SSChampion

There are some very solid, and pretty intuitive, ways to speed up Merge.

For my cross-server ETL, I set up a bulk-copy database that just copies the source tables from the other server. Very fast, because it's minimally logged and doesn't incur DTC locks. Then I Merge from that. But that kind of "stage first, load second" isn't specific to Merge. I'd have done that no matter what my final choice was for loading the target tables, since it applies equally to kill'n'fill, update/insert (2-step), or anything else.

Another way to really speed up Merge is to make sure the columns in the ON clause, on both source and target, are properly indexed, and (preferably) that the source has index-enforced uniqueness. For complex source queries, it can be faster to load a temp table, index that, and then use Merge with that as the source.
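A sketch of that staging approach, with illustrative object names:

```sql
-- Materialize a complex source query once, so MERGE doesn't re-run it.
SELECT CustomerID, Name, Email
INTO   #src
FROM   dbo.SomeComplexView;

-- Index-enforced uniqueness on the join key helps the MERGE plan.
CREATE UNIQUE CLUSTERED INDEX ux_src ON #src (CustomerID);

MERGE dbo.Customer AS t
USING #src AS s
   ON t.CustomerID = s.CustomerID
WHEN MATCHED THEN
    UPDATE SET Name = s.Name, Email = s.Email
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, Name, Email)
    VALUES (s.CustomerID, s.Name, s.Email);
```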


Post #1408062
Posted Thursday, January 17, 2013 1:58 AM
SSC-Addicted

@Ryan.Polk: Yes, you can do that, but it is not the same. First, as you just demonstrated, it is easy to forget that you need to add the same condition to both the "when not matched by source" and the "when matched" clauses. On top of that, if you compare the query plans and the IO statistics for both options, you will see that my approach with the CTE performs better than your alternative. In other words, the CTE approach is easier on the eye and it performs better.

STATISTICS IO for the CTE approach:
Table 'suppliers'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'supplierproducts'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#4C8B54C9'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


STATISTICS IO for the Ryan.Polk method:
Table 'suppliers'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'supplierproducts'. Scan count 2, logical reads 13, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#4C8B54C9'. Scan count 2, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
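For context, the CTE-as-target pattern being compared looks roughly like this. The table and column names are guesses at the article's schema, and @supplierid / #newproducts are hypothetical; the point is that filtering the target once in the CTE scopes both WHEN MATCHED and WHEN NOT MATCHED BY SOURCE, instead of repeating the predicate in each clause.

```sql
-- MERGE into an updatable CTE: only one supplier's rows are in scope,
-- so WHEN NOT MATCHED BY SOURCE cannot delete other suppliers' rows.
WITH target AS
(
    SELECT supplierid, productid, price
    FROM   dbo.supplierproducts
    WHERE  supplierid = @supplierid
)
MERGE target AS t
USING #newproducts AS s
   ON t.productid = s.productid
WHEN MATCHED THEN
    UPDATE SET price = s.price
WHEN NOT MATCHED BY TARGET THEN
    INSERT (supplierid, productid, price)
    VALUES (@supplierid, s.productid, s.price)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```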




Posting Data Etiquette - Jeff Moden
Posting Performance Based Questions - Gail Shaw
Hidden RBAR - Jeff Moden
Cross Tabs and Pivots - Jeff Moden
Catch-all queries - Gail Shaw


If you don't have time to do it right, when will you have time to do it over?
Post #1408220
Posted Friday, January 18, 2013 2:08 PM


Grasshopper

I also found that the performance of MERGE is a bit worse than plain INSERT/UPDATE, with indexes in place. It is not a huge difference, but it is still noticeable.

Also, it is hard to debug/troubleshoot MERGE statements. With INSERT/UPDATEs you can just select the SELECT part with its joins and WHERE clause, run it to see the results, and play with them until you are ready to do the actual inserts/updates.

With a MERGE statement, you would have to cut the code out of it and run it on the side, or use the OUTPUT clause, which you have to type or uncomment.

One pain, at least in 2008, is that you cannot use the results of the OUTPUT clause outside of the MERGE (as an inner query); you can only insert the results into a table variable, which you need to define and create first.
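The table-variable workaround described here looks roughly like this (object names are illustrative):

```sql
-- MERGE's OUTPUT lands in a pre-declared table variable, which can
-- then be queried normally, e.g. to count inserts vs updates.
DECLARE @changes TABLE
(
    change_action nvarchar(10),
    ProductID     int
);

MERGE dbo.Product AS t
USING staging.Product AS s
   ON t.ProductID = s.ProductID
WHEN MATCHED THEN
    UPDATE SET Name = s.Name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, Name) VALUES (s.ProductID, s.Name)
OUTPUT $action, inserted.ProductID INTO @changes (change_action, ProductID);

SELECT change_action, COUNT(*) AS rows_affected
FROM   @changes
GROUP  BY change_action;
```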

So after the initial excitement, our team is back to using good old UPSERTs.
Post #1409103
Posted Wednesday, August 28, 2013 9:58 PM


SSC-Dedicated

Greg,

Would you respond to the following post concerning your stairway? Thanks.
http://www.sqlservercentral.com/Forums/Topic1489336-90-1.aspx


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1489511
Posted Monday, July 28, 2014 9:43 AM
SSC Veteran

I would like to get started with these...

DML:
http://www.sqlservercentral.com/stairway/75773/

Advanced T-SQL:
http://www.sqlservercentral.com/stairway/104497/

The only thing I could not determine is which version of SQL Server Express I should install, and which database these are written for (i.e., AdventureWorks)?

Thanks!
Post #1596925