Introduction
I love the SQL MERGE statement, introduced into MS SQL Server 2008. It’s made the job of updating tables where many records need to be either added, updated or removed a plain joy. No longer do you need to either do a DELETE first, or check for EXISTS and UPDATE then check for NOT EXISTS and INSERT. As far as I’m concerned, it is the neatest thing to come along in the Data Manipulation Language (DML) world since sliced bread.
If you’ve never used one before, you simply have got to try it. However there is one little wrinkle to it that can be a major hazard if you are caught unaware. So today we’ll set up a problem, show you how to use a MERGE to solve it, and then demonstrate both the hazard and the very simple way that it can be solved.
Set Up the Test Conditions
Let’s populate a couple of tables so that we can demonstrate the MERGE.
CREATE TABLE #Test1 (ID INT, RowNo INT, Value MONEY); CREATE TABLE #Test2 (ID INT, RowNo INT, Value MONEY); INSERT INTO #Test1 -- Target SELECT 1, 1, 25 UNION ALL SELECT 1, 2, 32 UNION ALL SELECT 2, 1, 38 UNION ALL SELECT 2, 2, 61 UNION ALL SELECT 2, 4, 43 UNION ALL SELECT 3, 1, 15 UNION ALL SELECT 3, 2, 99 UNION ALL SELECT 3, 3, 54; INSERT INTO #Test2 -- Source SELECT 2, 1, 45 UNION ALL SELECT 2, 2, 88 UNION ALL SELECT 2, 3, 28;
Our plan is to merge our source table (#Test2) into our target table (#Test1). Note that for the rows where the ID=2, we have 3 values of RowNo in the target (1, 2, 4) and three values in the source (1, 2, 3). Here’s the MERGE:
BEGIN TRANSACTION T1; MERGE #Test1 t -- Target USING #Test2 s -- Source ON t.ID = s.ID AND t.RowNo = s.RowNo WHEN MATCHED THEN UPDATE SET Value = s.Value WHEN NOT MATCHED THEN -- Target INSERT (ID, RowNo, Value) VALUES (s.ID, s.RowNo, s.Value); SELECT * FROM #Test1 ORDER BY ID, RowNo; ROLLBACK TRANSACTION T1; SELECT * FROM #Test1 ORDER BY ID, RowNo;
Note that we’ve taken the precaution of wrapping our MERGE in a TRANSACTION and we do a ROLLBACK after the MERGE completes, so we can see the results of the MERGE displayed by the first SELECT. Those results are:
ID RowNo Value 1 1 25.00 1 2 32.00 2 1 45.00 2 2 88.00 2 3 28.00 2 4 43.00 3 1 15.00 3 2 99.00 3 3 54.00
The rows with IDs 1 and 3 are unchanged. For ID=2, we now have four rows where the Value column has been updated in RowNo 1 and 2, RowNo 3 was unchanged (because it didn’t exist in the source) and RowNo 4 was inserted.
Hold on, that wasn’t exactly what we wanted. We really wanted to replace the entire block of records for ID=2. Hastily, we consult MSDN about the MERGE statement.
Demonstrate the Hazard
A quick scan of the article identifies a clause of the MERGE where you can say WHEN NOT MATCHED BY SOURCE THEN DELETE. Obviously this must be the ticket, so let’s give that a try.
MERGE #Test1 t -- Target USING #Test2 s -- Source ON t.ID = s.ID AND t.RowNo = s.RowNo WHEN MATCHED THEN UPDATE SET Value = s.Value WHEN NOT MATCHED -- Target THEN INSERT (ID, RowNo, Value) VALUES (s.ID, s.RowNo, s.Value) WHEN NOT MATCHED BY SOURCE THEN DELETE; SELECT * FROM #Test1 ORDER BY ID, RowNo; ROLLBACK TRANSACTION T1; SELECT * FROM #Test1 ORDER BY ID, RowNo;
The results displayed by the first SELECT are now:
ID RowNo Value 2 1 45.00 2 2 88.00 2 3 28.00
Wait a minute. What happened to the rows for IDs 1 and 3? Holy, moldy guacamole! It’s a good thing we wrapped our test MERGE in a transaction before we ran that nasty bit of work in Production! Fortunately our second SELECT shows that our original data remains intact.
The problem of course is that when you say NOT MATCHED BY SOURCE, the source only contains records for ID=2, which as it turns out is what we’re left with. The result is to completely replace the target with the contents of the source.
A Simple Approach to Avoid the Hazard
Fortunately there’s a simple solution to make our MERGE operate the way we want it to, which is to only delete the unmatched records (based on ID) that are within our source. The answer is to do the MERGE to a Common Table Expression (CTE) or a VIEW that limits the row set in the target. Here’s how to do that:
BEGIN TRANSACTION T1; WITH TargetRows AS ( SELECT a.ID, RowNo, Value FROM #Test1 a INNER JOIN ( SELECT ID FROM #Test2 GROUP BY ID) b ON a.ID = b.ID ) MERGE TargetRows t -- Target USING #Test2 s -- Source ON t.ID = s.ID AND t.RowNo = s.RowNo WHEN MATCHED THEN UPDATE SET Value = s.Value WHEN NOT MATCHED -- Target THEN INSERT (ID, RowNo, Value) VALUES (s.ID, s.RowNo, s.Value) WHEN NOT MATCHED BY SOURCE THEN DELETE; SELECT * FROM #Test1 ORDER BY ID, RowNo; ROLLBACK TRANSACTION T1; SELECT * FROM #Test1 ORDER BY ID, RowNo;
Now we see our results displaying just what the doctor ordered:
ID RowNo Value 1 1 25.00 1 2 32.00 2 1 45.00 2 2 88.00 2 3 28.00 3 1 15.00 3 2 99.00 3 3 54.00
The rows for IDs 1 and 3 remain intact and the rows for ID=2 consist of only the RowNos that are in our source (i.e., RowNo=4 was deleted).
Whew! What a relief!
Conclusion
Beware the hazards of using NOT MATCHED BY SOURCE because untested queries could land you in some seriously hot water when it deletes all the rows in your target table unexpectedly!
When you need to use it, make sure you only apply the MERGE to target rows that are within the scope of your source, so the operation doesn’t exceed its mandate. Do that by either merging through a CTE or through a VIEW.
And don’t forget to create and rollback a transaction around the DML you’re testing! That's especially good advice anytime you’re deleting rows. Nothing can be more annoying and frustrating to have to restore, even in a test system.
In the end, enjoy the benefits of the new MERGE statement. No longer do you need to worry about other transactions sneaking in and polluting your target row set while you execute multiple DML statements.
I'd like to thank all of the interested readers for their time and attention and we sincerely hope that we've helped at least one of you to avoid an embarrassing incident when using MERGE.
Dwain Camps
SQL Enthusiast
Skype: dwaincamps