A Hazard of Using the SQL Merge Statement

Introduction

I love the SQL MERGE statement, introduced in MS SQL Server 2008. It has made the job of updating tables where many records need to be added, updated or removed a plain joy. No longer do you need to either do a DELETE first, or check for EXISTS and UPDATE, then check for NOT EXISTS and INSERT. As far as I’m concerned, it is the neatest thing to come along in the Data Manipulation Language (DML) world since sliced bread.

If you’ve never used one before, you simply have got to try it. However, there is one little wrinkle that can be a major hazard if you are caught unaware. So today we’ll set up a problem, show you how to use a MERGE to solve it, and then demonstrate both the hazard and the very simple way to avoid it.

Set Up the Test Conditions

Let’s populate a couple of tables so that we can demonstrate the MERGE.


CREATE TABLE #Test1
(ID INT, RowNo INT, Value MONEY);
CREATE TABLE #Test2
(ID INT, RowNo INT, Value MONEY);
INSERT INTO #Test1              -- Target
 SELECT 1, 1, 25 
 UNION ALL 
 SELECT 1, 2, 32 
 UNION ALL 
 SELECT 2, 1, 38 
 UNION ALL 
 SELECT 2, 2, 61 
 UNION ALL 
 SELECT 2, 4, 43 
 UNION ALL 
 SELECT 3, 1, 15 
 UNION ALL 
 SELECT 3, 2, 99 
 UNION ALL 
 SELECT 3, 3, 54;
INSERT INTO #Test2              -- Source           
 SELECT 2, 1, 45 
 UNION ALL 
 SELECT 2, 2, 88 
 UNION ALL 
 SELECT 2, 3, 28;

Our plan is to merge our source table (#Test2) into our target table (#Test1). Note that for the rows where ID=2, we have three values of RowNo in the target (1, 2, 4) and three values in the source (1, 2, 3). Here’s the MERGE:

BEGIN TRANSACTION T1;
MERGE #Test1 t  -- Target 
USING #Test2 s  -- Source 
ON t.ID = s.ID AND t.RowNo = s.RowNo 
  WHEN MATCHED 
    THEN     
      UPDATE SET Value = s.Value 
  WHEN NOT MATCHED 
    THEN       -- Target     
      INSERT (ID, RowNo, Value)     
        VALUES (s.ID, s.RowNo, s.Value);
SELECT 
  *
 FROM #Test1
 ORDER BY ID, RowNo;
ROLLBACK TRANSACTION T1;
SELECT *
 FROM #Test1 
 ORDER BY ID, RowNo;

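Note that we’ve taken the precaution of wrapping our MERGE in a TRANSACTION and doing a ROLLBACK after the MERGE completes. The first SELECT runs inside the transaction, so it displays the results of the MERGE; the second runs after the ROLLBACK and confirms that the original data has been restored. The results of the first SELECT are: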

ID    RowNo  Value
1     1      25.00
1     2      32.00
2     1      45.00
2     2      88.00
2     3      28.00
2     4      43.00
3     1      15.00
3     2      99.00
3     3      54.00

The rows with IDs 1 and 3 are unchanged. For ID=2, we now have four rows: the Value column has been updated in RowNo 1 and 2, RowNo 3 was inserted (because it didn’t exist in the target) and RowNo 4 was unchanged (because it didn’t exist in the source).
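
If you'd rather not work out by eye which rows were touched, one option is to add an OUTPUT clause with the $action column to the MERGE. The sketch below assumes the #Test1 and #Test2 tables populated above still exist in your session; the column aliases (MergeAction, OldValue, NewValue) are only illustrative.

```sql
-- Assumes #Test1 (target) and #Test2 (source) from the setup script above.
BEGIN TRANSACTION T1;

MERGE #Test1 t  -- Target
USING #Test2 s  -- Source
ON t.ID = s.ID AND t.RowNo = s.RowNo
  WHEN MATCHED
    THEN
      UPDATE SET Value = s.Value
  WHEN NOT MATCHED
    THEN
      INSERT (ID, RowNo, Value)
        VALUES (s.ID, s.RowNo, s.Value)
OUTPUT
  $action        AS MergeAction,  -- 'INSERT', 'UPDATE' or 'DELETE'
  inserted.ID    AS ID,
  inserted.RowNo AS RowNo,
  deleted.Value  AS OldValue,     -- NULL for rows that were inserted
  inserted.Value AS NewValue;

ROLLBACK TRANSACTION T1;          -- Leave the test data untouched
```

With the sample data this should report UPDATE for RowNo 1 and 2 of ID=2 and INSERT for RowNo 3. The order of the OUTPUT rows is not guaranteed.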

Hold on, that wasn’t exactly what we wanted. We really wanted to replace the entire block of records for ID=2. Hastily, we consult MSDN about the MERGE statement.

Demonstrate the Hazard

A quick scan of the article identifies a clause of the MERGE where you can say WHEN NOT MATCHED BY SOURCE THEN DELETE. Obviously this must be the ticket, so let’s give that a try.

BEGIN TRANSACTION T1;
MERGE #Test1 t       -- Target 
USING #Test2 s       -- Source 
ON t.ID = s.ID AND t.RowNo = s.RowNo
  WHEN MATCHED 
    THEN
      UPDATE SET Value = s.Value
  WHEN NOT MATCHED   -- Target
    THEN
       INSERT (ID, RowNo, Value)
         VALUES (s.ID, s.RowNo, s.Value)
  WHEN NOT MATCHED BY SOURCE 
    THEN
      DELETE;
SELECT 
  *
 FROM #Test1
 ORDER BY ID, RowNo;
ROLLBACK TRANSACTION T1;
SELECT 
  *
 FROM #Test1
 ORDER BY ID, RowNo;

The results displayed by the first SELECT are now:

ID    RowNo  Value
2     1      45.00
2     2      88.00
2     3      28.00

Wait a minute. What happened to the rows for IDs 1 and 3? Holy, moldy guacamole! It’s a good thing we wrapped our test MERGE in a transaction before we ran that nasty bit of work in Production! Fortunately our second SELECT shows that our original data remains intact.

The problem, of course, is that WHEN NOT MATCHED BY SOURCE applies to every row in the target that has no matching row in the source. Our source only contains records for ID=2, so every target row for IDs 1 and 3 is unmatched and gets deleted, along with RowNo 4 of ID=2. The result is to completely replace the target with the contents of the source.
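
Before running a MERGE that deletes, it can help to preview which target rows have no match in the source. The query below is a minimal sketch, assuming the #Test1 and #Test2 tables from the setup above; it uses the same columns as the MERGE's ON clause.

```sql
-- Assumes #Test1 (target) and #Test2 (source) from the setup script above.
-- Lists target rows with no match in the source on the MERGE's ON columns,
-- i.e. the rows an unrestricted WHEN NOT MATCHED BY SOURCE THEN DELETE removes.
SELECT t.ID, t.RowNo, t.Value
FROM #Test1 t
WHERE NOT EXISTS
  (
    SELECT 1
    FROM #Test2 s
    WHERE s.ID = t.ID AND s.RowNo = t.RowNo
  )
ORDER BY t.ID, t.RowNo;
```

With the sample data this returns six rows: every row for IDs 1 and 3, plus RowNo 4 of ID=2.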

A Simple Approach to Avoid the Hazard

Fortunately there’s a simple solution to make our MERGE operate the way we want it to, which is to delete only the unmatched target rows whose ID appears in our source. The answer is to do the MERGE to a Common Table Expression (CTE) or a VIEW that limits the row set in the target. Here’s how to do that:

BEGIN TRANSACTION T1;
WITH TargetRows AS
 (
     SELECT a.ID, RowNo, Value
     FROM #Test1 a
     INNER JOIN (
         SELECT ID
         FROM #Test2
         GROUP BY ID) b
         ON a.ID = b.ID
     )
 MERGE TargetRows t   -- Target
 USING #Test2 s       -- Source
 ON t.ID = s.ID AND t.RowNo = s.RowNo
   WHEN MATCHED 
     THEN
       UPDATE SET Value = s.Value
   WHEN NOT MATCHED   -- Target
     THEN
       INSERT (ID, RowNo, Value)
         VALUES (s.ID, s.RowNo, s.Value)
   WHEN NOT MATCHED BY SOURCE 
     THEN
       DELETE;
SELECT
  *
 FROM #Test1
 ORDER BY ID, RowNo;
ROLLBACK TRANSACTION T1;
SELECT
  *
 FROM #Test1
 ORDER BY ID, RowNo;

Now we see our results displaying just what the doctor ordered:

ID    RowNo  Value
1     1      25.00
1     2      32.00
2     1      45.00
2     2      88.00
2     3      28.00
3     1      15.00
3     2      99.00
3     3      54.00

The rows for IDs 1 and 3 remain intact and the rows for ID=2 consist of only the RowNos that are in our source (i.e., RowNo=4 was deleted).
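
The restriction on the target can also be written with a plain WHERE filter instead of a join. This is a variation on the CTE above rather than a different technique, and it should give the same result on the sample data. Treat it as a sketch, assuming the same #Test1 and #Test2 tables, and test it inside a transaction before relying on it.

```sql
-- Assumes #Test1 (target) and #Test2 (source) from the setup script above.
BEGIN TRANSACTION T1;

WITH TargetRows AS
  (
    SELECT ID, RowNo, Value
    FROM #Test1
    WHERE ID IN (SELECT ID FROM #Test2)  -- Only IDs present in the source
  )
MERGE TargetRows t   -- Target (restricted to the source's IDs)
USING #Test2 s       -- Source
ON t.ID = s.ID AND t.RowNo = s.RowNo
  WHEN MATCHED
    THEN
      UPDATE SET Value = s.Value
  WHEN NOT MATCHED BY TARGET
    THEN
      INSERT (ID, RowNo, Value)
        VALUES (s.ID, s.RowNo, s.Value)
  WHEN NOT MATCHED BY SOURCE
    THEN
      DELETE;        -- Can only reach rows visible through TargetRows

SELECT *
FROM #Test1
ORDER BY ID, RowNo;

ROLLBACK TRANSACTION T1;
```

Because the DELETE can only see rows exposed by TargetRows, the rows for IDs 1 and 3 are out of its reach.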

Whew! What a relief!

Conclusion

Beware the hazards of using NOT MATCHED BY SOURCE, because an untested query could land you in some seriously hot water when it unexpectedly deletes every row in your target table that has no match in the source!

When you need to use it, make sure you only apply the MERGE to target rows that are within the scope of your source, so the operation doesn’t exceed its mandate. Do that by merging through either a CTE or a VIEW.

And don’t forget to create and roll back a transaction around the DML you’re testing! That's especially good advice anytime you’re deleting rows. Nothing is more annoying and frustrating than having to restore lost data, even in a test system.

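In the end, enjoy the benefits of the new MERGE statement. Because it is a single statement, its inserts, updates and deletes either all succeed or all fail together, so you no longer need to string several DML statements together in a transaction of your own. Under heavy concurrency, though, MERGE can still run into race conditions unless you take extra precautions, such as a HOLDLOCK hint on the target.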

I'd like to thank all of the interested readers for their time and attention, and I sincerely hope that I've helped at least one of you to avoid an embarrassing incident when using MERGE.

Dwain Camps

SQL Enthusiast

Skype: dwaincamps
