Understanding MERGE to Insert, Update, and Delete Data Simultaneously

  • Comments posted to this topic are about the item Understanding MERGE to Insert, Update, and Delete Data Simultaneously

  • Nice detailed article, but a comment on the wider topic, coming from CSS with customers complaining of issues.

    Use MERGE with caution: there are a number of bugs open with MSFT on MERGE which will simply not be fixed, or are documented as by design.

    Additionally, leaving this here - the array of articles is worth reading if you're having issues with MERGE.

    https://sqlblog.org/merge
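
    As one concrete illustration of the kind of problem those articles cover (a sketch of my own, not taken from them): a plain MERGE upsert is not atomic under concurrency, so two sessions can both take the NOT MATCHED branch for the same key and collide on the insert. A commonly recommended mitigation is the HOLDLOCK (SERIALIZABLE) hint. The table here matches the one used later in this thread, purely for illustration.

    -- Without WITH (HOLDLOCK), two concurrent sessions can both fail to
    -- match ProductID 5 and then both attempt the INSERT, raising a
    -- primary key violation.
    MERGE INTO Products WITH (HOLDLOCK) AS TARGET
    USING (VALUES (5, 'Pizza', 60.00)) AS SOURCE (ProductID, ProductName, Rate)
    ON TARGET.ProductID = SOURCE.ProductID
    WHEN MATCHED THEN
        UPDATE SET TARGET.ProductName = SOURCE.ProductName, TARGET.Rate = SOURCE.Rate
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ProductID, ProductName, Rate)
        VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate);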

  • Hugo Kornelis provided a nice update on the MERGE problem cases in Sep 2023: "an-update-on-merge".

    Johan


  • Thanks for an interesting read and inspiration to do more research. Unfortunately, in the batch processing code there seem to be some bugs.

    Using the SOURCE definition as a subquery (SELECT TOP (@BatchSize) * FROM UpdatedProducts WHERE Processed = 0) reduces the number of rows read from the UpdatedProducts table, but it also causes every target row outside this batch to be treated as not matched by source and therefore removed from the target table. This should be solved using a construction like

    MERGE TOP (@BatchSize)
    INTO Products AS TARGET
    USING UpdatedProducts AS SOURCE
    ...

    Further, the SOURCE definition is only valid within the MERGE statement, so the filter data for the second UPDATE that does SET Processed = 1 has to come from an OUTPUT clause on the MERGE, writing the keys of the processed rows into a separate table.

    Obviously, the original UpdatedProducts table will need the extra column - that much can be left to the reader as an exercise.
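
    (For completeness, a minimal sketch of that exercise, assuming an INT flag defaulting to 0 as in the follow-up post below:)

    -- Add a processing flag to the source table; 0 = not yet processed
    ALTER TABLE UpdatedProducts
    ADD Processed INT NOT NULL DEFAULT 0;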

    Best regards,

    Oliver.

  • Update to my last: a batch process could work like this, using code from Noman and adding some bits:

    -- Create the target table (Products), which contains the existing product records
    CREATE TABLE Products
    (
        ProductID INT PRIMARY KEY,
        ProductName VARCHAR(100),
        Rate MONEY
    );

    -- Insert initial data into the Products table (target table)
    INSERT INTO Products (ProductID, ProductName, Rate)
    VALUES
        (1, 'Tea', 10.00),
        (2, 'Coffee', 20.00),
        (3, 'Muffin', 30.00),
        (4, 'Biscuit', 40.00);

    -- Create the source table (UpdatedProducts) with updated data, including new products and modified prices
    CREATE TABLE UpdatedProducts
    (
        ProductID INT PRIMARY KEY,
        ProductName VARCHAR(100),
        Rate MONEY,
        Processed INT DEFAULT 0
    );

    -- Insert updated data into the UpdatedProducts table (source table)
    INSERT INTO UpdatedProducts (ProductID, ProductName, Rate)
    VALUES
        (1, 'Tea', 10.00),    -- No change, Tea stays the same
        (2, 'Coffee', 25.00), -- Coffee has a price change
        (3, 'Muffin', 35.00), -- Muffin has a price change
        (5, 'Pizza', 60.00);  -- New product to be added
    GO


    DECLARE @BatchSize INT = 1;
    DECLARE @RowCount INT = (SELECT COUNT(1) FROM UpdatedProducts);
    DECLARE @Act TABLE (InsertedID INT, DeletedID INT, Action NVARCHAR(10));

    -- Process removed products first, in a separate statement
    DELETE FROM Products
    WHERE ProductID NOT IN (SELECT ProductID FROM UpdatedProducts);

    -- Loop through in batches
    WHILE (@RowCount > 0)
    BEGIN
        -- Update or insert records in smaller batches
        DELETE FROM @Act; -- reset table

        MERGE TOP (@BatchSize)
        INTO Products AS TARGET
        USING (SELECT ProductID, ProductName, Rate FROM UpdatedProducts WHERE Processed = 0) AS SOURCE
        ON TARGET.ProductID = SOURCE.ProductID
        WHEN MATCHED THEN -- name or rate change
            UPDATE SET TARGET.ProductName = SOURCE.ProductName, TARGET.Rate = SOURCE.Rate
        WHEN NOT MATCHED BY TARGET THEN -- new product
            INSERT (ProductID, ProductName, Rate)
            VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate)
        -- WHEN NOT MATCHED BY SOURCE THEN -- product cancelled: already covered above to enable batch mode
        --     DELETE
        OUTPUT INSERTED.ProductID, DELETED.ProductID, $ACTION
        INTO @Act (InsertedID, DeletedID, Action);

        -- Update the processed flag to prevent reprocessing
        UPDATE UpdatedProducts
        SET Processed = 1
        WHERE ProductID IN (SELECT COALESCE(InsertedID, DeletedID) FROM @Act);

        -- Check if there are more rows to process
        SET @RowCount = (SELECT COUNT(1) FROM UpdatedProducts WHERE Processed = 0);
    END

    This can be optimised: we do not need the DeletedID column in the @Act table, and instead of updating the source table we could filter on a join to the @Act table if we do not clear it on each pass (see the sketch below). The main point is that I do the DELETE of outdated rows first, in a separate statement. There may be other solutions, but this immediately shrinks the JOIN between the SOURCE and TARGET tables and so reduces the overhead of each batch run.
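
    A minimal sketch of that variant (my assumptions: the up-front DELETE has already run, so only INSERT and UPDATE actions ever land in @Act and InsertedID is always populated; @Act is never cleared, so it doubles as the list of processed rows and the Processed column is no longer needed):

    DECLARE @BatchSize INT = 1;
    DECLARE @RowCount INT = (SELECT COUNT(1) FROM UpdatedProducts);
    DECLARE @Act TABLE (InsertedID INT PRIMARY KEY, Action NVARCHAR(10));

    WHILE (@RowCount > 0)
    BEGIN
        -- Source rows already logged in @Act are excluded, replacing the
        -- Processed-flag filter of the original version
        MERGE TOP (@BatchSize)
        INTO Products AS TARGET
        USING (SELECT up.ProductID, up.ProductName, up.Rate
               FROM UpdatedProducts AS up
               WHERE NOT EXISTS (SELECT 1 FROM @Act AS a
                                 WHERE a.InsertedID = up.ProductID)) AS SOURCE
        ON TARGET.ProductID = SOURCE.ProductID
        WHEN MATCHED THEN
            UPDATE SET TARGET.ProductName = SOURCE.ProductName, TARGET.Rate = SOURCE.Rate
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (ProductID, ProductName, Rate)
            VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate)
        OUTPUT INSERTED.ProductID, $ACTION INTO @Act (InsertedID, Action);

        -- Remaining rows are those not yet recorded in @Act
        SET @RowCount = (SELECT COUNT(1) FROM UpdatedProducts AS up
                         WHERE NOT EXISTS (SELECT 1 FROM @Act AS a
                                           WHERE a.InsertedID = up.ProductID));
    END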

    Regards,

    Oliver.

  • Many years ago I read an article online about "writing t sql merge statements the right way" (the link appears to be dead). Following the pattern it gave, I have been using MERGE for many years with zero problems. It has worked for me in light to moderate ETL-type work. Of course, your mileage may vary, and I would not use it in every situation.
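
    I cannot speak for that dead article's exact pattern, but one defensive element that is widely used in such patterns (this sketch is my own) is a NULL-safe change-detection predicate on the WHEN MATCHED branch, so rows whose values have not actually changed are not rewritten:

    -- Only update when something actually differs; INTERSECT compares
    -- the two column lists with NULL-safe semantics.
    MERGE INTO Products AS TARGET
    USING UpdatedProducts AS SOURCE
    ON TARGET.ProductID = SOURCE.ProductID
    WHEN MATCHED AND NOT EXISTS (
        SELECT SOURCE.ProductName, SOURCE.Rate
        INTERSECT
        SELECT TARGET.ProductName, TARGET.Rate
    ) THEN
        UPDATE SET TARGET.ProductName = SOURCE.ProductName, TARGET.Rate = SOURCE.Rate
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ProductID, ProductName, Rate)
        VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate);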
