May 5, 2025 at 12:00 am
Comments posted to this topic are about the item Understanding MERGE to Insert, Update, and Delete Data Simultaneously
May 5, 2025 at 7:41 am
Nice detailed article, but this is a wider topic, and coming from CSS I have had customers complaining of issues.
Use MERGE with caution: there are a number of bugs open with Microsoft on MERGE which will simply not be fixed, or which are documented as by design.
Additionally, leaving this here: the array of articles is worth reading if you're having issues with MERGE.
May 5, 2025 at 7:55 am
Hugo Kornelis provided a nice update on the MERGE problem cases in September 2023: "an-update-on-merge".
Johan
May 5, 2025 at 10:40 am
Thanks for an interesting read and the inspiration to do more research. Unfortunately, there seem to be some bugs in the batch processing code.
Using the SOURCE definition as a subquery (SELECT TOP (@BatchSize) * FROM UpdatedProducts WHERE Processed = 0) does reduce the number of rows read from the UpdatedProducts table, but it also leads to all rows outside this batch being removed from the target table by the WHEN NOT MATCHED BY SOURCE branch. This should be solved using a construction like:
MERGE TOP (@BatchSize)
INTO Products AS TARGET
USING UpdatedProducts AS SOURCE
...
Further, the SOURCE definition is only valid within the MERGE statement, so the filter data for the second UPDATE that sets Processed = 1 needs an OUTPUT clause writing the processed rows from the MERGE into a table.
Obviously, the original UpdatedProducts table will need the extra Processed column - that much can be left to the reader as an exercise.
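To illustrate the failure mode: with the TOP filter inside the USING subquery, a statement shaped roughly like the following (my reconstruction, not necessarily the article's exact code) deletes every Products row that falls outside the current batch:
MERGE INTO Products AS TARGET
USING (SELECT TOP (@BatchSize) * FROM UpdatedProducts WHERE Processed = 0) AS SOURCE
ON TARGET.ProductID = SOURCE.ProductID
WHEN MATCHED THEN
UPDATE SET TARGET.ProductName = SOURCE.ProductName, TARGET.Rate = SOURCE.Rate
WHEN NOT MATCHED BY TARGET THEN
INSERT (ProductID, ProductName, Rate)
VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate)
WHEN NOT MATCHED BY SOURCE THEN
-- fires for every target row not in this batch, not just cancelled products
DELETE;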
Best regards,
Oliver.
May 5, 2025 at 1:17 pm
Update to my last: a batch process could work like this, using the code from Noman and adding some bits:
-- Create the target table (Products), which contains the existing product records
CREATE TABLE Products
(
ProductID INT PRIMARY KEY,
ProductName VARCHAR(100),
Rate MONEY
);
-- Insert initial data into the Products table (target table)
INSERT INTO Products (ProductID, ProductName, Rate)
VALUES
(1, 'Tea', 10.00),
(2, 'Coffee', 20.00),
(3, 'Muffin', 30.00),
(4, 'Biscuit', 40.00);
-- Create the source table (UpdatedProducts) with updated data, including new products and modified prices
CREATE TABLE UpdatedProducts
(
ProductID INT PRIMARY KEY,
ProductName VARCHAR(100),
Rate MONEY,
Processed INT DEFAULT 0
);
-- Insert updated data into the UpdatedProducts table (source table)
INSERT INTO UpdatedProducts (ProductID, ProductName, Rate)
VALUES
(1, 'Tea', 10.00), -- No change, Tea stays the same
(2, 'Coffee', 25.00), -- Coffee has a price change
(3, 'Muffin', 35.00), -- Muffin has a price change
(5, 'Pizza', 60.00); -- New product to be added
GO
DECLARE @BatchSize INT = 1;
DECLARE @RowCount INT = (SELECT COUNT(1) FROM UpdatedProducts);
DECLARE @Act TABLE (InsertedID INT, DeletedID INT, Action NVARCHAR(10));
-- process removed products first
DELETE FROM Products
WHERE ProductID NOT IN (SELECT ProductID FROM UpdatedProducts);
-- Loop through in batches
WHILE (@RowCount > 0)
BEGIN
-- Update or insert records in smaller batches
DELETE FROM @Act; -- reset table
MERGE TOP (@BatchSize)
INTO Products AS TARGET
USING (SELECT ProductID, ProductName, Rate FROM UpdatedProducts WHERE Processed = 0) AS SOURCE
ON TARGET.ProductID = SOURCE.ProductID
WHEN MATCHED THEN -- name or rate change
UPDATE SET TARGET.ProductName = SOURCE.ProductName, TARGET.Rate = SOURCE.Rate
WHEN NOT MATCHED BY TARGET THEN -- new product
INSERT (ProductID, ProductName, Rate)
VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate)
-- WHEN NOT MATCHED BY SOURCE THEN -- product cancelled: already covered above to enable batch mode
-- DELETE
OUTPUT INSERTED.ProductID, DELETED.ProductID, $ACTION
INTO @Act(InsertedID,DeletedID,Action);
-- Update the processed flag to prevent reprocessing
UPDATE UpdatedProducts
SET Processed = 1
WHERE ProductID IN (SELECT COALESCE(InsertedID,DeletedID) FROM @Act);
-- Check if there are more rows to process
SET @RowCount = (SELECT COUNT(1) FROM UpdatedProducts WHERE Processed = 0);
END
This can be optimised: we do not need the DeletedID in the @Act table, and instead of updating the source table we could join on the @Act table, provided we do not clear it on each pass. The main point is that I do the DELETE of outdated rows first, in a separate statement - there may be other solutions, but this immediately reduces the JOIN between the SOURCE and TARGET tables and so reduces the overhead of each batch run.
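A minimal sketch of that optimised version, assuming the same tables as above (DeletedID and the Processed flag drop out, because @Act now accumulates the processed keys across passes):
DECLARE @BatchSize INT = 1;
DECLARE @Act TABLE (ProductID INT PRIMARY KEY);
DECLARE @RowCount INT = (SELECT COUNT(1) FROM UpdatedProducts);
-- process removed products first, in a separate statement as before
DELETE FROM Products
WHERE ProductID NOT IN (SELECT ProductID FROM UpdatedProducts);
WHILE (@RowCount > 0)
BEGIN
MERGE TOP (@BatchSize)
INTO Products AS TARGET
USING (SELECT u.ProductID, u.ProductName, u.Rate
FROM UpdatedProducts AS u
WHERE NOT EXISTS (SELECT 1 FROM @Act AS a WHERE a.ProductID = u.ProductID)) AS SOURCE
ON TARGET.ProductID = SOURCE.ProductID
WHEN MATCHED THEN
UPDATE SET TARGET.ProductName = SOURCE.ProductName, TARGET.Rate = SOURCE.Rate
WHEN NOT MATCHED BY TARGET THEN
INSERT (ProductID, ProductName, Rate)
VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate)
-- both the UPDATE and the INSERT populate INSERTED.ProductID, so one column suffices
OUTPUT INSERTED.ProductID INTO @Act(ProductID);
SET @RowCount = (SELECT COUNT(1) FROM UpdatedProducts AS u
WHERE NOT EXISTS (SELECT 1 FROM @Act AS a WHERE a.ProductID = u.ProductID));
END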
Regards,
Oliver.
May 6, 2025 at 3:09 pm
Many years ago I read an article online about "writing t sql merge statements the right way" (the link appears to be dead). Using the pattern it gave, I have been using MERGE for many years with zero problems. It has worked for me in light to moderate ETL-type work. Of course, your mileage may vary, and I would not use it in every situation.
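For what it's worth, the defensive shape usually recommended for MERGE looks something like the sketch below (my recollection of the common advice, not necessarily that article's exact pattern), using this thread's example tables: a HOLDLOCK hint so the match check and the modification happen under one serializable range lock, and an extra predicate so unchanged rows are not rewritten.
MERGE Products WITH (HOLDLOCK) AS TARGET -- close the race between the match check and the modification
USING UpdatedProducts AS SOURCE
ON TARGET.ProductID = SOURCE.ProductID
WHEN MATCHED AND (TARGET.ProductName <> SOURCE.ProductName
OR TARGET.Rate <> SOURCE.Rate) THEN -- skip no-op updates; add NULL handling if the columns are nullable
UPDATE SET TARGET.ProductName = SOURCE.ProductName, TARGET.Rate = SOURCE.Rate
WHEN NOT MATCHED BY TARGET THEN
INSERT (ProductID, ProductName, Rate)
VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;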