Understanding MERGE to Insert, Update, and Delete Data Simultaneously

  • Comments posted to this topic are about the item Understanding MERGE to Insert, Update, and Delete Data Simultaneously

  • Nice detailed article, but a comment on the wider topic, coming from CSS with customers complaining of issues.

    Use MERGE with caution: there are a number of bugs open with MSFT on MERGE which will simply not be fixed, or are documented as by design.

    Additionally, leaving this here - the array of articles is worth reading if you're having issues with MERGE.

    https://sqlblog.org/merge
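
    As one concrete illustration of the kind of problem those articles cover (a sketch of my own, not taken from them): a plain MERGE upsert is not atomic under concurrency, so two sessions can both take the NOT MATCHED branch for the same key and collide on the insert. A commonly recommended mitigation is the HOLDLOCK (SERIALIZABLE) hint. The table here matches the one used later in this thread, purely for illustration.

    -- Without WITH (HOLDLOCK), two concurrent sessions can both fail to
    -- match ProductID 5 and then both attempt the INSERT, raising a
    -- primary key violation.
    MERGE INTO Products WITH (HOLDLOCK) AS TARGET
    USING (VALUES (5, 'Pizza', 60.00)) AS SOURCE (ProductID, ProductName, Rate)
    ON TARGET.ProductID = SOURCE.ProductID
    WHEN MATCHED THEN
        UPDATE SET TARGET.ProductName = SOURCE.ProductName, TARGET.Rate = SOURCE.Rate
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ProductID, ProductName, Rate)
        VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate);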

  • Hugo Kornelis provided a nice update on the MERGE problem cases in Sep 2023: "an-update-on-merge".

    Johan


  • Thanks for an interesting read and inspiration to do more research. Unfortunately, in the batch processing code there seem to be some bugs.

    Using the SOURCE definition as a subquery (SELECT TOP (@BatchSize) * FROM UpdatedProducts WHERE Processed = 0) reduces the number of rows read from the UpdatedProducts table, but it also causes every target row outside this batch to be treated as not matched by source and therefore removed from the target table. This should be solved using a construction like

    MERGE TOP (@BatchSize)
    INTO Products AS TARGET
    USING UpdatedProducts AS SOURCE
    ...

    Further, the SOURCE definition is only valid within the MERGE statement, so the filter data for the second UPDATE that does SET Processed = 1 has to come from an OUTPUT clause on the MERGE, writing the keys of the processed rows into a separate table.

    Obviously, the original UpdatedProducts table will need the extra column - that much can be left to the reader as an exercise.
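
    (For completeness, a minimal sketch of that exercise, assuming an INT flag defaulting to 0 as in the follow-up post below:)

    -- Add a processing flag to the source table; 0 = not yet processed
    ALTER TABLE UpdatedProducts
    ADD Processed INT NOT NULL DEFAULT 0;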

    Best regards,

    Oliver.

  • Update to my last: a batch process could work like this, using code from Noman and adding some bits:

    -- Create the target table (Products), which contains the existing product records
    CREATE TABLE Products
    (
        ProductID INT PRIMARY KEY,
        ProductName VARCHAR(100),
        Rate MONEY
    );

    -- Insert initial data into the Products table (target table)
    INSERT INTO Products (ProductID, ProductName, Rate)
    VALUES
        (1, 'Tea', 10.00),
        (2, 'Coffee', 20.00),
        (3, 'Muffin', 30.00),
        (4, 'Biscuit', 40.00);

    -- Create the source table (UpdatedProducts) with updated data, including new products and modified prices
    CREATE TABLE UpdatedProducts
    (
        ProductID INT PRIMARY KEY,
        ProductName VARCHAR(100),
        Rate MONEY,
        Processed INT DEFAULT 0
    );

    -- Insert updated data into the UpdatedProducts table (source table)
    INSERT INTO UpdatedProducts (ProductID, ProductName, Rate)
    VALUES
        (1, 'Tea', 10.00),    -- No change, Tea stays the same
        (2, 'Coffee', 25.00), -- Coffee has a price change
        (3, 'Muffin', 35.00), -- Muffin has a price change
        (5, 'Pizza', 60.00);  -- New product to be added
    GO


    DECLARE @BatchSize INT = 1;
    DECLARE @RowCount INT = (SELECT COUNT(1) FROM UpdatedProducts);
    DECLARE @Act TABLE (InsertedID INT, DeletedID INT, Action NVARCHAR(10));

    -- Process removed products first, in a separate statement
    DELETE FROM Products
    WHERE ProductID NOT IN (SELECT ProductID FROM UpdatedProducts);

    -- Loop through in batches
    WHILE (@RowCount > 0)
    BEGIN
        -- Update or insert records in smaller batches
        DELETE FROM @Act; -- reset table

        MERGE TOP (@BatchSize)
        INTO Products AS TARGET
        USING (SELECT ProductID, ProductName, Rate FROM UpdatedProducts WHERE Processed = 0) AS SOURCE
        ON TARGET.ProductID = SOURCE.ProductID
        WHEN MATCHED THEN -- name or rate change
            UPDATE SET TARGET.ProductName = SOURCE.ProductName, TARGET.Rate = SOURCE.Rate
        WHEN NOT MATCHED BY TARGET THEN -- new product
            INSERT (ProductID, ProductName, Rate)
            VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate)
        -- WHEN NOT MATCHED BY SOURCE THEN -- product cancelled: already covered above to enable batch mode
        --     DELETE
        OUTPUT INSERTED.ProductID, DELETED.ProductID, $ACTION
        INTO @Act (InsertedID, DeletedID, Action);

        -- Update the processed flag to prevent reprocessing
        UPDATE UpdatedProducts
        SET Processed = 1
        WHERE ProductID IN (SELECT COALESCE(InsertedID, DeletedID) FROM @Act);

        -- Check if there are more rows to process
        SET @RowCount = (SELECT COUNT(1) FROM UpdatedProducts WHERE Processed = 0);
    END

    This can be optimised: we do not need the DeletedID column in the @Act table, and instead of updating the source table we could filter on a join to the @Act table if we do not clear it on each pass (see the sketch below). The main point is that I do the DELETE of outdated rows first, in a separate statement. There may be other solutions, but this immediately shrinks the JOIN between the SOURCE and TARGET tables and so reduces the overhead of each batch run.
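
    A minimal sketch of that variant (my assumptions: the up-front DELETE has already run, so only INSERT and UPDATE actions ever land in @Act and InsertedID is always populated; @Act is never cleared, so it doubles as the list of processed rows and the Processed column is no longer needed):

    DECLARE @BatchSize INT = 1;
    DECLARE @RowCount INT = (SELECT COUNT(1) FROM UpdatedProducts);
    DECLARE @Act TABLE (InsertedID INT PRIMARY KEY, Action NVARCHAR(10));

    WHILE (@RowCount > 0)
    BEGIN
        -- Source rows already logged in @Act are excluded, replacing the
        -- Processed-flag filter of the original version
        MERGE TOP (@BatchSize)
        INTO Products AS TARGET
        USING (SELECT up.ProductID, up.ProductName, up.Rate
               FROM UpdatedProducts AS up
               WHERE NOT EXISTS (SELECT 1 FROM @Act AS a
                                 WHERE a.InsertedID = up.ProductID)) AS SOURCE
        ON TARGET.ProductID = SOURCE.ProductID
        WHEN MATCHED THEN
            UPDATE SET TARGET.ProductName = SOURCE.ProductName, TARGET.Rate = SOURCE.Rate
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (ProductID, ProductName, Rate)
            VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate)
        OUTPUT INSERTED.ProductID, $ACTION INTO @Act (InsertedID, Action);

        -- Remaining rows are those not yet recorded in @Act
        SET @RowCount = (SELECT COUNT(1) FROM UpdatedProducts AS up
                         WHERE NOT EXISTS (SELECT 1 FROM @Act AS a
                                           WHERE a.InsertedID = up.ProductID));
    END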

    Regards,

    Oliver.

  • Many years ago I read an article online about "writing t sql merge statements the right way" (the link appears to be dead). Following the pattern it gave, I have been using MERGE for many years with zero problems. It has worked for me in light to moderate ETL-type work. Of course, your mileage may vary, and I would not use it in every situation.
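
    I cannot speak for that dead article's exact pattern, but one defensive element that is widely used in such patterns (this sketch is my own) is a NULL-safe change-detection predicate on the WHEN MATCHED branch, so rows whose values have not actually changed are not rewritten:

    -- Only update when something actually differs; INTERSECT compares
    -- the two column lists with NULL-safe semantics.
    MERGE INTO Products AS TARGET
    USING UpdatedProducts AS SOURCE
    ON TARGET.ProductID = SOURCE.ProductID
    WHEN MATCHED AND NOT EXISTS (
        SELECT SOURCE.ProductName, SOURCE.Rate
        INTERSECT
        SELECT TARGET.ProductName, TARGET.Rate
    ) THEN
        UPDATE SET TARGET.ProductName = SOURCE.ProductName, TARGET.Rate = SOURCE.Rate
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ProductID, ProductName, Rate)
        VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate);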
