How To Avoid MERGE in SQL Server

  • Comments posted to this topic are about the item How To Avoid MERGE in SQL Server

  • Hi. What are the advantages of this approach as opposed to using a MERGE statement?  We use MERGE statements extensively as part of our data warehouse's ETL processes and I've always found them to be fast and easy to write/understand.

  • I like this, especially after having just got to grips with a MERGE and realised a few limitations myself (for example if you're logging row counts, there's no easy way to determine how many were in the INSERT branch and how many in the UPDATE branch as @@ROWCOUNT returns the total number of rows).

    I can definitely see a use-case for this script and it's essentially the "fix" to the MERGE questions I had. We backload many rows of hospital data each day and our current solution is either to perform a full TRUNCATE of PROD or to delete a specific chunk based on dates and insert the updated rows.

    But I wonder what that transaction log overhead will be on these multiple UPDATE statements on larger tables with myriad fields.

  • Try $action with OUTPUT.

  • You mention a few metrics on performance for this method.  Do you also have metrics to compare against the merge statement? I'd be interested to see the IO statistics on both of these or any other metrics to show the advantages.

  • Performance wasn't the motivator for this script because both approaches have performed well enough for me. But if you just want to know the status of each merged record (inserted, deleted, updated)  then MERGE isn't needed. But unlike my script which knows nothing about which columns have changed, MERGE is useful if you want to do something when a particular column has changed.

  • One benefit to understanding methods for merging without using the T-SQL MERGE command is that you can more easily port concepts to other platforms. For example, Snowflake supports _a_ MERGE INTO command, but it is structured and behaves slightly different to the T-SQL MERGE.

    Side note: looks like your cat walked on your keyboard before you cut'n'pasted the script - see line 38 🙂

  • I wrote this script because MERGE seemed overkill for something quite simple. Also, as previous comments have indicated, this script has some advantages but it can't replace the more flexible approach that MERGE delivers.

  • It looks like a random string "exddasf" got inserted into the script when I pasted it into the article:

    -- Display Source and Target tables.

    SELECT * FROM [dbo].[Table1] -- Source

    SELECT * FROM exddasf[dbo].[Table2] -- Target

    Just remove it.

    Thanks for pointing that out.

     

     

  • Another advantage (hinted at in the introduction) is the potentially maintenance-free nature of the algorithm for when the column profile of the source/target tables changes, assuming they remain the same.

    A downside is that it does not accommodate target tables with Foreign Keys - a row that's referenced by another record cannot be updated simply by deleting the old version and inserting the new one.

    Forum Etiquette: How to post data/code on a forum to get the best helpMake sure that you include code in the appropriate IFCode tags, e.g. <your code here>. You can find the IFCode tags under the INSERT options when you are writing a post.

  • You've raised a good point about Foreign Keys.

    The script assumes no constraints on the target table, other than a key.

    Basically, I extracted all the records into a no-constraint table and then used an empty version of that table (with all constraints present) to check data integrity. This was done by auto-building all the error-checking queries using the table's own specifications. For example, date columns would be auto-tested for legitimacy, varchars checked for proper lengths, NULL checks, etc.

  • I'm not a huge fan of this as a replacement for MERGE as any existing reports/stored procedures/views would need to be changed to cater and filter on the new ETL fields as well as any SQL users would need to ensure they look out for this sort of thing and query accordingly as figures could easily show inflated numbers/records if not done properly.

    Just a point for one of the earlier comments I use the following code to help store MERGE record counts -

    DECLARE @ChangedTable Table (

    Action Varchar(20),

    ID BIGINT

    )

    MERGE...

    OUTPUT

    $action, SOURCE.ID

    INTO @ChangedTable;

    INSERT INTO Log

    SELECT

    'StoredProcedure'

    , GETDATE()

    , Action

    , COUNT(*)

    , 'SourceTable'

    , 'TargetTable'

    , 'Merge Procedure for Activity'

    FROM @ChangedTable

    GROUP BY Action

  • Thanks for the snippet to store MERGE counts.

    The script is only meant to replace MERGE where you don't have special processing on any column (such as logging a row if some particular column has changed).

    In other words, if you only want the status of each row and don't care which column has changed, then MERGE isn't needed.

     

     

     

     

  • This doesn't float my boat because I'm a fan of the MERGE statement and use it extensively in our data warehouse loading. So extensively that I have written a script to generate the merge statement from the structure of the destination table by using information schema views which happily generates a merge script for large numbers of columns which can then be manually tweaked to fine tune. The work is then generating the source view which does the real T of the ETL process and has column names matching the final table it is merged into.

    And if you log the output of a Merge - action, inserted key and deleted key it is easy to see which are inserts and which are updates.

  • Sorry, but this script won't work well for large tables, especially with heavy use of IN() and NOT IN() operators.

    MERGE is usually faster than series of consecutive INSERT/UPDATE/DELETE statements.

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply