How To Avoid MERGE in SQL Server

  • andrei_solntsev wrote:

    Sorry, but this script won't work well for large tables, especially with heavy use of the IN() and NOT IN() operators. MERGE is usually faster than a series of consecutive INSERT/UPDATE/DELETE statements.

    I've found the opposite: particularly for MERGE statements that just do an INSERT/UPDATE, putting the logic in separate UPDATE and INSERT statements is much faster than a MERGE.

  • I agree - there is no real reason to avoid MERGE. And if it seems too difficult to understand, you may just need some practice 🙂

    I use it very often on very large tables in our DWH (~20 TB), and if I replaced it with multiple INSERT/UPDATE/DELETE statements, everything would run slower.

    I know there are several articles, many years old, that say MERGE is buggy, but if you check the bug lists, they cover only very special edge cases. One example: using MERGE to delete rows from tbl1, using the OUTPUT option *without* INTO, and putting the whole MERGE in a subselect that does SELECT ... INTO tbl1, which is something nobody would really consider doing:

    SELECT sub.*
    INTO tbl1
    FROM (MERGE INTO tbl1 AS t
          USING ... AS s
             ON s.key1 = t.key1
          WHEN ...
          WHEN NOT MATCHED
              THEN DELETE
          OUTPUT $action, Deleted.*
         ) AS sub
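
    For reference, the commonly recommended workaround for that edge case is to capture the OUTPUT rows with OUTPUT ... INTO a table variable rather than wrapping the MERGE in a subselect. A minimal sketch (the @changes table, the tbl2 source, and the column names are illustrative assumptions, not taken from the post above):

    ```sql
    -- Sketch: capture MERGE output via OUTPUT ... INTO instead of a subselect.
    -- @changes, tbl2, and the key column are illustrative assumptions.
    DECLARE @changes TABLE (change_action nvarchar(10), key1 int);

    MERGE INTO tbl1 AS t
    USING tbl2 AS s
       ON s.key1 = t.key1
    WHEN NOT MATCHED BY SOURCE
        THEN DELETE
    OUTPUT $action, deleted.key1
      INTO @changes (change_action, key1);

    SELECT * FROM @changes;   -- inspect what the MERGE deleted
    ```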

    God is real, unless declared integer.

  • Here is a test showing that separate UPDATE and INSERT statements are faster than a MERGE.

    SET STATISTICS IO, TIME OFF

    if object_id('dbo.t1','U') IS NOT NULL drop table dbo.t1;
    if object_id('dbo.t2','U') IS NOT NULL drop table dbo.t2;
    go
    create table dbo.t1(id int primary key clustered not null , value varchar(20)) ;
    create table dbo.t2(id int primary key clustered not null , value varchar(20)) ;

    insert into dbo.t1
    select t.N, SUBSTRING(CONVERT(varchar(40), newid()), 1, 10)
    from dbo.fnTally(1,10000000) t;

    insert into dbo.t2
    select t.N, SUBSTRING(CONVERT(varchar(40), newid()), 1, 20)
    from dbo.fnTally(1,11000000) t;

    SET STATISTICS IO, TIME ON
    GO

    BEGIN TRANSACTION
    GO
    PRINT '************************************* Start Merge'
    MERGE
    INTO dbo.t1 AS tgt
    USING dbo.t2 as src
    ON src.id = tgt.id
    WHEN MATCHED THEN
    UPDATE SET tgt.value = src.value
    WHEN NOT MATCHED THEN
    INSERT (id, value) VALUES (src.id, src.value)
    ;
    PRINT '************************************* End Merge'
    GO
    ROLLBACK
    GO
    BEGIN TRANSACTION
    GO
    PRINT '************************************* Start Upsert'
    UPDATE x
    SET x.value = y.value
    FROM dbo.t1 x
    INNER JOIN dbo.t2 y
    ON y.id = x.id
    ;

    INSERT INTO dbo.t1 (id, value)
    SELECT x.id, x.value
    FROM dbo.t2 x
    WHERE NOT EXISTS (SELECT *
    FROM dbo.t1 y
    WHERE y.id = x.id)
    ;
    PRINT '************************************* End Upsert'
    GO
    ROLLBACK;

    ************************************* Start Merge

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    Table 't1'. Scan count 1, logical reads 30080903, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table 't2'. Scan count 1, logical reads 30004, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 24266 ms, elapsed time = 25164 ms.

    (11000000 rows affected)
    ************************************* End Merge
    ************************************* Start Upsert

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    Table 't1'. Scan count 1, logical reads 36299, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table 't2'. Scan count 1, logical reads 27286, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 11015 ms, elapsed time = 11272 ms.

    (10000000 rows affected)
    Table 't1'. Scan count 1, logical reads 3223826, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table 't2'. Scan count 1, logical reads 30004, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table 'Worktable'. Scan count 1, logical reads 2883241, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 5469 ms, elapsed time = 5500 ms.

    (1000000 rows affected)
    ************************************* End Upsert

    The MERGE took about 25 seconds compared to about 17 seconds (11.3 s + 5.5 s) for the UPSERT.
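
    One caveat with the two-statement pattern, for what it's worth: unlike a single MERGE, the UPDATE and the INSERT are separate operations, so under concurrent writers they are usually wrapped in a transaction with locking hints on the target. A minimal sketch, reusing the dbo.t1/dbo.t2 tables from the script above (the hints shown are the commonly recommended ones, not part of the original test):

    ```sql
    -- Sketch: two-statement upsert made safe for concurrent writers.
    -- Reuses dbo.t1/dbo.t2 from the test script above.
    BEGIN TRANSACTION;

    UPDATE x
    SET x.value = y.value
    FROM dbo.t1 x WITH (UPDLOCK, SERIALIZABLE)  -- hold range locks on the target
    INNER JOIN dbo.t2 y ON y.id = x.id;

    INSERT INTO dbo.t1 (id, value)
    SELECT x.id, x.value
    FROM dbo.t2 x
    WHERE NOT EXISTS (SELECT *
                      FROM dbo.t1 y WITH (UPDLOCK, SERIALIZABLE)
                      WHERE y.id = x.id);

    COMMIT TRANSACTION;
    ```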

    dbo.fnTally

  • Thank you everybody for your helpful comments.

    A little context here.

    As a database consultant, I was asked by my client to perform a merge on 27 tables, one of which had 700 columns. Having never used MERGE before, I assumed (correctly or not) that snippets of code would be required for all of those columns. To avoid this, I wrote a simple merge that knew nothing about the columns, and it worked fine. Later, when it became apparent that MERGE in an SSIS package was required (corporate standard), I wrote a proc to auto-generate all the merges, and semi-automated the package as well. I have no issue with MERGE, even though I've never written one, so the article's title was a bit inflammatory. But it was a lot of fun writing my own merge.
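
    The auto-generation approach described above can be sketched with the catalog views: read the column list from sys.columns, build the MERGE text, and execute it dynamically. A minimal sketch, not the author's actual proc, assuming dbo.t1/dbo.t2 tables with a single key column named id purely for illustration, and STRING_AGG (SQL Server 2017+):

    ```sql
    -- Sketch: generate a column-agnostic MERGE from sys.columns metadata.
    -- Table names and the key column (id) are illustrative assumptions.
    DECLARE @cols nvarchar(max), @set nvarchar(max), @sql nvarchar(max);

    SELECT @cols = STRING_AGG(QUOTENAME(name), N', '),
           @set  = STRING_AGG(N'tgt.' + QUOTENAME(name)
                              + N' = src.' + QUOTENAME(name), N', ')
    FROM sys.columns
    WHERE object_id = OBJECT_ID(N'dbo.t2')
      AND name <> N'id';                 -- non-key columns only

    SET @sql = N'MERGE INTO dbo.t1 AS tgt
    USING dbo.t2 AS src ON src.id = tgt.id
    WHEN MATCHED THEN UPDATE SET ' + @set + N'
    WHEN NOT MATCHED THEN INSERT (id, ' + @cols + N')
    VALUES (src.id, src.' + REPLACE(@cols, N', ', N', src.') + N');';

    EXEC sys.sp_executesql @sql;
    ```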

  • Here is a test that shows a MERGE is faster than separate UPDATE/INSERT statements.

    If you change your data to a more realistic scenario, you will come to the opposite conclusion. It is not - for most people - realistic to update all existing 10,000,000 rows. With 1,000,000 updates and 1,000,000 inserts, MERGE 'wins'. Conclusions can only be made if the number of rows is known and is realistic.

    ************************************* Start Merge

    SQL Server Execution Times:

    CPU time = 18250 ms, elapsed time = 20696 ms.

    (2000001 rows affected)

    ************************************* End Merge

    ************************************* Start Upsert

    SQL Server Execution Times:

    CPU time = 14579 ms, elapsed time = 15593 ms.

    (1000001 rows affected)

    SQL Server Execution Times:

    CPU time = 12112 ms, elapsed time = 11685 ms.

    (1000000 rows affected)

    ************************************* End Upsert

  • carsten.saastamoinen wrote:

    Here is a test that shows a MERGE is faster than separate UPDATE/INSERT statements.

    If you change your data to a more realistic scenario, you will come to the opposite conclusion. It is not - for most people - realistic to update all existing 10,000,000 rows. With 1,000,000 updates and 1,000,000 inserts, MERGE 'wins'. Conclusions can only be made if the number of rows is known and is realistic.

    ************************************* Start Merge

    SQL Server Execution Times:
    CPU time = 18250 ms, elapsed time = 20696 ms.

    (2000001 rows affected)
    ************************************* End Merge
    ************************************* Start Upsert

    SQL Server Execution Times:
    CPU time = 14579 ms, elapsed time = 15593 ms.

    (1000001 rows affected)

    SQL Server Execution Times:
    CPU time = 12112 ms, elapsed time = 11685 ms.

    (1000000 rows affected)
    ************************************* End Upsert

    Where is the test??

  • I use your script and just show the result.

  • carsten.saastamoinen wrote:

    I use your script and just show the result.

    Ok data setup with the following rows:

    insert into dbo.t1
    select t.N, SUBSTRING(CONVERT(varchar(40), newid()), 1, 10)
    from dbo.fnTally(1,10000000) t;

    insert into dbo.t2
    select t.N, SUBSTRING(CONVERT(varchar(40), newid()), 1, 20)
    from dbo.fnTally(1,1000000) t;

    insert into dbo.t2
    select t.N+10000000, SUBSTRING(CONVERT(varchar(40), newid()), 1, 20)
    from dbo.fnTally(1,1000000) t;

    Results MERGE:

    ************************************* Start Merge

    Table 't1'. Scan count 1, logical reads 3090684, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table 't2'. Scan count 1, logical reads 9210, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 4750 ms, elapsed time = 4769 ms.

    (2000000 rows affected)
    ************************************* End Merge

    Results UPSERT:

    ************************************* Start Upsert

    Table 't1'. Scan count 1, logical reads 41520, physical reads 0, page server reads 0, read-ahead reads 7, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table 't2'. Scan count 1, logical reads 4623, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 2047 ms, elapsed time = 2069 ms.

    (1000000 rows affected)
    Table 't1'. Scan count 5, logical reads 3235233, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table 't2'. Scan count 5, logical reads 9340, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 3095 ms, elapsed time = 2033 ms.

    (1000000 rows affected)
    ************************************* End Upsert

    So MERGE 4769 ms

    UPDATE + INSERT 2069 ms + 2033 ms = 4102 ms

    So the UPDATE + INSERT is still faster.

  • Try to look at the CPU time

  • carsten.saastamoinen wrote:

    Try to look at the CPU time 

    Try looking up the word faster.

  • YES! And the elapsed time will change more than CPU time on a production server with many users.

  • carsten.saastamoinen wrote:

    YES! And the elapsed time will change more than CPU time on a production server with many users.

    I hope I don't sound too critical, but that's not really true. Production database servers are generally I/O bound: their performance is usually limited by how fast data can be read from or written to storage. This is especially true for servers that handle large amounts of data or run complex queries, which can touch many disk blocks and perform many seeks. CPU utilization is typically not the bottleneck, because modern CPUs are fast enough for most database operations; instead, performance is limited by disk access, which is far slower than the CPU.

    The Update/Insert is still faster on my machine than the Merge, so I was wondering what data did you use to set up your test?
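
    Whether a particular server really is I/O bound can be checked from its accumulated wait statistics rather than assumed. A minimal sketch against the sys.dm_os_wait_stats DMV (the wait types filtered here are the usual I/O indicators; interpret the numbers relative to total waits):

    ```sql
    -- Sketch: rough check of how much of the server's waiting is I/O.
    -- sys.dm_os_wait_stats accumulates since the last restart or clear.
    SELECT wait_type,
           wait_time_ms,
           waiting_tasks_count
    FROM sys.dm_os_wait_stats
    WHERE wait_type LIKE N'PAGEIOLATCH%'              -- data-file read waits
       OR wait_type IN (N'WRITELOG', N'IO_COMPLETION')
    ORDER BY wait_time_ms DESC;
    ```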
