SQLServerCentral Article

How To Avoid MERGE in SQL Server

,

Introduction

This article shows how to script a basic merge between two tables, without using MERGE. This can be handy when you're merging a pair of tables with hundreds of columns, since the script doesn't know anything about them. This technique introduces and populates some auditing columns (IS_DELETED, IS_INSERTED, IS_UPDATED, TRANSACTION_DATE) for each record, and runs fast.

The script may be used to benchmark simple MERGEs for integrity and performance.

Example

Suppose we want to merge Table1 into Table2:

Table1
IDNAMEIS_DELETEDIS_INSERTEDIS_UPDATEDTRANSACTION_DATE
1A000NULL
2B000NULL
3CNEW000NULL
Table2
IDNAMEIS_DELETEDIS_INSERTEDIS_UPDATEDTRANSACTION_DATE
2B0001/1/1900
3C0001/1/1900
4D0001/1/1900

Here, ID is the key for both tables while NAME contains user data. The other columns audit the changes in Table2 for each merge, but they’re present in Table1 so we can use the EXCEPT operator.

Each table has a row missing from the other (i.e. missing key values), but they also intersect on two rows (i.e. common key values) where one of them has a user column (NAME) that’s been changed. So, row 1 (from Table1) will be inserted into Table2, row 4 will be soft-deleted (i.e. flagged as deleted from Table1), row 2 will remain unchanged and row 3 will be updated.

The script’s result looks like this, where TRANSACTION_DATE records the date where each row’s audit status has changed:

Table2
IDNAMEIS_DELETEDIS_INSERTEDIS_UPDATEDTRANSACTION_DATE
1A0102/27/2020
2B0001/1/1900
3CNEW0012/27/2020
4D1002/27/2020

Later, if you add row 4 to the first table and re-run the script, the delete flag will be turned off in the second table in favor of the update flag:

Table2
IDNAMEIS_DELETEDIS_INSERTEDIS_UPDATEDTRANSACTION_DATE
1A0102/27/2020
2B0001/1/1900
3CNEW0012/27/2020
4D0012/27/2020

This script has been tested on two tables with 700 columns and 250K rows (run time about 1 minute).

Although there’s only one user column in this example (NAME), you can add more columns at any time because the script doesn’t know anything about them.

The Algorithm

The merging process is quite simple.

Take two identical tables Source and Target, where Source will be merged into Target. Add the audit columns IS_DELETED, IS_INSERTED, IS_UPDATED, TRANSACTION_DATE to both tables. Select the same key for both of them so they can be joined. Assume that a previous merge has been performed, so the audit columns in Target have been set.

For the current merge, do the following:

  • Update the audit columns in Source so they agree with their counterparts in Target. That way both tables will agree on these columns (for joined rows) when the EXCEPT operator is invoked.
  • For those rows in Target that don’t join with Source, set IS_DELETED = 1.
  • For those rows in Source that don’t join with Target, set IS_INSERTED = 1.
  • Insert rows from Source that don’t join with Target. After that, no rows from Source are missing from Target and the newly-inserted records have their audit columns set properly.
  • Delete rows from Target that join with Source but differ in at least one column (use the EXCEPT operator to do this). Now we have rows in Source that are missing from Target.
  • Delete rows from Target that join with Source, but where IS_DELETED = 1. Now we have more rows in Source that are missing from Target. That’s because a row with IS_DELETED = 1 from the previous merge may now see its counterpart re-appear in Source but no column values have changed.
  • For those rows in Source that don’t join with Target, set IS_UPDATED = 1.
  • Insert rows from Source that don’t join with Target.

Script

Here is the script that will perform this process. You can test and run this yourself.

/*
--
-- Build and populate Source and Target tables.
-- Do this once and then run script.
--
IF OBJECT_ID('[dbo].[Table1]', 'U') IS NOT NULL
DROP TABLE [dbo].[Table1];
CREATE TABLE [dbo].[Table1](
[ID] [int] NOT NULL,
[NAME] varchar(10) NULL,
IS_DELETED BIT DEFAULT 0 NULL,
IS_INSERTED BIT DEFAULT 0 NULL,
IS_UPDATED BIT DEFAULT 0 NULL,
TRANSACTION_DATE datetime NULL,
PRIMARY KEY ([ID])
) ON [PRIMARY];
IF OBJECT_ID('[dbo].[Table2]', 'U') IS NOT NULL
DROP TABLE [dbo].[Table2];
CREATE TABLE [dbo].[Table2](
[ID] [int] NOT NULL,
[NAME] varchar(10) NULL,
IS_DELETED BIT DEFAULT 0 NULL,
IS_INSERTED BIT DEFAULT 0 NULL,
IS_UPDATED BIT DEFAULT 0 NULL,
TRANSACTION_DATE datetime NULL,
PRIMARY KEY ([ID])
) ON [PRIMARY];
INSERT INTO [dbo].[Table1]([ID],[NAME]) VALUES(1,'A');
INSERT INTO [dbo].[Table1]([ID],[NAME]) VALUES(2,'B');
INSERT INTO [dbo].[Table1]([ID],[NAME]) VALUES(3,'CNEW');
INSERT INTO [dbo].[Table2]([ID],[NAME]) VALUES(2,'B');
INSERT INTO [dbo].[Table2]([ID],[NAME]) VALUES(3,'C');
INSERT INTO [dbo].[Table2]([ID],[NAME]) VALUES(4,'D');
UPDATE [dbo].[Table2] SET TRANSACTION_DATE = '1900-01-01';
*/-- Display Source and Target tables.
SELECT * FROM [dbo].[Table1] -- Source
SELECT * FROM exddasf[dbo].[Table2] -- Target
-- Update IS_DELETED, IS_INSERTED, IS_UPDATED, TRANSACTION_DATE
-- in Source rows using corresponding Target rows.
-- This is done so that the EXCEPT operator (below) won't find
-- differences between corresponding rows with these audit columns.
UPDATE d1
SET
d1.IS_DELETED = d2.IS_DELETED,
d1.IS_INSERTED = d2.IS_INSERTED,
d1.IS_UPDATED = d2.IS_UPDATED,
d1.TRANSACTION_DATE = d2.TRANSACTION_DATE
FROM
[dbo].[Table1] d1
INNER JOIN
[dbo].[Table2] d2
ON
d1.[ID] = d2.[ID]
/*
SELECT * FROM [dbo].[Table1] -- Source
SELECT * FROM [dbo].[Table2] -- Target
*/-- Soft delete rows in Target that are missing from Source.
-- Update IS_DELETED, IS_INSERTED, IS_UPDATED, TRANSACTION_DATE.
UPDATE [dbo].[Table2]
SET
IS_DELETED = 1,
IS_INSERTED = 0,
IS_UPDATED = 0,
TRANSACTION_DATE = getdate()
WHERE [ID] NOT IN (SELECT [ID] FROM [dbo].[Table1])
/*
SELECT * FROM [dbo].[Table1] -- Source
SELECT * FROM [dbo].[Table2] -- Target
*/-- Set IS_INSERTED = 1 in Source for rows that are missing in Target.
-- This is done so that IS_INSERTED = 1 after insertion.
-- Also, update IS_DELETED, IS_UPDATED, TRANSACTION_DATE.
UPDATE [dbo].[Table1]
SET
IS_DELETED = 0,
IS_INSERTED = 1,
IS_UPDATED = 0,
TRANSACTION_DATE = getdate()
WHERE [ID] NOT IN (SELECT [ID] FROM [dbo].[Table2])
/*
SELECT * FROM [dbo].[Table1] -- Source
SELECT * FROM [dbo].[Table2] -- Target
*/-- Insert rows from Source that are missing in Target.
INSERT INTO [dbo].[Table2]
SELECT d1.* FROM [dbo].[Table1] d1
WHERE d1.[ID] NOT IN (SELECT [ID] FROM [dbo].[Table2])
/*
SELECT * FROM [dbo].[Table1] -- Source
SELECT * FROM [dbo].[Table2] -- Target
*/-- Delete rows in Target that will be replaced by insertions from Source.
-- These are the ones in Target whose associated rows in Source differ
-- in at least one column.
DELETE [dbo].[Table2] WHERE
[ID] IN
(
SELECT [ID] FROM
(
SELECT d2.* from [dbo].[Table2] d2
INNER JOIN [dbo].[Table1] d1 on d1.[ID] = d2.[ID]
EXCEPT (SELECT * from [dbo].[Table1])
) d
)
/*
SELECT * FROM [dbo].[Table1] -- Source
SELECT * FROM [dbo].[Table2] -- Target
*/-- Delete rows in Target that are present in Source and IS_DELETED = 1.
-- This will insure that those rows will also be updated.
-- Those rows may, or may not, have been deleted in the previous deletion.
DELETE [dbo].[Table2] WHERE
[ID] IN
(
SELECT [ID] FROM
(
SELECT d2.* from [dbo].[Table2] d2
INNER JOIN [dbo].[Table1] d1 on d1.[ID] = d2.[ID]
WHERE d2.IS_DELETED = 1
) d
)
/*
SELECT * FROM [dbo].[Table1] -- Source
SELECT * FROM [dbo].[Table2] -- Target
*/-- Set IS_UPDATED = 1 for rows in Source that will be inserted into Target.
-- This is done so that IS_UPDATED = 1 after insertion.
-- Also, update IS_DELETED, IS_INSERTED, TRANSACTION_DATE.
UPDATE [dbo].[Table1]
SET
IS_DELETED = 0,
IS_INSERTED = 0,
IS_UPDATED = 1,
TRANSACTION_DATE = getdate()
WHERE
[ID] NOT IN (SELECT [ID] FROM [dbo].[Table2])
/*
SELECT * FROM [dbo].[Table1] -- Source
SELECT * FROM [dbo].[Table2] -- Target
*/-- Insert rows from Source that are now missing in Target.
INSERT INTO [dbo].[Table2] SELECT d1.* FROM [dbo].[Table1] d1
WHERE d1.[ID] NOT IN (SELECT [ID] FROM [dbo].[Table2])
-- Display Target table.
SELECT * FROM [dbo].[Table2] -- Target
/*
To use your own tables, perform the following replacements in the script (including the brackets []).
[dbo] ---> [Your schema]
[Table1] ---> [Your source table]
[Table2] ---> [Your target table]
[ID] ---> [Your key]
Your own tables must have the following columns:
[Your key]
IS_DELETED BIT DEFAULT 0 NULL
IS_INSERTED BIT DEFAULT 0 NULL
IS_UPDATED BIT DEFAULT 0 NULL
TRANSACTION_DATE datetime NULL
where [Your key] is the key in both tables.
If your key is INT with Identify Specification = ON in [Table2], then you’ll need to set it OFF.
For a timestamp (rownum) column in [Table2], convert it to varchar or remove them from both tables.
*/

Rate

4.5 (6)

You rated this post out of 5. Change rating

Share

Share

Rate

4.5 (6)

You rated this post out of 5. Change rating