Merge documentation misleading?

Question

Post reply

Merge documentation misleading?

John Rees

Mr or Mrs. 500

Points: 571
More actions
October 23, 2013 at 1:24 pm

#395715
I find some statements in the documentation for Merge confusing and potentially misleading, and would like to know if you agree, and if so, what a better description would be.
The two BOL pages and quotes are:
MERGE (Transact-SQL)
ON <merge_search_condition>
Specifies the conditions on which <table_source> is joined with target_table to determine where they match.
:exclamation:Caution
It is important to specify only the columns from the target table that are used for matching purposes. That is, specify columns from the target table that are compared to the corresponding column of the source table. Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results.
Inserting, Updating, and Deleting Data by Using MERGE
Search Condition Guidelines
The search conditions used to match the source and target rows and the additional search conditions used to filter rows from either the source or target must be specified correctly to ensure that correct results are obtained. We recommend following these guidelines:
Specify only search conditions in the ON <merge_search_condition> clause that determine the criteria for matching data in the source and target tables. That is, specify only columns from the target table that are compared to the corresponding columns of the source table.
Do not include comparisons to other values such as a constant.
It seems to me that the bolded statements above (my emphasis) imply that something "unexpected" or "incorrect" may happen if these suggestions are not followed. It reminds me of the caveats about using joins in UPDATE statements which can produce undefined results.
However, the examples I've seen don't back up this superstition, and I'd be really grateful if someone could show me an example of something actually unexpected.
Of course, to be unexpected, I have to define what I expect, so here goes:
Assuming the statement is
MERGE Target AS T
USING Source AS S
ON <merge_search_condition>
WHEN MATCHED
THEN UPDATE SET T.name = S.name
WHEN NOT MATCHED BY SOURCE
THEN DELETE
WHEN NOT MATCHED BY TARGET
THEN INSERT(name) VALUES(S.name)
then I expect all the following rules will be executed
For every row of T that has one row of S where <merge_search_condition> is TRUE then execute UPDATE.
For every row of T that has more than one row of S where <merge_search_condition> is TRUE then return ERROR.
For every row of T that has zero rows of S where <merge_search_condition> is TRUE then execute DELETE.
For every row of S that has zero rows of T where <merge_search_condition> is TRUE then execute INSERT
As far as I can see, it doesn't matter what kind of condition is in <merge_search_condition>. Whether it is simply primary key comparisons, or comparisons with a constant (warned against in the BOL quote), or any other kind of condition, I don't see why the result should be "unexpected". It seems that the meanings of WHEN MATCHED, WHEN NOT MATCHED BY TARGET, and WHEN NOT MATCHED BY SOURCE are quite well defined (assuming I have my logic above correct).
Now, of course if a developer puts in the condition: ON T.id = S.id AND S.name = 'john'
then I understand that the delete will essentially be
DELETE FROM T WHERE NOT EXISTS (SELECT * FROM S WHERE T.id = S.id and S.name = 'john')
This may not be what they wanted, but surely it is exactly what is expected.
To get to the point: I'm proposing that the warning about "unexpected" results is misleading, and that it would be more truthful if BOL simply stated that developers should be careful to understand the logic of MATCHING and NOT MATCHING, which are in my opinion actually very intuitive.
In my opinion these warnings about Merge are as misleading as this hypothetical statement about LEFT JOIN:
LEFT JOIN : ON <join_condition>
:exclamation:Caution
It is important to specify only the columns from the left and right table_sources . That is, specify columns from the left table that are compared to the corresponding column of the right table. Do not attempt to improve query performance by filtering out rows in the ON clause, such as by specifying AND NOT right_table.column_x = value. Doing so may return unexpected and incorrect results.
Even though the conditions of a left join normally just compare columns, there is nothing wrong with having extra conditions, as long as you understand the logic of left join. Same with Merge IMHO.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Hunchback SSCrazy Points: 2649 More actions · Answer 1

Here is a small script and the result from them is not the same based on the suggestions in BOL.

Merge both tables but update column [c2] in the target only if c2 = 'C'.

SET NOCOUNT ON;

USE tempdb;

GO

DECLARE @T1 TABLE (

c1 int NOT NULL PRIMARY KEY,

c2 varchar(15) NOT NULL

);

DECLARE @T2 TABLE (

c1 int NOT NULL PRIMARY KEY,

c2 varchar(15) NOT NULL UNIQUE

);

INSERT INTO @T1 (c1, c2)

VALUES (1, 'A'), (2, 'C');

INSERT INTO @T2 (c1, c2)

VALUES (1, 'AA'), (2, 'B'), (3, 'C');

SELECT * FROM @T1;

SELECT * FROM @T2;

MERGE INTO @T1 AS T1

USING @T2 AS T2

ON T1.c1 = T2.c1 AND T1.c2 = 'C'

WHEN MATCHED THEN

UPDATE SET T1.c2 = T2.c2

WHEN NOT MATCHED BY SOURCE THEN

DELETE

WHEN NOT MATCHED BY TARGET THEN

INSERT (c1, c2) VALUES (T2.c1, T2.c2);

SELECT * FROM @T1;

SELECT * FROM @T2;

GO

SET NOCOUNT ON;

USE tempdb;

GO

DECLARE @T1 TABLE (

c1 int NOT NULL PRIMARY KEY,

c2 varchar(15) NOT NULL

);

DECLARE @T2 TABLE (

c1 int NOT NULL PRIMARY KEY,

c2 varchar(15) NOT NULL UNIQUE

);

INSERT INTO @T1 (c1, c2)

VALUES (1, 'A'), (2, 'C');

INSERT INTO @T2 (c1, c2)

VALUES (1, 'AA'), (2, 'B'), (3, 'C');

SELECT * FROM @T1;

SELECT * FROM @T2;

MERGE INTO @T1 AS T1

USING @T2 AS T2

ON T1.c1 = T2.c1

WHEN MATCHED AND T1.c2 = 'C' THEN

UPDATE SET T1.c2 = T2.c2

WHEN NOT MATCHED BY SOURCE THEN

DELETE

WHEN NOT MATCHED BY TARGET THEN

INSERT (c1, c2) VALUES (T2.c1, T2.c2);

SELECT * FROM @T1;

SELECT * FROM @T2;

GO

Notice that in the first script we still update the tuple (1, 'A') even though there is no match by the conditions expressed as part of the ON subclause.

What BOL is enphasizing is that in the ON clasue we should use just the columns that relate both sets and any extra condition should be used in the <clause_search_condition> part of the subbranches.

John Rees Mr or Mrs. 500 Points: 571 More actions · Answer 2

Thanks for the script, but I don't think your description is quite right:

Notice that in the first script we still update the tuple (1, 'A') even though there is no match by the conditions expressed as part of the ON subclause.

What is actually happening is that (1, 'A') is not matched by the source in @T2 so it is deleted, and then (1, 'AA') and (3, 'C') in the source are not matched in the target, so they are inserted.

It seems that (1, 'A') has been updated, but actually it has been deleted, and then (1, 'AA') inserted.

So the matching rules are being followed exactly as I would expect. It's just that the logic needs to be carefully understood when using "WHEN NOT MATCHED BY SOURCE" and "WHEN NOT MATCHED BY TARGET" rules in a statement that uses a non-trivial search condition.

This behaviour can be seen by adding an OUTPUT clause to your Merge:

SET NOCOUNT ON;

USE tempdb;

GO

DECLARE @T1 TABLE (

c1 int NOT NULL PRIMARY KEY,

c2 varchar(15) NOT NULL

);

DECLARE @T2 TABLE (

c1 int NOT NULL PRIMARY KEY,

c2 varchar(15) NOT NULL UNIQUE

);

INSERT INTO @T1 (c1, c2)

VALUES (1, 'A'), (2, 'C');

INSERT INTO @T2 (c1, c2)

VALUES (1, 'AA'), (2, 'B'), (3, 'C');

SELECT * FROM @T1;

SELECT * FROM @T2;

MERGE INTO @T1 AS T1

USING @T2 AS T2

ON T1.c1 = T2.c1 AND T1.c2 = 'C'

WHEN MATCHED THEN

UPDATE SET T1.c2 = T2.c2

WHEN NOT MATCHED BY SOURCE THEN

DELETE

WHEN NOT MATCHED BY TARGET THEN

INSERT (c1, c2) VALUES (T2.c1, T2.c2)

OUTPUT $action,

deleted.c1 as deleted_c1, deleted.c2 as deleted_c2,

inserted.c1 as inserted_c1, inserted.c2 as inserted_c2

;

SELECT * FROM @T1;

SELECT * FROM @T2;

Produces this OUTPUT from the Merge:

$action deleted_c1 deleted_c2 inserted_c1 inserted_c2

DELETE 1 A NULL NULL

INSERT NULL NULL 1 AA

UPDATE 2 C 2 B

INSERT NULL NULL 3 C

Hunchback SSCrazy Points: 2649 More actions · Answer 3

You are right and that was a bad choice from my side so I apologize for that.

The way I see this warning is that the "unexpected" is more related to the way we use to exclude rows when using this type of condition in the ON sub-clause on the SELECT statement. The behavior in the MERGE statement is different and these extra conditions are not used to exclude rows from any side, applying the conditions to all rows in both sides depending on the WHEN parts used in the statement.

John Rees Mr or Mrs. 500 Points: 571 More actions · Answer 4

Yes, I agree that for most SELECT statements the ON clause is seen as a filtering condition, particularly for inner joins when the ON condition is equivalent to the WHERE condition.

Then your comment about "The behavior in the MERGE statement is different" got me thinking about why it's different. It occured to me that the function of the ON clause in MERGE seems very similar to the ON clause in a FULL OUTER JOIN.

If we equate the left table in a FULL OUTER JOIN to the MERGE target, then this table shows the comparison between the behaviour of the two statements assuming they use the same ON clause.

Left rows Right rows FULL OUTER JOIN MERGE

>= 1 1 combine both execute WHEN MATCHED

1 0 right fields are null execute WHEN NOT MATCHED BY SOURCE

0 1 left fields are null execute WHEN NOT MATCHED BY TARGET

1 > 1 combine both error

So maybe it's true to say that using a funky ON clause in a MERGE requires the same care and understanding as using it in a FULL OUTER JOIN??

Hunchback SSCrazy Points: 2649 More actions · Answer 5

John,

Your thinking is correct but it get complicated depending on which sub-branch was specified as part of the statement. There is a comprehensive list in this article / topic in BOL.

Inserting, Updating, and Deleting Data by Using MERGE

http://technet.microsoft.com/en-us/library/bb522522(v=SQL.105).aspx

John Rees Mr or Mrs. 500 Points: 571 More actions · Answer 6

hunchback (10/25/2013)
Your thinking is correct but it get complicated depending on which sub-branch was specified as part of the statement. There is a comprehensive list in this article / topic in BOL.
Inserting, Updating, and Deleting Data by Using MERGE
http://technet.microsoft.com/en-us/library/bb522522(v=SQL.105).aspx

Thanks for the link.

To be honest I, think that this article is really an example of what I don't like about the MERGE docs. It hopelessly complicates the understanding of how to use MERGE by describing its implementation in terms of 5 (!) different join types depending on which clauses appear in the MERGE. That may be important for performance issues, but is not appropriate for understanding the logic of MERGE.

Thanks for all your replies. It was good to have someone with some counter arguments. I thought this topic might stir up a few more opinions, but I guess it turned out just to be a pet peeve.

Venkataraman R SSChasing Mays Points: 634 More actions · Answer 7

The dangerous side of MERGE talks about how MERGE statement with wrong usage can wipe out data: http://www.sqlservercentral.com/articles/MERGE/97867/#.Um5Zwk6LY0Q.twitter

John Rees Mr or Mrs. 500 Points: 571 More actions · Answer 8

Venkataraman R (10/28/2013)
The dangerous side of MERGE talks about how MERGE statement with wrong usage can wipe out data: http://www.sqlservercentral.com/articles/MERGE/97867/#.Um5Zwk6LY0Q.twitter

I saw that article (you'll probably notice a rather familiar reply in that discussion 😉 ) and although it points out an interesting technique using a CTE, it also completely misuses the logic of the ON clause coupled with "WHEN NOT MATCHED BY SOURCE", and then tries to fix it.

I just don't understand why MERGE seems to get so many warnings about how it can be misued, when I don't think it is any more or less dangerous than making a mistake when writing a DELETE which uses a correlated subquery.

The warnings kind of distract the reader from understanding the fairly intuitive logic (although admittedly cumbersome syntax!)