Merge statement with conditions

Question

Post reply

Merge statement with conditions

David92595

SSC Eights!

Points: 862
More actions
June 23, 2016 at 4:09 pm

#314663

Hello,
I am trying to merge a temp table (@TempTable) into a table(@MasterTable). I have 6 fields I need to compare if any of them have changed I need to update the table(@MasterTable).
Fields:
ID (I have a 1 to 1 database, so this ID will only exist once)
Event_Name
[EVENT]
Client_Name
Servicer_Code
Servicer_Name
MERGE INTO MasterTable as Target
USING (SELECT ID, Event_Name, [EVENT], Client_Name, Servicer_Code, Servicer_Name FROM @tempTable) as Source
ON (Target.ID = Source.ID)
WHEN Matched AND Target.Event_Name <> Source.Event_name OR Target.[EVENT] <> Source.[EVENT]
OR Target.Client_Name <> Source.Client_Name OR Target.Servicer_Code <> Source.Servicer_Code
OR Target.Servicer_Name <> Source.Servicer_Name
THEN
UPDATE SET Target.Event_Name = Source.Event_Name and Target.[EVENT] = Source.[EVENT]
AND Target.Client_Name = Source.Client_Name AND Target.Servicer_Code = Source.Servicer_Code
AND Target.Servicer_Name = Source.Servicer_Name
WHEN NOT MATCHED BY TARGET THEN
INSERT (ID, Event_Name, [EVENT],Client_Name, Servicer_Code, Servicer_Name)
values (Source.ID, Source.Event_Name, Source.[EVENT],Source.Client_Name, Source.Servicer_Code, Source.Servicer_Name)
If the ID matches then I need it to compare the other 5 fields, if any of them have changed, update the record. If ID does not exist, insert.
David92595

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Sergiy SSC Guru Points: 110206 More actions · Answer 1

Why do you need MERGE?

update T

set Event_Name = S.Event_Name ,

Client_Name = S.Client_Name,

[EVENT] = S.[EVENT],

Client_Name = S.Client_Name,

Servicer_Code = S.Servicer_Code,

Servicer_Name = S.Servicer_Name

FROM MasterTable T

INNER JOIN @tempTable S ON T.ID = S.ID

WHERE Target.Event_Name <> Source.Event_name

OR Target.[EVENT] <> Source.[EVENT]

OR Target.Client_Name <> Source.Client_Name

OR Target.Servicer_Code <> Source.Servicer_Code

OR Target.Servicer_Name <> Source.Servicer_Name

INSERT INTO MasterTable (Event_Name, [EVENT], Client_Name, Servicer_Code, Servicer_Name)

SELECT Event_Name, [EVENT], Client_Name, Servicer_Code, Servicer_Name

FROM @tempTable S

WHERE NOT EXISTS (select * from MasterTable T

WHERE T.ID = S.ID )

_____________
Code for TallyGenerator

David92595 SSC Eights! Points: 862 More actions · Answer 2

David92595

SSC Eights!

Points: 862

June 24, 2016 at 2:24 am

#1886665

Wow, way over thought that one...

Thank you!

R.P.Rozema SSChampion Points: 12307 More actions · Answer 3

Sergiy (6/23/2016)
Why do you need MERGE?

Don't know, maybe because it is easier to read (once you've gotten used to the merge syntax), does the same thing in a single statement and performs better? 😉

WITH cteSource as (

SELECT ID, Event_Name, [EVENT], Client_Name, Servicer_Code, Servicer_Name

FROM @tempTable

),

cteTarget as (

SELECT ID, Event_Name, [EVENT], Client_Name, Servicer_Code, Servicer_Name

FROM MasterTable mt

WHERE EXISTS (

SELECT *

FROM @temptable t

WHERE t.ID = mt.ID

)

MERGE INTO cteTarget t

USING cteSource s

ON (s.ID = t.ID)

WHEN Matched AND (

t.Event_Name <> s.Event_name

OR t.[EVENT] <> s.[EVENT]

OR t.Client_Name <> s.Client_Name

OR t.Servicer_Code <> s.Servicer_Code

OR t.Servicer_Name <> s.Servicer_Name

)

THEN

UPDATE

SET

t.Event_Name = s.Event_Name,

t.[EVENT] = s.[EVENT],

t.Client_Name = s.Client_Name,

t.Servicer_Code = s.Servicer_Code,

t.Servicer_Name = s.Servicer_Name

WHEN NOT MATCHED BY TARGET

THEN

INSERT (ID, Event_Name, [EVENT],Client_Name, Servicer_Code, Servicer_Name)

VALUES (s.ID, s.Event_Name, s.[EVENT],s.Client_Name, s.Servicer_Code, s.Servicer_Name);

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

Phil Parkin SSC Guru Points: 246713 More actions · Answer 4

UPDATE is probably better than MERGE.

If you are a fan of MERGE, please take a read of this[/url] and decide whether, perhaps, you should reconsider.

And rather than using this type of construction:

UPDATE

...

WHERE

a.1 <> b.1 AND

a.2 <> b.2

I would suggest using

UPDATE

WHERE NOT EXISTS (

SELECT a.1, a.2

INTERSECT

SELECT b.1, b.2

)

because the latter version handles NULLs.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

drew.allen SSC Guru Points: 76945 More actions · Answer 5

Phil Parkin (6/24/2016)
UPDATE is probably better than MERGE.
If you are a fan of MERGE, please take a read of this[/url] and decide whether, perhaps, you should reconsider.
And rather than using this type of construction:
UPDATE
...
WHERE
a.1 <> b.1 AND
a.2 <> b.2
I would suggest using
UPDATE
WHERE NOT EXISTS (
SELECT a.1, a.2
INTERSECT
SELECT b.1, b.2
)
because the latter version handles NULLs.

I prefer using EXCEPT, simply because it closer to the way I, and probably most other people, think about this.

;

WITH Source AS (

SELECT ID, Event_Name, [EVENT], Client_Name, Servicer_Code, Servicer_Name

FROM @tempTable

EXCEPT

SELECT ID, Event_Name, [EVENT], Client_Name, Servicer_Code, Servicer_Name

FROM MasterTable

)

<your merge or update statement here>

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Phil Parkin SSC Guru Points: 246713 More actions · Answer 6

Paul White suggests that the NOT EXISTS ... INTERSECT form produces better query plans than the EXCEPT form, and that's why I use it. See here (in the comments).

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

David92595 SSC Eights! Points: 862 More actions · Answer 7

Thanks guys

As a work around I was comparing one field at a time. This knocked 200 lines of code out of my report and cut execution time in half!

drew.allen SSC Guru Points: 76945 More actions · Answer 8

Phil Parkin (6/24/2016)
Paul White suggests that the NOT EXISTS ... INTERSECT form produces better query plans than the EXCEPT form, and that's why I use it. See here (in the comments).

Further down in those same comments he says that you can use EXCEPT when you are not interested in both sides, which is the case here. We don't care about records that exist in the target, but not in the source in this particular case. The NOT EXISTS ... INTERSECT version will produce those records (depending on the initial join type), whereas the EXCEPT version will not.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Sergiy SSC Guru Points: 110206 More actions · Answer 9

Phil Parkin (6/24/2016)
Paul White suggests that the NOT EXISTS ... INTERSECT form produces better query plans than the EXCEPT form, and that's why I use it. See here (in the comments).

No, it does not.

Before repeating after Paul White try to reproduce it yourself.

And don't be afraid to deviate a bit from the scripts provided by Paul.

The problem with Paul's suggestions is that he does not stop before skewing his test scripts to make the output fitting his agenda.

In the blog post you quoted he's provided all sorts of "alternative to INTERSECT" forms of queries, except one.

The one which is most obvious and is most commonly used for this kind of checks:

SELECT

*

FROM @Set1 AS t

JOIN @Set2 AS s ON

t.pk = s.pk

WHERE

NOT EXISTS

(

SELECT 1

WHERE

t.pk = s.pk

AND (t.ival = s.ival OR (t.ival IS NULL AND s.ival IS NULL))

AND (t.cval = s.cval OR (t.cval IS NULL AND s.cval IS NULL))

AND (t.mval = s.mval OR (t.mval IS NULL AND s.mval IS NULL))

)

Can you guess why?

Can you suspect he does not (did not at the time) know about this form of query?

Oh, no, you would not discredit him as an SQL guru. He certainly knows that.

But with this query in the set the whole point of using INTERSECT would disappear: it produces exactly the same plan, exactly the same output and, of course, exactly the same execution statistics.

So, NOT EXISTS ... INTERSECT is not any better than NOT EXISTS (... Equal... )

Apart from probably the shorter code.

But even the code would be of the same length if we'd set ANSI NULLS OFF:

SET ANSI_NULLS OFF

SELECT

*

FROM @Set1 AS t

JOIN @Set2 AS s ON

t.pk = s.pk

WHERE

NOT EXISTS

(

SELECT 1

WHERE

t.pk = s.pk

AND (t.ival = s.ival)

AND (t.cval = s.cval)

AND (t.mval = s.mval)

)

Which we cannot do because this setting is being deprecated and to be strictly blocked in newer versions of SQL Server.

So, the only thing which is different about INTERSECT that in an environment where ANSI NULLS standard is enforced it provides a "back door" for those developers who strongly believe that NULL=NULL.

Well, some would consider it as an advantage, I would not do it.

That's about "JOIN" kind of queries Paul was using in his test scripts.

But it applies strictly to "update" queries.

_____________
Code for TallyGenerator

Sergiy SSC Guru Points: 110206 More actions · Answer 10

But what if we need to identify possibly new records added to the set?

Then we need to use a bit different form of the queries:

SET STATISTICS IO, TIME, PROFILE ON

SELECT *

FROM @Set1 AS t

WHERE

NOT EXISTS (SELECT s.* FROM @Set2 AS s INTERSECT SELECT t.*)

SELECT *

FROM @Set1 AS t

WHERE

NOT EXISTS

(

SELECT 1

FROM @Set2 s

WHERE

t.pk = s.pk

AND (t.ival = s.ival OR (t.ival IS NULL AND s.ival IS NULL))

AND (t.cval = s.cval OR (t.cval IS NULL AND s.cval IS NULL))

AND (t.mval = s.mval OR (t.mval IS NULL AND s.mval IS NULL))

)

SELECT *

FROM @Set1 AS t

EXCEPT

SELECT *

FROM @Set2 s

SET STATISTICS IO, TIME, PROFILE OFF

If you run it you'll see:

- all 3 queries generate exactly the same output;

- all 3 queries result in exactly the same IO stats;

- time stats are different from time to time, and there is no consistent winner or loser amongst these 3 queries;

- execution plans are almost identical for all 3 queries, except INTERSECT query has an extra TOP 1 operation in it. It may be still below 1% overhead, but anyway - it's there. An overhead.

RowsExecutesStmtTextStmtIdNodeIdParentPhysicalOpLogicalOpArgumentDefinedValuesEstimateRowsEstimateIOEstimateCPUAvgRowSizeTotalSubtreeCostOutputListWarningsTypeParallelEstimateExecutions

41SELECT *

FROM @Set1 AS t

WHERE

NOT EXISTS (SELECT s.* FROM @Set2 AS s INTERSECT SELECT t.*)1110NULLNULLNULLNULL1NULLNULLNULL0.00657166NULLNULLSELECT0NULL

41 |--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([t].[pk], [t].[ival], [t].[cval], [t].[mval]))1121Nested LoopsLeft Anti Semi JoinOUTER REFERENCES:([t].[pk], [t].[ival], [t].[cval], [t].[mval])NULL104.18E-06280.00657166[t].[pk], [t].[ival], [t].[cval], [t].[mval]NULLPLAN_ROW01

61 |--Index Scan(OBJECT:(@Set1 AS [t]))1132Index ScanIndex ScanOBJECT:(@Set1 AS [t])[t].[pk], [t].[ival], [t].[cval], [t].[mval]10.0031250.0001581280.0032831[t].[pk], [t].[ival], [t].[cval], [t].[mval]NULLPLAN_ROW01

26 |--Top(TOP EXPRESSION:((1)))1142TopTopTOP EXPRESSION:((1))NULL101E-0790.00328438NULLNULLPLAN_ROW01

26 |--Clustered Index Seek(OBJECT:(@Set2 AS ), SEEK:(.[pk]=@Set1.[pk] as [t].[pk]), WHERE:(@Set2.[ival] as .[ival] = @Set1.[ival] as [t].[ival] AND @Set2.[mval] as .[mval] = @Set1.[mval] as [t].[mval] AND @Set2.[cval] as .[cval] = @Set1.[cval] as [t].[cval]) ORDERED FORWARD)1154Clustered Index SeekClustered Index SeekOBJECT:(@Set2 AS ), SEEK:(.[pk]=@Set1.[pk] as [t].[pk]), WHERE:(@Set2.[ival] as .[ival] = @Set1.[ival] as [t].[ival] AND @Set2.[mval] as .[mval] = @Set1.[mval] as [t].[mval] AND @Set2.[cval] as .[cval] = @Set1.[cval] as [t].[cval]) ORDERED FORWARDNULL10.0031250.0001581200.0032831NULLNULLPLAN_ROW01

RowsExecutesStmtTextStmtIdNodeIdParentPhysicalOpLogicalOpArgumentDefinedValuesEstimateRowsEstimateIOEstimateCPUAvgRowSizeTotalSubtreeCostOutputListWarningsTypeParallelEstimateExecutions

41SELECT *

FROM @Set1 AS t

WHERE

NOT EXISTS

(

SELECT 1

FROM @Set2 s

WHERE

t.pk = s.pk

AND (t.ival = s.ival OR (t.ival IS NULL AND s.ival IS NULL))

AND (t.cval = s.cval OR (t.cval IS NULL AND s.cval IS NULL))

AND (t.mval = s.mval OR (t.mval IS NULL AND s.mval IS NULL))

)1310NULLNULLNULLNULL1NULLNULLNULL0.00657156NULLNULLSELECT0NULL

41 |--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([t].[pk], [t].[ival], [t].[cval], [t].[mval]))1321Nested LoopsLeft Anti Semi JoinOUTER REFERENCES:([t].[pk], [t].[ival], [t].[cval], [t].[mval])NULL104.18E-06280.00657156[t].[pk], [t].[ival], [t].[cval], [t].[mval]NULLPLAN_ROW01

61 |--Index Scan(OBJECT:(@Set1 AS [t]))1332Index ScanIndex ScanOBJECT:(@Set1 AS [t])[t].[pk], [t].[ival], [t].[cval], [t].[mval]10.0031250.0001581280.0032831[t].[pk], [t].[ival], [t].[cval], [t].[mval]NULLPLAN_ROW01

26 |--Clustered Index Seek(OBJECT:(@Set2 AS ), SEEK:(.[pk]=@Set1.[pk] as [t].[pk]), WHERE:(@Set1.[ival] as [t].[ival] = @Set2.[ival] as .[ival] AND @Set1.[mval] as [t].[mval] = @Set2.[mval] as .[mval] AND @Set1.[cval] as [t].[cval] = @Set2.[cval] as .[cval]) ORDERED FORWARD)1342Clustered Index SeekClustered Index SeekOBJECT:(@Set2 AS ), SEEK:(.[pk]=@Set1.[pk] as [t].[pk]), WHERE:(@Set1.[ival] as [t].[ival] = @Set2.[ival] as .[ival] AND @Set1.[mval] as [t].[mval] = @Set2.[mval] as .[mval] AND @Set1.[cval] as [t].[cval] = @Set2.[cval] as .[cval]) ORDERED FORWARDNULL10.0031250.0001581200.0032831NULLNULLPLAN_ROW01

RowsExecutesStmtTextStmtIdNodeIdParentPhysicalOpLogicalOpArgumentDefinedValuesEstimateRowsEstimateIOEstimateCPUAvgRowSizeTotalSubtreeCostOutputListWarningsTypeParallelEstimateExecutions

41SELECT *

FROM @Set1 AS t

EXCEPT

SELECT *

FROM @Set2 s1510NULLNULLNULLNULL1NULLNULLNULL0.00657156NULLNULLSELECT0NULL

41 |--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([t].[pk], [t].[ival], [t].[cval], [t].[mval]))1521Nested LoopsLeft Anti Semi JoinOUTER REFERENCES:([t].[pk], [t].[ival], [t].[cval], [t].[mval])NULL104.18E-06280.00657156[t].[pk], [t].[ival], [t].[cval], [t].[mval]NULLPLAN_ROW01

61 |--Index Scan(OBJECT:(@Set1 AS [t]))1532Index ScanIndex ScanOBJECT:(@Set1 AS [t])[t].[pk], [t].[ival], [t].[cval], [t].[mval]10.0031250.0001581280.0032831[t].[pk], [t].[ival], [t].[cval], [t].[mval]NULLPLAN_ROW01

26 |--Clustered Index Seek(OBJECT:(@Set2 AS ), SEEK:(.[pk]=@Set1.[pk] as [t].[pk]), WHERE:(@Set1.[ival] as [t].[ival] = @Set2.[ival] as .[ival] AND @Set1.[mval] as [t].[mval] = @Set2.[mval] as .[mval] AND @Set1.[cval] as [t].[cval] = @Set2.[cval] as .[cval]) ORDERED FORWARD)1542Clustered Index SeekClustered Index SeekOBJECT:(@Set2 AS ), SEEK:(.[pk]=@Set1.[pk] as [t].[pk]), WHERE:(@Set1.[ival] as [t].[ival] = @Set2.[ival] as .[ival] AND @Set1.[mval] as [t].[mval] = @Set2.[mval] as .[mval] AND @Set1.[cval] as [t].[cval] = @Set2.[cval] as .[cval]) ORDERED FORWARDNULL10.0031250.0001581200.0032831NULLNULLPLAN_ROW01

I also tested with several extra rows added to table @Set1:

INSERT @Set1

(pk, ival, cval, mval)

VALUES

(1, 1000, 'a', $1),

(2, NULL, 'b', $2),

(3, 3000, 'c', NULL),

(4, NULL, 'd', $4),

(5, 5000, 'e', $5),

(6, 6000, 'e', $6),

(7, 7000, 'e', $7),

(8, 8000, 'e', $8),

(9, 9000, 'e', $9),

(10, 10000, 'e', $10),

(11, 11000, 'e', $11),

(12, 12000, 'e', $12);

;

Conclusion remains the same:

1. NOT EXISTS ... INTERSECT does not provide any advantage comparing to NOT EXISTS ...EQUAL or EXCEPT forms of queries;

2. It adds a small overhead of an additional TOP 1 operation when looking for newly added rows.

Which is a bit different from what Paul has come up with.

You may with to run those tests for yourself.

See which conclusions you'll support afterwards.

_____________
Code for TallyGenerator

Phil Parkin SSC Guru Points: 246713 More actions · Answer 11

Sergiy, thank you for taking the time to write these informative posts.

I admit that I am guilty of following Paul White's recommendations as would a sheep and now I shall be looking at things more closely, next time I have to develop code that does this sort of thing.

I still prefer the EXCEPT or INTERSECT versions, because of their inherent handling of NULLs & agree that the EXCEPT form is possibly more pleasing on the brain than the NOT EXISTS ... INTERSECT form.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Sergiy SSC Guru Points: 110206 More actions · Answer 12

Phil Parkin (6/28/2016)
I still prefer the EXCEPT or INTERSECT versions, because of their inherent handling of NULLs & agree that the EXCEPT form is possibly more pleasing on the brain than the NOT EXISTS ... INTERSECT form.

That's your personal preferences, and you have every right to have it and follow it (unless you're in North Korea :-))

Functionally all 3 methods are the same, so it's other factors which play role in choosing one method over another.

Like with CTE - I prefer to use derived tables, some prefer CTE's, and neither side has a case to prove which method is better.

_____________
Code for TallyGenerator