Ignore Duplicate Keys

Question

Ignore Duplicate Keys

Viewing 7 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply

Paul White SSC Guru Points: 150468 More actions · Answer 1

dips056 (2/1/2010)
if u have primary key on table then just the trap the error by
@@ERROR
and generate error with RAISERROR ('Duplicate Record')

But the idea is to insert records that don't already exist. An error would roll back the entire insert operation. You're not suggesting inserting the rows one by one, are you?

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

BSavoie SSCertifiable Points: 6309 More actions · Answer 2

With EXCEPT, do all the columns have to match to be excluded? The EXCEPT names all the same columns as the select. I guess I would expect to only have to name the TranId column in the EXCEPT.

.

BSavoie SSCertifiable Points: 6309 More actions · Answer 3

WOW! Just got done comparing the NOT EXISTS vs. IGNORE_DUPS on a couple million rows on the real data, NOT EXISTS took about 9 mins., IGNORE_DUPS took 25 MINS. That's pretty dramatic!

Yes, I know 9 MINS. is still a long time!! There's some evil things going on in this thing to massage the data for a vendor. I hope I can make more improvements, but this is sure a great start!

.

Paul White SSC Guru Points: 150468 More actions · Answer 4

BSavoie (2/1/2010)
With EXCEPT, do all the columns have to match to be excluded? The EXCEPT names all the same columns as the select. I guess I would expect to only have to name the TranId column in the EXCEPT.

All columns participate in the comparison, see the documentation for EXCEPT and INTERSECT.

I included all the columns since they are required for the final INSERT.

You could do the EXCEPT just on TranID, but you would then have to join back to the input table to get the extra columns required for the insert. The effort saved in comparing the extra columns would have to exceed the cost of the extra join. This is less likely than you might think: you should see an anti-semi join in the query plan - note that the join is on TranID and the residual (the other columns) are only compared if the TranIDs match first.

You will see from the documentation that EXCEPT does an implicit DISTINCT on both inputs - but this can be optimized away if the optimizer can guarantee that both inputs are unique already (primary key or unique index). Just something to bear in mind when using EXCEPT and INTERSECT.

For that reason, I would expect method 2 (NOT EXISTS) to perform best. I would expect method 1 (MERGE) to be the close behind method 2. MERGE includes some extra logic in the plan, and really comes into its own if more than one action is required.

So, my recommendation, all things being equal, would be to go with NOT EXISTS rather than EXCEPT - but it does rather depend on the whole plan and your data.

Good luck with removing the evilness

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

nigel. SSChampion Points: 11636 More actions · Answer 5

How would a LEFT OUTER JOIN perform compared to the MERGE?

(for us lesser mortals trapped with older versions :-))

SELECT

t1.TranID, t1.TranDate, t1.TranCode, t1.Amount

from

#Transactions t1

LEFT OUTER JOIN

#VendorExport t2 on t2.TranID = t1.TranID

WHERE

t2.TranID IS NULL

Nigel
How to post questions

Paul White SSC Guru Points: 150468 More actions · Answer 6

nigel. (2/3/2010)
How would a LEFT OUTER JOIN perform compared to the MERGE?
(for us lesser mortals trapped with older versions :-))

Difficult to say. It would probably depend on the exact plan selected and the data distribution.

The MERGE uses the RIGHT OUTER JOIN equivalent of your join, but does a little extra work, as mentioned previously. The EXISTS and EXCEPT both use a right anti-semi join - which stops searching the input as soon as the first duplicate is found, but the EXCEPT may do extra comparisons in the residual. The EXISTS is potentially more efficient than a straight join, but as I say, it depends

I should probably stress that the textual wording and layout of the query is often not important, since the optimizer can transform provably equivalent requests to the same, or trivially different, query plan.

The key, as always, is to test.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

nigel. SSChampion Points: 11636 More actions · Answer 7

Paul,

Thanks. Here's hoping I get a chance to install 2008 soon.

Nigel
How to post questions

Ignore Duplicate Keys

Cookies on SQLServerCentral