Ignore Duplicate Keys

  • dips056 (2/1/2010)


    if u have primary key on table then just the trap the error by

    @@ERROR

    and generate error with RAISERROR ('Duplicate Record')

    But the idea is to insert records that don't already exist. An error would roll back the entire insert operation. You're not suggesting inserting the rows one by one, are you?

  • With EXCEPT, do all the columns have to match to be excluded? The EXCEPT names all the same columns as the select. I guess I would expect to only have to name the TranId column in the EXCEPT.

    .

  • WOW! Just got done comparing the NOT EXISTS vs. IGNORE_DUPS on a couple million rows on the real data, NOT EXISTS took about 9 mins., IGNORE_DUPS took 25 MINS. That's pretty dramatic!

    Yes, I know 9 MINS. is still a long time!! There's some evil things going on in this thing to massage the data for a vendor. I hope I can make more improvements, but this is sure a great start!

    .

  • BSavoie (2/1/2010)


    With EXCEPT, do all the columns have to match to be excluded? The EXCEPT names all the same columns as the select. I guess I would expect to only have to name the TranId column in the EXCEPT.

    All columns participate in the comparison, see the documentation for EXCEPT and INTERSECT.

    I included all the columns since they are required for the final INSERT.

    You could do the EXCEPT just on TranID, but you would then have to join back to the input table to get the extra columns required for the insert. The effort saved in comparing the extra columns would have to exceed the cost of the extra join. This is less likely than you might think: you should see an anti-semi join in the query plan - note that the join is on TranID and the residual (the other columns) are only compared if the TranIDs match first.

    You will see from the documentation that EXCEPT does an implicit DISTINCT on both inputs - but this can be optimized away if the optimizer can guarantee that both inputs are unique already (primary key or unique index). Just something to bear in mind when using EXCEPT and INTERSECT.

    For that reason, I would expect method 2 (NOT EXISTS) to perform best. I would expect method 1 (MERGE) to be the close behind method 2. MERGE includes some extra logic in the plan, and really comes into its own if more than one action is required.

    So, my recommendation, all things being equal, would be to go with NOT EXISTS rather than EXCEPT - but it does rather depend on the whole plan and your data.

    Good luck with removing the evilness 🙂

  • How would a LEFT OUTER JOIN perform compared to the MERGE?

    (for us lesser mortals trapped with older versions :-))

    SELECT

    t1.TranID, t1.TranDate, t1.TranCode, t1.Amount

    from

    #Transactions t1

    LEFT OUTER JOIN

    #VendorExport t2 on t2.TranID = t1.TranID

    WHERE

    t2.TranID IS NULL

  • nigel. (2/3/2010)


    How would a LEFT OUTER JOIN perform compared to the MERGE?

    (for us lesser mortals trapped with older versions :-))

    Difficult to say. It would probably depend on the exact plan selected and the data distribution.

    The MERGE uses the RIGHT OUTER JOIN equivalent of your join, but does a little extra work, as mentioned previously. The EXISTS and EXCEPT both use a right anti-semi join - which stops searching the input as soon as the first duplicate is found, but the EXCEPT may do extra comparisons in the residual. The EXISTS is potentially more efficient than a straight join, but as I say, it depends 🙂

    I should probably stress that the textual wording and layout of the query is often not important, since the optimizer can transform provably equivalent requests to the same, or trivially different, query plan.

    The key, as always, is to test.

  • Paul,

    Thanks. Here's hoping I get a chance to install 2008 soon.

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply