need technique to find duplicates which are causing PK constraint failure during merge

  • I am merging two tables:

    TBLANIMAL (as T)

    animal varchar(40)

    haveit bit

    name varchar(30)

    ANIMAL (as A)

    animal varchar(40)

    haveit bit

    name varchar(30) Primary Key

    I am merging on T.animal = A.animal. The source table (T) has many duplicates in the name column, that can't be brought over to the target table (A) because it's name column is protected with a PK.

    I need to get rid of all potential duplicates so the merge can go through. What technique can I use to find all the rows, which satisfy the T.animal = A.animal condition and have duplicate name values in the source table . . . and get rid of them?

  • I need to get rid of all potential duplicates so the merge can go through. What technique can I use to find all the rows, which satisfy the T.animal = A.animal condition and have duplicate name values in the source table . . . and get rid of them?

    select name,COUNT(name) from sourcetable group by name having COUNT(*)>1

    This will give you the number of count with the name of those record which have more than one entry.

    then you can delete the duplicate record by leaving just one record of those duplicates.

    hope this help.

    ----------
    Ashish

  • thank you.

    Your solution is similar to one I tried to follow here: I have used this and found duplicates. however, there are so many, I don't know how to get rid of all of them. eg. some cases 126 rows with duplicates (and I am sure I want to delete, at this point).

    Do you have a technique for deleting duplicates?

  • When its coding, i dont hit my head in it. I simply use google for it and then customise the solution as per requirement.

    this article will give you clue -- http://support.microsoft.com/kb/139444

    or this approach is also good to try as written by Pinal

    http://blog.sqlauthority.com/2007/03/01/sql-server-delete-duplicate-records-rows/

    ----------
    Ashish

  • thanks. yes, the first link to msft is the one I was referring to. Thanks for Pinal's, I'll take a look.

  • You may want to use the MERGE statement, for example:

    CREATE TABLE TBLANIMAL(animal vaINSERT INTO TBLANIMAL

    SELECT 'Horse',0,'Giddyup' UNION ALL

    SELECT 'Elephant',1,'Bigone' UNION ALL

    SELECT 'Horse',0,'Giddyup' UNION ALL

    SELECT 'Elephant',1,'Bigone' UNION ALL

    SELECT 'Goat',0,'no name'

    CREATE TABLE ANIMAL(animal varchar(40),haveit bit,name varchar(30))

    Then use the MERGE statement:

    SET NOCOUNT ON;

    MERGE Animal AS target

    USING (SELECT DISTINCT(Name),Animal,haveit FROM TBLAnimal) AS source (Name,Animal,haveit)

    ON (target.Animal = Source.Animal AND target.haveit = Source.haveit AND Target.name = source.name)

    WHEN NOT MATCHED THEN

    INSERT (Animal,haveit,name)

    VALUES (Source.Animal,Source.haveit,Source.name);

    Result:

    SELECT * FROM Animal

    animal haveit name

    Elephant1 Bigone

    Goat 0 no name

    Horse 0 Giddyup

    Edited to add:

    You might want to take the time to watch this short video on the MERGE statement:

    http://www.sqlshare.com/player.aspx?vid=513

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • hxkresl (7/22/2011)


    I am merging two tables:

    TBLANIMAL (as T)

    animal varchar(40)

    haveit bit

    name varchar(30)

    ANIMAL (as A)

    animal varchar(40)

    haveit bit

    name varchar(30) Primary Key

    I am merging on T.animal = A.animal. The source table (T) has many duplicates in the name column, that can't be brought over to the target table (A) because it's name column is protected with a PK.

    I need to get rid of all potential duplicates so the merge can go through. What technique can I use to find all the rows, which satisfy the T.animal = A.animal condition and have duplicate name values in the source table . . . and get rid of them?

    IIRC, doesn't 2008 have a MERGE command that would easily take care of this for you?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply