Duplicate Records

  • Table has duplicate SSN. I want keep ony one entry for per SSN. Any duplicate entry want to delete. How can I do this? There is 9000 rows in the table.

    Thanks

  • Something like this..

    CREATE TABLE SSNTemp (

     SSN char (10) NOT NULL

    )

    CREATE UNIQUE

      INDEX [IX_SSNTemp] ON SSNTemp ([SSN])

    WITH

        IGNORE_DUP_KEY

    insert into SSNTemp

    select ssn from yourssntable

    delete from yourssntable

    insert into yourssntable

    select ssn from SSNTemp

     

  • Try this script in Query analyser

    BEGIN TRANSACTION T1

    SELECT DISTINCT * INTO TempTable FROM MainTable GROUP BY SSN HAVING COUNT(*) > 1

    DELETE MainTable WHERE SSN IN (SELECT SSN From TempTable)

    INSERT INTO MainTable SELECT * FROM TempTable

    COMMIT TRANSACTION

     

  • Hi,

    See this article..

    http://www.sqlservercentral.com/columnists/ccubley/findinganddeletingduplicatedata.asp

     

    Cheerz

    K.Senthil Kumar

     

     

     

  • --Step 1

    --if your SSNtable do not have a unique key then

    Alter table SSNTable

    add Unique_ID int identity

    --Step 2

    Select max(Unique_ID)Unique_ID,SSN into Temp_Duplicate_SSNtable

    from SSNtable,

    (

    select count(SSN)CountSSN,SSN

    from SSNtable

    group by SSN

    having count(SSN)> 1

    )a

    where a.SSN = SSNtable.SSN

    group by SSN

    --Step 3

    Delete

    from SSNtable

    where Unique_ID in (select Unique_ID from Temp_Duplicate_SSNtable)

    --Step 4

    Drop table Temp_Duplicate_SSNtable

    --Step 5 this must give you no results

    select count(SSN)CountSSN,SSN

    from SSNtable

    group by SSN

    having count(SSN)> 1

    --You can drop the Unique_ID field but I suggest you keep it and add a constraint on your SSN field

    --so that in future no duplicates can be inserted

  • With so many requests for deleting duplicates coming to this forum, it seems like it would be nice if Microsoft provides a query on the lines of - delete duplicate from <table> on <key>

  • DELETE MyTable

    WHERE MyID NOT IN (

      SELECT MAX (MyID)

      FROM MyTable

      GROUP BY SSN)   --assuming MyID and SSN cannot be null

    Keep in mind this will delete the oldest record, and keep the newest.  If you want to keep the oldest record instead, replace the inner select w/ a SELECT MIN (). 

    This technique essentially does the same thing as the solution link that was posted above, only w/ a lot less code.  This sub-select example is less efficient than the delete..join, but if you are only doing this as a one-time data cleanup…

    Also, is the table in question acting as a parent table to other tables?  If so you will need to consolidate the child records under the parent record you want to save.  That is, unless your child records aren’t duplicates as well.

    Also, what process are you putting in place to see to it that this doesn’t continue to happen?

    Corie Curcillo
    MCT, MCDBA, MCSD

  • Thanks For the help.

  • you can try this:

    Create table #Test(Id int,dup_id int)

    GO

    Insert #Test

     Select 1,      100 union all -- Multi duplicates

     Select 2,      100 union all

     Select 3,      100 union all

     Select 4,      100 union all

     Select 5,      200 union all -- Single

     Select 6,     100 union all

     Select 7,     300 union all

     Select 8,     300 union all

     Select 9,     400 union all

     Select 10,     400

    GO

    DELETE a

    FROM #test a

    INNER JOIN #test b ON a.dup_id = b.dup_id

    AND a.id > b.id

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply