Blog Post

Deleting Duplicates without a unique key

,

The following was sent to me by my friend and colleague Dave Miller:

Dave's Email:

Wanted to pass along something I hadn't used before and found useful to easily get rid of duplicates in a set of data. The functionality has existed in the SQL language and was supported in SQL Server 2005. This uses Common Table Expressions (CTE) and the ROW_NUMBER() function.

The PARTITION BY portion of the statment specifies when to reset the row number, in my example I had:

  PersonId    Type    PNumber   
A1123 
A1345 
B1123***DELETE 
A2123 
B1123 
B1123***DELETE 
A2123***DELETE 
B11234 


The following query will remove the duplicates:

WITH PersonPhones AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY PersonId, PhonetypeId, PhoneNumber ORDER BY PersonId, PhonetypeId, PhoneNumber) AS GroupIndex,
        PersonId,
        PhonetypeId,
        PhoneNumber
    FROM Person.Phone pp
)
DELETE FROM PersonPhones WHERE GroupIndex > 1

The CTE would create a table with the following data, where anything that is a duplicate has a GroupIndexId > 1

  GroupIndex    PersonId    Type    PNumber  

 
1A1123 
1A1345 
1A2123 
2A2123***DELETE 
1B1123 
2B1123***DELETE 
3B1123***DELETE 
1B11234 


This method could be used to get the nth item of a group also.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating