http://www.sqlservercentral.com/blogs/robert_davis/2008/08/21/Deleting-Duplicates-without-a-unique-key/
Printed 2013/05/25 03:14AM
Deleting Duplicates without a unique key
The following was sent to me by my friend and colleague Dave Miller:
Dave's Email:
Wanted to pass along something I hadn't used before and found useful to easily get rid of duplicates in a set of data. The functionality has existed in the SQL language and was supported in SQL Server 2005. This uses Common Table Expressions (CTE) and the ROW_NUMBER() function.
The PARTITION BY portion of the statment specifies when to reset the row number, in my example I had:
| PersonId |
Type |
PNumber |
|
| A |
1 |
123 |
|
| A |
1 |
345 |
|
| B |
1 |
123 |
***DELETE |
| A |
2 |
123 |
|
| B |
1 |
123 |
|
| B |
1 |
123 |
***DELETE |
| A |
2 |
123 |
***DELETE |
| B |
1 |
1234 |
|
The following query will remove the duplicates:
WITH PersonPhones AS (
SELECT ROW_NUMBER() OVER (PARTITION BY PersonId, PhonetypeId, PhoneNumber ORDER BY PersonId, PhonetypeId, PhoneNumber) AS GroupIndex,
PersonId,
PhonetypeId,
PhoneNumber
FROM Person.Phone pp
)
DELETE FROM PersonPhones WHERE GroupIndex > 1
The CTE would create a table with the following data, where anything that is a duplicate has a GroupIndexId > 1
| GroupIndex |
PersonId |
Type |
PNumber
| |
| 1 |
A |
1 |
123 |
|
| 1 |
A |
1 |
345 |
|
| 1 |
A |
2 |
123 |
|
| 2 |
A |
2 |
123 |
***DELETE |
| 1 |
B |
1 |
123 |
|
| 2 |
B |
1 |
123 |
***DELETE |
| 3 |
B |
1 |
123 |
***DELETE |
| 1 |
B |
1 |
1234 |
|
This method could be used to get the nth item of a group also.
Copyright © 2002-2013 Simple Talk Publishing. All Rights Reserved. Privacy Policy. Terms of Use. Report Abuse.