Deleting Duplicate Records

Amitesh Kumar, 2008-12-24 (first published: 2008-11-26)

Deleting duplicate records:

Concept:

Self-join the table on all the columns that can hold the same values, and tell the duplicate rows apart by the identity column. In the provided script I've used an Employees table that has some duplicate rows. Duplication is possible on EmployeeNo and EmployeeID. Id is an identity column. To find and delete the duplicate records, I join the Employees table with itself on the non-unique columns, EmployeeNo and EmployeeID in this case, and keep only the pairs where one row's Id is greater than the other's.
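As a small illustration of the idea, here is a sketch on made-up data. The temp table #Employees, its column types, and the sample values are assumptions for the example, not taken from the article's real table.

```sql
-- Hypothetical sample table; column types and values are assumptions.
CREATE TABLE #Employees (
    Id         INT IDENTITY(1,1) PRIMARY KEY,
    EmployeeNo INT NOT NULL,
    EmployeeID INT NOT NULL
);

INSERT INTO #Employees (EmployeeNo, EmployeeID) VALUES (100, 1);  -- Id 1
INSERT INTO #Employees (EmployeeNo, EmployeeID) VALUES (100, 1);  -- Id 2, duplicate of Id 1
INSERT INTO #Employees (EmployeeNo, EmployeeID) VALUES (200, 2);  -- Id 3, unique
INSERT INTO #Employees (EmployeeNo, EmployeeID) VALUES (100, 1);  -- Id 4, duplicate of Id 1

-- Each row is paired with every earlier row (smaller Id) that holds the same
-- values. A row that matches several earlier rows shows up once per match,
-- so DISTINCT is used here to list each duplicate only once.
SELECT DISTINCT a.Id, a.EmployeeNo, a.EmployeeID
FROM #Employees a
JOIN #Employees b
  ON a.EmployeeNo = b.EmployeeNo
 AND a.EmployeeID = b.EmployeeID
 AND a.Id > b.Id;
```

With this data the query lists the rows with Id 2 and Id 4; the row with the lowest Id in each group (Id 1) and the unique row (Id 3) are left out, so they are the ones that survive a delete.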

Constraints:

The table under consideration must have at least one unique column, such as an identity column, so that otherwise identical rows can be told apart. If a table does not have an identity column, there are two possibilities:

i. Use the get row_num SQL query in conjunction with this script, with the generated row number taking the place of the unique field.

ii. Deduplicate through a copy of the table that has an identity column:

* Create a copy of the table using:

Select * into EmployeesCopy from Employees where 1 = 0

* Alter EmployeesCopy to add an identity column.

* Insert all records from Employees to EmployeesCopy.

* Delete the duplicates from EmployeesCopy using the self-join script.

* Delete all records from Employees

* Copy the deduplicated contents from EmployeesCopy back to Employees, leaving out the added identity column.
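The steps of option ii could be sketched roughly as follows. This is only an outline under stated assumptions: temp tables (#EmployeesSrc standing in for Employees, #EmployeesCopy for EmployeesCopy), the column types, and the sample rows are hypothetical, and the transaction around the last two steps is an added precaution rather than part of the original steps.

```sql
-- Hypothetical source table without an identity column (stands in for Employees).
CREATE TABLE #EmployeesSrc (
    EmployeeNo INT NOT NULL,
    EmployeeID INT NOT NULL
);
INSERT INTO #EmployeesSrc (EmployeeNo, EmployeeID) VALUES (100, 1);
INSERT INTO #EmployeesSrc (EmployeeNo, EmployeeID) VALUES (100, 1);  -- duplicate
INSERT INTO #EmployeesSrc (EmployeeNo, EmployeeID) VALUES (200, 2);

-- Step 1: create an empty copy (WHERE 1 = 0 copies the columns but no rows).
SELECT * INTO #EmployeesCopy FROM #EmployeesSrc WHERE 1 = 0;

-- Step 2: add an identity column to the copy.
ALTER TABLE #EmployeesCopy ADD Id INT IDENTITY(1,1) NOT NULL;
GO  -- batch separator for SSMS/sqlcmd, so later statements can see the new column

-- Step 3: load all rows; the identity column fills itself in.
INSERT INTO #EmployeesCopy (EmployeeNo, EmployeeID)
SELECT EmployeeNo, EmployeeID FROM #EmployeesSrc;

-- Step 4: delete duplicates in the copy, keeping the lowest Id of each group.
DELETE a
FROM #EmployeesCopy a
JOIN #EmployeesCopy b
  ON a.EmployeeNo = b.EmployeeNo
 AND a.EmployeeID = b.EmployeeID
 AND a.Id > b.Id;

-- Steps 5 and 6: empty the source and copy the cleaned rows back.
-- The transaction is an assumption added here so the two steps succeed or fail together.
BEGIN TRANSACTION;
DELETE FROM #EmployeesSrc;
INSERT INTO #EmployeesSrc (EmployeeNo, EmployeeID)
SELECT EmployeeNo, EmployeeID FROM #EmployeesCopy;
COMMIT TRANSACTION;

DROP TABLE #EmployeesCopy;
```

Listing the columns explicitly in the final insert keeps the added Id column out of the original table.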

--Author: Amitesh Kumar
--Submitted On: 26-Nov-2008
--Concept: Self-join the table on all the columns that can possibly
--have the same values and differentiate between the duplicate values
--based on the identity column.

--Find duplicate records
--The result of this query is all the duplicate records whose Id is greater.
select a.*
from Employees a
join Employees b
  on a.[EmployeeNo] = b.[EmployeeNo]
 and a.[EmployeeID] = b.[EmployeeID]
 and a.Id > b.Id

--Delete duplicate records
--The result of this query is deletion of all the duplicate records whose Id is greater;
--the row with the lowest Id in each group is kept.
--DELETE takes the alias of the table to delete from (a), not a column list.
delete a
from Employees a
join Employees b
  on a.[EmployeeNo] = b.[EmployeeNo]
 and a.[EmployeeID] = b.[EmployeeID]
 and a.Id > b.Id
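For option i under the constraints, here is one possible sketch. It assumes the "get row_num" query refers to something like SQL Server's ROW_NUMBER() function (available from SQL Server 2005); the original row_num script is not shown here, so this is a stand-in, not the author's version. The temp table #EmployeesRn, its column types, and the sample rows are hypothetical.

```sql
-- Hypothetical table with no identity column; name and types are assumptions.
CREATE TABLE #EmployeesRn (
    EmployeeNo INT NOT NULL,
    EmployeeID INT NOT NULL
);

INSERT INTO #EmployeesRn (EmployeeNo, EmployeeID) VALUES (100, 1);
INSERT INTO #EmployeesRn (EmployeeNo, EmployeeID) VALUES (100, 1);  -- duplicate
INSERT INTO #EmployeesRn (EmployeeNo, EmployeeID) VALUES (200, 2);

-- RowNum restarts at 1 for each (EmployeeNo, EmployeeID) group and stands in
-- for the identity column: the first row of each group is kept, the rest are deleted.
WITH Numbered AS (
    SELECT EmployeeNo,
           EmployeeID,
           ROW_NUMBER() OVER (PARTITION BY EmployeeNo, EmployeeID
                              ORDER BY EmployeeNo) AS RowNum
    FROM #EmployeesRn
)
DELETE FROM Numbered
WHERE RowNum > 1;

-- One row per (EmployeeNo, EmployeeID) combination remains.
SELECT EmployeeNo, EmployeeID FROM #EmployeesRn;
```

Because the rows within a group are identical, it does not matter which one receives row number 1.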