Technical Article

DeDuping Tables with Duplicate Records


The script is in three parts:
1. A select statement to determine the number of duplicate records contained within the table
2. The deduping process builds a temporary table called #Duplicates and uses it to compare duplicates across both similar tables whilst deleting records with multiple counts
3. Final test of deduping with no records indicating table is clean of duplicates

The script is run against the database with the duplicate table. Ensure the following is replaced with your objects:
DuplicateTable >>> Your Table
Column1,2,3,... >>> Your Column Names
REQUIRED - The table MUST have an Identity Column
TableID >>> Your Table Identity Column

-- Script by Yomi Obe, Consultant SQL DBA
-- July 2007
-- Version 1.00

SELECT     COUNT(*) AS DuplicateCount, [Column1], [Column2], [Column3]
FROM         dbo.DuplicateTable
GROUP BY [Column1], [Column2], [Column3]
HAVING      (COUNT(*) >= 2)

Select T1.* from dbo.DuplicateTable as T1 join dbo.DuplicateTable as T2
on ltrim(rtrim(T1.[Column1])) = ltrim(rtrim(T2.[Column1])) and
ltrim(rtrim(T1.[Column2])) = ltrim(rtrim(T2.[Column2])) and
ltrim(rtrim(T1.[Column3])) = ltrim(rtrim(T2.[Column3]))
where T1.[TableID] <> T2.[TableID]


Select ltrim(rtrim([Column1])) as [Column1], ltrim(rtrim([Column2])) as [Column2], 
ltrim(rtrim([Column3])) as [Column3], 
DupCount=count(*), IDKey = max([TableID])
into #Duplicates
from dbo.DuplicateTable
group by 
[Column1], [Column2], [Column3]
having count(*) >= 2
order by count(*) DESC 

fromdbo.DuplicateTable DT join #Duplicates D
DT.[Column1] = D.[Column1] and
DT.[Column2] = D.[Column2] and
DT.[Column3] = D.[Column3]
whereDT.[TableID] not in (select IDKey from #Duplicates)

SELECT     COUNT(*) AS DuplicateCount, [Column1], [Column2], [Column3]
FROM         DuplicateTable
GROUP BY [Column1], [Column2], [Column3]
HAVING      (COUNT(*) >= 2)

drop table #Duplicates

PRINT 'If no records are returned on the last column, the de-duping is successful!'


5 (1)




5 (1)