Removing Identical Duplicate Rows

Will Assaf, 2012-02-28 (first published: 2012-02-21)

Deleting duplicate rows out of a table can be tricky. A brute force way to do this is with a TOP 1 and a cursor, but clearly there are set-based ways to accomplish this. Using the ROW_NUMBER function with an appropriate OVER is fast and simple to understand.

This is a simple lab you can run to test it out.

use sandboxdb
go

drop table wdupholder
go

create table wdupholder
( text1 varchar(10) not null
, text2 varchar(10) not null
)

go

insert into wdupholder (text1, text2) values ('test1','test1')
,('test1','test1')
,('test2','test2')
,('test2','test2')
,('test3','test3')
,('test3','test3')
,('test3','test3')
,('test3','test3')

go

select * from wdupholder

Note that we have a number of rows that are completely identical to eachother - no way to tell them apart.

delete a from

(select DupRank = row_number() OVER (Partition By text1, text2 order by text1 asc)

from wdupholder) a

where duprank > 1
go

select * from wdupholder
go

drop table wdupholder
go

In that case, we had no way to tell the duplicate records apart. In this next scenario, our job is easier. We have duplicate records but we have an integer column that may have been some sort of autoincrement or serialized value. In your database, this may be a [createddate] datetime field that could be used to differentiate duplicate fields.

We want to delete all but the lowest value for the column [int1] of any set of duplicate rows. This is easy to do by using the integer column in the ORDER BY of the OVER. We could delete all but the most recent values, based on [int1], by changing the ORDER BY to desc.

You should be able to apply this simple lab to your environment.

create table wdupholder
( text1 varchar(10) not null
, text2 varchar(10) not null
, int1 int null
)

go

insert into wdupholder (text1, text2, int1) values ('test1','test1','1')
,('test1','test1','2')
,('test2','test2','3')
,('test2','test2','4')
,('test3','test3','5')
,('test3','test3','6')
,('test3','test3','7')
,('test3','test3',null)