Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Removing Identical Duplicate Rows

Deleting duplicate rows out of a table can be tricky.  A brute force way to do this is with a TOP 1 and a cursor, but clearly there are set-based ways to accomplish this.  Using the ROW_NUMBER function with an appropriate OVER is fast and simple to understand.


This is a simple lab you can run to test it out.  



use sandboxdb
go 
drop table wdupholder
go 
create table wdupholder
( text1 varchar(10) not null
, text2 varchar(10) not null
)  
go 
insert into wdupholder (text1, text2) values ('test1','test1')
,('test1','test1')
,('test2','test2')
,('test2','test2')
,('test3','test3')
,('test3','test3')
,('test3','test3')
,('test3','test3') 
go 
select * from wdupholder
Note that we have a number of rows that are completely identical to eachother - no way to tell them apart.
delete a from 
(select DupRank = row_number() OVER (Partition By text1, text2 order by text1 asc) 
from wdupholder) a
where duprank > 1
go 
select * from wdupholder
go 
drop table wdupholder
go
In that case, we had no way to tell the duplicate records apart.  In this next scenario, our job is easier.  We have duplicate records but we have an integer column that may have been some sort of autoincrement or serialized value.  In your database, this may be a [createddate] datetime field that could be used to differentiate duplicate fields. 


We want to delete all but the lowest value for the column [int1] of any set of duplicate rows.  This is easy to do by using the integer column in the ORDER BY of the OVER.  We could delete all but the most recent values, based on [int1], by changing the ORDER BY to desc.


You should be able to apply this simple lab to your environment.  
create table wdupholder
( text1 varchar(10) not null
, text2 varchar(10) not null
, int1 int null
) 
go 
insert into wdupholder (text1, text2, int1) values ('test1','test1','1')
,('test1','test1','2')
,('test2','test2','3')
,('test2','test2','4')
,('test3','test3','5')
,('test3','test3','6')
,('test3','test3','7')
,('test3','test3',null) 
go 
select * from wdupholder
go 
delete a from 
(select DupRank = row_number() OVER (Partition By text1, text2 order by int1 asc) 
from wdupholder ) a 
where duprank > 1
go 
select * from wdupholder

Note the behavior of the NULL value in the int1 column, and be aware of NULL values in your own dataset.


Comments

Leave a comment on the original post [www.sqltact.com, opens in a new window]

Loading comments...