duplicate record problem

Mark Samford

SSC Rookie

Points: 33
More actions
February 3, 2007 at 10:15 pm

#369096

This is pretty easy for most of you, but i need a little help if i can get it.
I have a table with duplicate records. no one column will tell me if my record is a duplicate.
It takes comparing 4 different columns in the table to distinguish which records are duplicates
I am trying to put a unique key on it for the 4 columns, but need to remove the duplicates first.
how can i create a query to do this. I know its possible, i just am not sure how to do it.
This is basically what my data looks like:
column1   column2   column3   column4
37             aa           bb           cc
36             aa           bb            cc
37             aa           bc            cc
36              aa           bb            cd
This is my theory on how it would work.
delete from table where (rowid)
    not in
     (select min(rowid), column1, column2, column3, column4 from table group by column1, column2, column3, column4)
David Scotland-132255

SSCertifiable

Points: 5599
More actions
February 4, 2007 at 5:39 am

#687642

This article should be helpful
http://www.sqlservercentral.com/columnists/sjones/removeduplicate.asp

it is also possible to remove duplicates rows using CTEs in sql server 2005
WITH
Dups AS
(
SELECT *,
ROW_NUMBER
() OVER(PARTITION BY col1, col2, col3, col4 ORDER BY col1, col2, col3, col4) AS rn
FROM dbo.Employees
)
delete
from dups where rn > 1;
hth
David
Scott Coleman

One Orange Chip

Points: 27452
More actions
February 5, 2007 at 9:50 pm

#687907

The only problem with your DELETE query is the column list in the subquery. It would work like this:
delete from table where (rowid)
not in
(select min(rowid) from table group by column1, column2, column3, column4)
Not that there's anything wrong with the CTE version, mind you.

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply