February 3, 2007 at 10:15 pm
This is pretty easy for most of you, but i need a little help if i can get it.
I have a table with duplicate records. no one column will tell me if my record is a duplicate.
It takes comparing 4 different columns in the table to distinguish which records are duplicates
I am trying to put a unique key on it for the 4 columns, but need to remove the duplicates first.
how can i create a query to do this. I know its possible, i just am not sure how to do it.
This is basically what my data looks like:
column1 column2 column3 column4
37 aa bb cc
36 aa bb cc
37 aa bc cc
36 aa bb cd
This is my theory on how it would work.
delete from table where (rowid)
not in
(select min(rowid), column1, column2, column3, column4 from table group by column1, column2, column3, column4)
February 4, 2007 at 5:39 am
This article should be helpful
http://www.sqlservercentral.com/columnists/sjones/removeduplicate.asp
it is also possible to remove duplicates rows using CTEs in sql server 2005
WITH
Dups AS
(
SELECT *,
ROW_NUMBER
() OVER(PARTITION BY col1, col2, col3, col4 ORDER BY col1, col2, col3, col4) AS rn
FROM dbo.Employees
)
delete
from dups where rn > 1;
hth
David
February 5, 2007 at 9:50 pm
The only problem with your DELETE query is the column list in the subquery. It would work like this:
delete from table where (rowid)
not in
(select min(rowid) from table group by column1, column2, column3, column4)
Not that there's anything wrong with the CTE version, mind you.
Viewing 3 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply