Deleting duplicate records/rows except for 1

Question

Post reply

Deleting duplicate records/rows except for 1

adammenkes

Say Hey Kid

Points: 662
More actions
November 21, 2008 at 4:24 pm

#196567

I have searched the web for various ways to delete duplicate records, and I noticed various methods even within this forum. What I have come up with is a combination of ideas. Most importantly, the idea is to NOT use a CURSOR or TEMP TABLE with records that are to be manipulated.
I also have a question, if there is a way to improve this with large amounts of data to be deleted?
One thought would be to SET ROWCOUNT (@n - 1)
where '@n' is the number of duplicates - 1 so multiple records are deleted at a given time
However, I am open to suggestions.
Below is sample code to show duplicates, first on one column, and the second on two columns. For more columns, just add 'AND a.Column = b.Column' as well as adding the Column to the GROUP BY clause.
[font="Courier New"]
declare @dupes TABLE (
code varchar(10),
other varchar(20)
)
insert into @dupes values ('One', 'First One')
insert into @dupes values ('Two', 'Second Two')
insert into @dupes values ('Two', 'Second Two')
insert into @dupes values ('Three', 'Third Three')
insert into @dupes values ('Three', 'Third Three')
insert into @dupes values ('Three', 'Third Three')
insert into @dupes values ('Three', 'Third Different')
SELECT code, other
FROM @dupes a
WHERE
EXISTS (SELECT NULL
FROM @dupes b
WHERE b.code = a.code
GROUP BY b.code
HAVING COUNT(*) > 1)
SELECT code, other
FROM @dupes a
WHERE
EXISTS (SELECT NULL
FROM @dupes b
WHERE b.code = a.code
AND b.Other = a.other
GROUP BY b.code, b.Other
HAVING COUNT(*) > 1)
DECLARE @cnt int
SELECT @cnt = COUNT(*)
FROM @dupes a
WHERE
EXISTS (SELECT NULL
FROM @dupes b
WHERE b.code = a.code
AND b.Other = a.other
GROUP BY b.code, b.Other
HAVING COUNT(*) > 1)
SELECT @cnt -- Just for testing
WHILE @cnt > 0 BEGIN
SET ROWCOUNT 1
DELETE
FROM @dupes
WHERE
EXISTS (SELECT NULL
FROM @dupes b
WHERE b.code = code
AND b.Other = other
GROUP BY b.code, b.Other
HAVING COUNT(*) > 1)
SET ROWCOUNT 0
SELECT @cnt = COUNT(*)
FROM @dupes a
WHERE
EXISTS (SELECT NULL
FROM @dupes b
WHERE b.code = a.code
AND b.Other = a.other
GROUP BY b.code, b.Other
HAVING COUNT(*) > 1)
SELECT @cnt -- Just for testing
END
[/font]
Note: The idea of joining a table to itself based on NULL (SELECT NULL FROM...) comes from this article:
http://www.15seconds.com/Issue/011009.htm
However, this deleted ALL the records, it did not leave one, which is desired in many cases.
Also, the table variable is used for example purposes. For live data, for example an ORDERS table,
it would appear as follows:
SELECT {Column}
FROM Orders
WHERE
EXISTS (SELECT NULL
FROM Orders b
WHERE b.{Column} = Orders.{Column}
GROUP BY b.{Column}
HAVING COUNT(*) > 1)

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

SwePeso SSC-Dedicated Points: 39752 More actions · Answer 1

DELETE f

FROM (SELECT ROW_NUMBER() OVER (PARTITION BY code, other ORDER by code) AS recID

FROM Table1

) AS f

WHERE recID > 1

WHILE @@ROWCOUNT > 0

DELETE top (10000) f

FROM (SELECT ROW_NUMBER() OVER (PARTITION BY code, other ORDER by code) AS recID

FROM Table1

) AS f

WHERE recID > 1

N 56°04'39.16"
E 12°55'05.25"

adammenkes Say Hey Kid Points: 662 More actions · Answer 2

Brilliant!

This is awesome, much easier to code, and very fast. Thank you SO much!