RE: Another Duplicate removal question

August 25, 2013 at 2:08 pm

Lynn Pettis (8/25/2013)
Jeff Moden (8/24/2013)
I can't help with requirement #2 because I don't know what your data or table looks like and don't know what exactly you mean by "preferably the one with the most data in the row".
There are two fairly easy ways to accomplish what you ask. They take turns outperforming each other depending on how many duplicates you have per ScrubID and what the indexes on the table are.
Here are the two methods. I didn't test them because you didn't post any readily consumable data, but you should get the idea. Both will handle virtually any number of duplicate ScrubIDs but, like I said previously, they will perform differently based on how many dupes there are for each ScrubID.
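--===== Method 1: Self-join. Deletes every row that has a higher-scoring row
     -- for the same ScrubID. Rows tied for the highest Score are all kept.
 DELETE lo
   FROM dbo.Results lo
   JOIN dbo.Results hi
     ON lo.ScrubID = hi.ScrubID
    AND lo.Score   < hi.Score
;
--===== Method 2: ROW_NUMBER() (SQL Server 2005 and later). Keeps exactly one
     -- row per ScrubID; ties on Score are broken arbitrarily.
WITH
cteEnumerateDupes AS
(
 SELECT SortOrder = ROW_NUMBER() OVER (PARTITION BY ScrubID ORDER BY Score DESC),
        ID --You probably won't need this but it gives people the nice warm fuzzies.
   FROM dbo.Results
)
 DELETE cteEnumerateDupes
  WHERE SortOrder > 1
;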
Just one problem with what you provided, Jeff. The OP needs a solution that will work with SQL Server 2000.

In that case, the first solution I provided will work.
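
Since no readily consumable data was posted, here is a minimal sketch of how the self-join delete could be tried out on SQL Server 2000. The #Results temp table, its column datatypes, and the sample rows are assumptions made up purely for illustration; only the ScrubID, Score, and ID column names come from the code above. The second DELETE is an optional variation, not part of the original solution, that also removes rows tied for the highest Score by keeping the lowest ID.

```sql
--===== Hypothetical test table and data (the real dbo.Results was never posted).
 CREATE TABLE #Results
        (
         ID      INT IDENTITY(1,1) PRIMARY KEY,
         ScrubID INT NOT NULL,
         Score   INT NOT NULL
        )
;
--===== INSERT/SELECT with UNION ALL is used because row constructors
     -- (multi-row VALUES) are not available before SQL Server 2008.
 INSERT INTO #Results (ScrubID, Score)
 SELECT 1, 10 UNION ALL
 SELECT 1, 30 UNION ALL
 SELECT 1, 20 UNION ALL
 SELECT 2, 50 UNION ALL
 SELECT 3, 40 UNION ALL
 SELECT 3, 40 --Tied for the highest Score in ScrubID 3
;
--===== The self-join delete: remove any row that has a higher-scoring
     -- row for the same ScrubID.
 DELETE lo
   FROM #Results lo
   JOIN #Results hi
     ON lo.ScrubID = hi.ScrubID
    AND lo.Score   < hi.Score
;
--===== Optional variation (an assumption about what is wanted for ties):
     -- also delete rows tied on Score, keeping the one with the lowest ID.
 DELETE lo
   FROM #Results lo
   JOIN #Results hi
     ON lo.ScrubID = hi.ScrubID
    AND (
            lo.Score < hi.Score
         OR (lo.Score = hi.Score AND lo.ID > hi.ID)
        )
;
--===== Show what's left and clean up.
 SELECT ID, ScrubID, Score
   FROM #Results
  ORDER BY ScrubID, ID
;
 DROP TABLE #Results
;
```

With this sample data, the first DELETE leaves both of the tied rows for ScrubID 3 in place, and the second DELETE removes the tied row with the higher ID so that one row per ScrubID remains. If keeping tied rows is acceptable, the second DELETE can simply be left out.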

Good to "see" you around, Lynn.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.