Deleting (non-PK) duplicate values based on another column

  • I have a query, building the below temporary table:

    SearchResultID (PK) - ContentID - Rank

    1 2134 22

    2 2134 108

    3 2666 66

    4 2666 69

    5 4567 4

    6 4567 123

    From this table, I want to remove the duplicate ContentID values, leaving only the one with the highest Rank value - and I will then use the PK of SearchResultID in another delete query.

    So after this action, the above table will look like:

    SearchResultID (PK) - ContentID - Rank

    2 2134 108

    4 2666 69

    6 4567 123

    This was originally done with cursors which was too slow, way too slow. I tried with a While loop, but again was too slow.

    Can anyone suggest the fastest way in SQL 2000 to achieve this result?

  • Either:

    DELETE T

    FROM TempTable T

    WHERE NOT EXISTS

    (

    SELECT T1.ContentID

    FROM TempTable T1

    WHERE T.ContentId = T1.ContentId

    GROUP BY T1.ContentID

    HAVING T.Rank = MAX(T1.Rank)

    )

    or

    DELETE T

    FROM TempTable T

    LEFT JOIN

    (

    SELECT T1.ContentId, MAX(T1.Rank) AS Rank

    FROM TempTable T1

    GROUP BY T1.ContentId

    ) D

    ON T.ContentId = D.ContentId

    AND T.Rank = D.Rank

    WHERE D.ContentId IS NULL

  • DELETE FROM MyTable

    WHERE EXISTS (SELECT * FROM MyTable t2 WHERE t2.ContentID=MyTable.ContentID AND t2.Rank>MyTable.Rank)

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Here is another option:

    create table dbo.MyTestTable (

    SearchResultID int primary key,

    ContentID int,

    RankVal int

    )

    insert into dbo.MyTestTable

    select 1,2134,22 union all

    select 2,2134,108 union all

    select 3,2666,66 union all

    select 4,2666,69 union all

    select 5,4567,4 union all

    select 6,4567,123

    select * from dbo.MyTestTable

    delete dbo.MyTestTable

    from

    dbo.MyTestTable mt

    inner join (select

    ContentID,

    max(RankVal) as RankVal

    from

    dbo.MyTestTable

    group by

    ContentID) dt

    on (mt.ContentID = dt.ContentID)

    where

    mt.RankVal <> dt.RankVal

    select * from dbo.MyTestTable

  • All of the methods shown so far keep "ties" for the MAX if they are duplicated. Do you want to keep the ties?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • And that assumes that the value in RANK can have duplicates. That wasn't discussed and the sample data didn't have that problem.

  • Understood. But I don't believe one should make that assumption which is why I asked the question. Hopefully the OP will know.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Okay, abbreviated response, should have written that such that you (Jeff) would have known it was dirrected to the OP. Sorry. I was just adding additional info to your post.

    I got in a hurry as I was leaving work.

  • Heh... Same here... abbreviated post. Looking back at it, I certainly could have made it more apparent that it was directed at the OP. Sorry.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • heh, I was pretty sure that is who you meant. Looking at mine I can see how you didn't.

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply