Deleting duplicate records/rows except for 1

  • I have searched the web for various ways to delete duplicate records, and I noticed various methods even within this forum. What I have come up with is a combination of ideas. Most importantly, the idea is to NOT use a CURSOR or TEMP TABLE with records that are to be manipulated.

    I also have a question, if there is a way to improve this with large amounts of data to be deleted?

    One thought would be to SET ROWCOUNT (@n - 1)

    where '@n' is the number of duplicates - 1 so multiple records are deleted at a given time

    However, I am open to suggestions.

    Below is sample code to show duplicates, first on one column, and the second on two columns. For more columns, just add 'AND a.Column = b.Column' as well as adding the Column to the GROUP BY clause.

    [font="Courier New"]

    declare @dupes TABLE (

    code varchar(10),

    other varchar(20)

    )

    insert into @dupes values ('One', 'First One')

    insert into @dupes values ('Two', 'Second Two')

    insert into @dupes values ('Two', 'Second Two')

    insert into @dupes values ('Three', 'Third Three')

    insert into @dupes values ('Three', 'Third Three')

    insert into @dupes values ('Three', 'Third Three')

    insert into @dupes values ('Three', 'Third Different')

    SELECT code, other

    FROM @dupes a

    WHERE

    EXISTS (SELECT NULL

    FROM @dupes b

    WHERE b.code = a.code

    GROUP BY b.code

    HAVING COUNT(*) > 1)

    SELECT code, other

    FROM @dupes a

    WHERE

    EXISTS (SELECT NULL

    FROM @dupes b

    WHERE b.code = a.code

    AND b.Other = a.other

    GROUP BY b.code, b.Other

    HAVING COUNT(*) > 1)

    DECLARE @cnt int

    SELECT @cnt = COUNT(*)

    FROM @dupes a

    WHERE

    EXISTS (SELECT NULL

    FROM @dupes b

    WHERE b.code = a.code

    AND b.Other = a.other

    GROUP BY b.code, b.Other

    HAVING COUNT(*) > 1)

    SELECT @cnt -- Just for testing

    WHILE @cnt > 0 BEGIN

    SET ROWCOUNT 1

    DELETE

    FROM @dupes

    WHERE

    EXISTS (SELECT NULL

    FROM @dupes b

    WHERE b.code = code

    AND b.Other = other

    GROUP BY b.code, b.Other

    HAVING COUNT(*) > 1)

    SET ROWCOUNT 0

    SELECT @cnt = COUNT(*)

    FROM @dupes a

    WHERE

    EXISTS (SELECT NULL

    FROM @dupes b

    WHERE b.code = a.code

    AND b.Other = a.other

    GROUP BY b.code, b.Other

    HAVING COUNT(*) > 1)

    SELECT @cnt -- Just for testing

    END

    [/font]

    Note: The idea of joining a table to itself based on NULL (SELECT NULL FROM...) comes from this article:

    http://www.15seconds.com/Issue/011009.htm

    However, this deleted ALL the records, it did not leave one, which is desired in many cases.

    Also, the table variable is used for example purposes. For live data, for example an ORDERS table,

    it would appear as follows:

    SELECT {Column}

    FROM Orders

    WHERE

    EXISTS (SELECT NULL

    FROM Orders b

    WHERE b.{Column} = Orders.{Column}

    GROUP BY b.{Column}

    HAVING COUNT(*) > 1)

  • DELETE f

    FROM (SELECT ROW_NUMBER() OVER (PARTITION BY code, other ORDER by code) AS recID

    FROM Table1

    ) AS f

    WHERE recID > 1

    WHILE @@ROWCOUNT > 0

    DELETE top (10000) f

    FROM (SELECT ROW_NUMBER() OVER (PARTITION BY code, other ORDER by code) AS recID

    FROM Table1

    ) AS f

    WHERE recID > 1


    N 56°04'39.16"
    E 12°55'05.25"

  • Brilliant!

    This is awesome, much easier to code, and very fast. Thank you SO much!

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply