delete repeating rows

  • Hi all,

    I am having hard time forming this query to dedulicate - duplicate rows in a table:

    select 1_KEY,2_KEY

    from MY_TABLE

    group by 1_KEY,2_KEY

    having count(*) >1

    now this query gives me a bunch of repeated rows(repeated 3 times) .

    I want to delete them but I want to keep one single row for those repeated rows....I dont want to delete all the 3 rows, just the 2 repeating rows should be removed.

    Please let me know if this can be done...

    thanks in advance

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • You can either delete them all and reinsert a single copy or create a join and use delete top (n). The top (n) strategy will only work when n is a constant, unless you implement dynamic SQL, I believe. You need to take into consideration any constraints that may be in place.

    Why is your query using generic names for tables and fields? This isn't a homework assignment, is it?

  • You might want to look into ROW_NUMBER together with CTEs. There are several solutions available on this site. Please search for "duplicate rows" in the scripts section.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Here is one version of the code to delete duplicate values

    -- Create the table

    if object_id('Duplicates') is not null

    drop table Duplicates

    Create table Duplicates

    (

    c1 int

    )

    -- insert some sample records

    declare @i int set @i = 1

    while (@i <= 5 )

    begin

    insert into Duplicates

    select 1

    union all

    select 2

    union all

    select 3

    set @i = @i + 1

    end

    select * From Duplicates order by c1

    -- delete script

    ;WITH CTE (Rank_of_Duplicates , c1)

    as

    (

    select row_number() over(partition by c1 order by c1 desc) as Rank_of_Duplicates, c1 from Duplicates

    )

    DELETE FROM CTE

    WHERE

    Rank_of_Duplicates not in

    (

    Select max(Rank_of_Duplicates) from cte

    group by c1

    )

    select * From Duplicates order by c1

  • Kajal123 (4/7/2010)


    Hi all,

    I am having hard time forming this query to dedulicate - duplicate rows in a table:

    select 1_KEY,2_KEY

    from MY_TABLE

    group by 1_KEY,2_KEY

    having count(*) >1

    now this query gives me a bunch of repeated rows(repeated 3 times) .

    I want to delete them but I want to keep one single row for those repeated rows....I dont want to delete all the 3 rows, just the 2 repeating rows should be removed.

    Please let me know if this can be done...

    thanks in advance

    Lutz has it right... ROW_NUMBER (and a couple of other methods if there's something temporal in the table) is one of the easiest ways to do this.

    If you have a large table with more dupes than not, the suggestion of copying distinct info to another table and doing a rename may be more effective.

    If the table with the duplicates is NOT simply a staging table, I'd suggest that a bit of a redesign is in order and should definitely include a Primary Key to prevent duplictes in the future.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry... I didn't scroll down to see COldCoffee's solution which is a coded example of what Lutz was talking about.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/8/2010)


    Sorry... I didn't scroll down to see COldCoffee's solution which is a coded example of what Lutz was talking about.

    No issues Jeff... i am infact feeling great to have coded what Lutz was saying and honored to get a compliment from you for that 🙂 Thanks 😎

  • COldCoffee (4/8/2010)


    Jeff Moden (4/8/2010)


    Sorry... I didn't scroll down to see COldCoffee's solution which is a coded example of what Lutz was talking about.

    No issues Jeff... i am infact feeling great to have coded what Lutz was saying and honored to get a compliment from you for that 🙂 Thanks 😎

    Anyone who tries deserves a pat on the back. Glad to have you on board.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As a bit of a side bar, if you key off the ROW_NUMBER for the DELETE, you can eliminate one full execution of the CTE. Like this...

    --=============================================================================

    -- Create and populate a test table. This is NOT a part of the solution.

    --=============================================================================

    --===== Conditionally drop the test table to make reruns easier

    IF OBJECT_ID('TempDB..#Duplicates') IS NOT NULL

    DROP TABLE #Duplicates

    ;

    --===== Create and populate the test table on the fly

    SELECT ca.SomeValue

    INTO #Duplicates

    FROM Master.dbo.spt_values v --Used this because most folks still don't have a Tally table

    CROSS JOIN

    (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) ca(SomeValue)

    WHERE v.Type = 'P' --Isolates the numbers of 0 to 2047 in SQL Server 2005 (Mini Tally table)

    AND v.Number BETWEEN 1 AND 5 --This controls the number of copies of "ca" to insert

    ;

    --=============================================================================

    -- Here's a DELETE solution that only causes the CTE to execute once.

    --=============================================================================

    WITH

    cteDuplicates AS

    (

    SELECT ROW_NUMBER() OVER(PARTITION BY SomeValue ORDER BY (SELECT 1)) AS NumberedDupes,

    SomeValue

    FROM #Duplicates

    )

    DELETE cteDuplicates

    WHERE NumberedDupes > 1

    ;

    If temporal data is available, you can easily eliminate all but the latest duplicated rows by using the ORDER BY of the ROW_NUMBER to control what get's the number "1" so it won't get deleted. This also saves on one full execution of the CTE. Here's an example where an IDENTITY column represents "temporal data"...

    --=============================================================================

    -- Create and populate a test table. This is NOT a part of the solution.

    --=============================================================================

    --===== Conditionally drop the test table to make reruns easier

    IF OBJECT_ID('TempDB..#Duplicates') IS NOT NULL

    DROP TABLE #Duplicates

    ;

    --===== Create and populate the test table on the fly

    SELECT IDENTITY(INT,1,1) AS SomeID,

    ca.SomeValue

    INTO #Duplicates

    FROM Master.dbo.spt_values v --Used this because most folks still don't have a Tally table

    CROSS JOIN

    (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) ca(SomeValue)

    WHERE v.Type = 'P' --Isolates the numbers of 0 to 2047 in SQL Server 2005 (Mini Tally table)

    AND v.Number BETWEEN 1 AND 5 --This controls the number of copies of "ca" to insert

    ;

    --===== Display the content of the table before the deletes.

    -- This is NOT a part of the solution.

    SELECT * FROM #Duplicates ORDER BY SomeValue, SomeID

    ;

    --=============================================================================

    -- Here's a DELETE solution that only causes the CTE to execute once.

    -- This deletes all dupes except for the "latest" entry for each set of

    -- duplicates (according to the IDENTITY column, anyway).

    --=============================================================================

    WITH

    cteDuplicates AS

    (

    SELECT ROW_NUMBER() OVER(PARTITION BY SomeValue ORDER BY SomeID DESC) AS NumberedDupes,

    SomeValue

    FROM #Duplicates

    )

    DELETE cteDuplicates

    WHERE NumberedDupes > 1

    ;

    --===== Display the content of the table after the deletes.

    -- This is NOT a part of the solution.

    SELECT * FROM #Duplicates ORDER BY SomeValue, SomeID

    ;

    Heh... since I'm mostly allergic to loops, I also built the test data using my friend the "Cross Join" just to show an alternate method. "Cross Apply" would have also done the job nicely.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/8/2010)


    COldCoffee (4/8/2010)


    Jeff Moden (4/8/2010)


    Sorry... I didn't scroll down to see COldCoffee's solution which is a coded example of what Lutz was talking about.

    No issues Jeff... i am infact feeling great to have coded what Lutz was saying and honored to get a compliment from you for that 🙂 Thanks 😎

    Anyone who tries deserves a pat on the back. Glad to have you on board.

    Thanks Sir :-)... i am at home so have not yet checked out the code you have given, once i reach office i will go thro that and feel it's superiority.. Thanks once again, jeff!:-)

  • Hi,

    use

    "set rowcount 2"

    it will show top 2 row of matching criteria out of 3,if there is 3 rows for that criteria and delete those row using delete query like

    set rowcount 2

    delete from MY_TABLE where 1_KEY=1 and 2_KEY=2

  • Trouble Shooter (4/9/2010)


    Hi,

    use

    "set rowcount 2"

    it will show top 2 row of matching criteria out of 3,if there is 3 rows for that criteria and delete those row using delete query like

    set rowcount 2

    delete from MY_TABLE where 1_KEY=1 and 2_KEY=2

    Um... probably not a good idea. If there are only 2 rows, they'll both be deleted. If there are 4 rows, two will remain.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi,

    2 is the variable ,u can set as your requirement.

    u can set it to 3 or 1 also as your requirement.

  • Trouble Shooter (4/12/2010)


    Hi,

    2 is the variable ,u can set as your requirement.

    u can set it to 3 or 1 also as your requirement.

    This is the same error you made in another thread tonight - where several people have shown you a better way. You can't remove a variable number of rows of duplicates, with set-based code, from different groups with SET ROWCOUNT.

  • Trouble Shooter (4/12/2010)


    Hi,

    2 is the variable ,u can set as your requirement.

    u can set it to 3 or 1 also as your requirement.

    I know... but for only one set of duplicates at a time which makes a RBAR solution because it won't work as a full table set based solution.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply