Delete the duplicate records of table having only one column

  • Hi all,

    Kindly suggest  or  share  such query to  me , How we can delete the rows having duplicate records 
    Condition; Table having only one column from the we have to delete duplicate records

    Thanks

  • Write a CTE that numbers the rows, partitioned on the values of the single column, then delete from that CTE where the row number is greater than 1.

    John

  • Bharat21 - Thursday, February 2, 2017 3:57 AM

    Hi all,

    Kindly suggest  or  share  such query to  me , How we can delete the rows having duplicate records 
    Condition; Table having only one column from the we have to delete duplicate records

    Thanks

    What percent of the table contains duplicates?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • John Mitchell-245523 - Thursday, February 2, 2017 4:05 AM

    Write a CTE that numbers the rows, partitioned on the values of the single column, then delete from that CTE where the row number is greater than 1.

    John

    If the table in question only has 1 column, how would the DELETE know to only remove 1 of the records of the matching criteria?  I think this might be a bit more complicated than that, involving some type of DELETE TOP 1... kind of expression, not sure how to make that a set based solution though. :ermm:

  • How many rows are in the table?  It might be easier to copy the data to a new table with a SELECT DISTINCT ... INTO .. and then rename the tables afterward.

  • Chris Harshman - Thursday, February 2, 2017 6:58 AM

    John Mitchell-245523 - Thursday, February 2, 2017 4:05 AM

    Write a CTE that numbers the rows, partitioned on the values of the single column, then delete from that CTE where the row number is greater than 1.

    John

    If the table in question only has 1 column, how would the DELETE know to only remove 1 of the records of the matching criteria?  I think this might be a bit more complicated than that, involving some type of DELETE TOP 1... kind of expression, not sure how to make that a set based solution though. :ermm:

    Upon re-reading, I notice the requirement isn't totally clear, but I read it as meaning delete duplicates so that there's only one of each value left.  If I'm right, my solution will work.  If we're only removing one, then delete where row number is 1.  If the requirement is to remove all duplicated values, use COUNT instead of ROW_NUMBER, and delete where the count is greater than 1.  But as you and (I think) Jeff have suggested, it may be more efficient to move and rename (although that won't be possible if there are no DDL permissions in the database).

    John

  • Chris Harshman - Thursday, February 2, 2017 6:58 AM

    John Mitchell-245523 - Thursday, February 2, 2017 4:05 AM

    Write a CTE that numbers the rows, partitioned on the values of the single column, then delete from that CTE where the row number is greater than 1.

    John

    If the table in question only has 1 column, how would the DELETE know to only remove 1 of the records of the matching criteria?  I think this might be a bit more complicated than that, involving some type of DELETE TOP 1... kind of expression, not sure how to make that a set based solution though. :ermm:

    You mean this?

    IF OBJECT_ID('tempdb..#Clients') IS NOT NULL DROP TABLE #Clients; CREATE TABLE #Clients (Surname VARCHAR(30));

    INSERT INTO #Clients (Surname) VALUES

    ('Smith'),('Smith'),('Smith'),('Smith'),

    ('Jones'),('Jones'),('Jones'),('Jones'),('Jones'),

    ('sahathevarajan'),('sahathevarajan');

    SELECT Surname, rn = ROW_NUMBER() OVER(PARTITION BY Surname ORDER BY (SELECT NULL)) FROM #Clients;

    WITH Deleter AS (SELECT Surname, rn = ROW_NUMBER() OVER(PARTITION BY Surname ORDER BY (SELECT NULL)) FROM #Clients)

    DELETE FROM Deleter WHERE rn > 1

    SELECT Surname FROM #Clients;

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Harshman - Thursday, February 2, 2017 6:58 AM

    John Mitchell-245523 - Thursday, February 2, 2017 4:05 AM

    Write a CTE that numbers the rows, partitioned on the values of the single column, then delete from that CTE where the row number is greater than 1.

    John

    If the table in question only has 1 column, how would the DELETE know to only remove 1 of the records of the matching criteria?  I think this might be a bit more complicated than that, involving some type of DELETE TOP 1... kind of expression, not sure how to make that a set based solution though. :ermm:

    It's not more complicated.  It works perfectly thanks to the internal magic of CTEs and hidden row identification using RID that's associated with the return of the generated ROW_NUMBER() OVER..

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • " It works perfectly thanks to the internal magic of CTEs and hidden row identification using RID that's associated with the return of the generated ROW_NUMBER() OVER.."
    Good to know, thanks!

  • Thanks to all

  • This was removed by the editor as SPAM

  • JasonClark - Thursday, February 16, 2017 8:45 PM

    PS: Please perform delete operation within a transaction.

    I have to ask "why"?  One of the advantages of doing it using the good method you described is that the new de-duped data exists in a separate table.  Once completed, there's no real need to protect the original table.  Another advantage of your method is that you can take advantage of minimal logging.  As of 2008 and provided that you have the same clustered index keys as the original table, the INSERT/SELECT will be minimally logged and won't require the duplication of reserved space like a SELECT INTO followed by a CI build would.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Don't you just love it when DB designers leave out primary key or unique key constraints?  If it wasn't for poor database design we wouldn't get to have so much fun with T-SQL.

  • Yes, I had been wanted this.

    Thanks, it's worked for me

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply