Deleting only one of rows among duplicate rows

  • hi

    i have two rows with redundant data and i would like to delete only top record .Please let me know the best approach to do it .

    thanks

    sa

  • The only way you can do it is if your table has a unique primary key for each duplicate.

    delete mytable

    Where PK in (select min(a.pk)

                 From mytable a

                 join mytable b on a.dupcol = b.dupcol

                 where a.pk <> b.pk)

    or if you do not have a unique primay key  you need to

    select distinct into temp table

    truncate table

    insert rows from temp table

  • There are few good article on this subject in this site. search for them. You will find them usefull.




    My Blog: http://dineshasanka.spaces.live.com/

  • Try DELETE .... WHERE CURRENT OF ...cursor.  See Positioned Delete in BOL.  This will allow you to delete a single row from a set of tue duplicates. 

    If you can identify the row to delete with a WHERE clause, it would probably be better to use DELETE ... WHERE than to use a cursor.  However, for true duplicates you need a positioned delete.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • Actualy, if you do a search for "DELETE DUPLICATES", you come up with just a couple of articles which use cursors and group bys and adding columns and using MAX/GROUP BY... and ... and... and... none of them appear to be straight forward.

    So here's a straight forward, scalable approach (as a generic boilerplate).  On my humble home computer, it will find and delete 400 duplicate rows in a million record table in less than a minute.

     

    First, the disclaimer: 

    THIS SCRIPT DELETES DATA!  DO NOT TEST ON PRODUCTION DATA!  DO NOT USE ON PRODUCTION DATA UNLESS YOU HAVE A VIABLE AND RESTORABLE BACKUP FOR YOUR DATA.  I AM NOT AND CANNOT BE MADE TO BE RESPONSIBLE FOR YOUR DATA IN ANY WAY, SHAPE, OR FORM!

    DELETE

      FROM tablename

     WHERE tablename.ID IN

           --The following SELECT statement acts as a derived table.  The DISTINCT term is

           --required because if there are 3 or more dupes for any given fields, the return

           --will grow, exponentially.

           (

           SELECT DISTINCT t1.ID

              FROM tablename AS t1

        INNER JOIN tablename AS t2

                -- These "ON" conditions contain the comparisons of the fields to be checked for dupes

                ON t1.dupchkfield1 = t2.dupchkfield1

               AND t1.dupchkfield2 = t2.dupchkfield2

             --AND t1.dupchkfield3 = t2.dupchkfield3 --Add additional fields like this

             --AND t1.dupchkfield4 = t2.dupchkfield4 --Add additional fields like this

               --  This "ON" condition contains the comparison of the field to be checked for age.

               --  "Age" can be based on date/datetime/time or auto-incrementing ID.

               --  In all cases, the earlier/lower the value, the older the record is compared to others.

               --  The comparison operator (">" in this example) determines which duplicate will be kept.

               --  The functionality of this example can be changed by changing the comparison operator to...

               --  ">" KEEPS only the OLDEST or EARLIEST record if duplicates are detected (OLD IS GREATer)

               --  "<" KEEPS only the YOUNGEST or LATEST record if duplicates are detected (YOUNG is LESS-THAN OLD)

               --  "<>" REMOVES ALL DUPLICATED RECORDS INCLUDING YOUNGEST AND OLDEST (be careful)

               AND t1.chronologyfield > t2.chronologyfield

    _________________________________________________________________________________________________________________

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Another way, if you don't have a primary key and don't want to use a cursor or group by:

     

    SET ROWCOUNT 1

    DELETE My Table WHERE field1 = 'DUPE'

    SET ROWCOUNT 0

     

    Dylan Peters
    SQL Server DBA

  • --I have found that the following script works well

    /* If your table is called "TheTable" and three fields "F1", "F2" and "F3" */

    /*This needs to be done within a transaction. Just in case something goes wrong*/

    BEGIN TRAN

            /* Save Distinct Versions of the Duplicate Rows */

            SELECT DISTINCT F1, F2, F3

            INTO #TheTable

            FROM TheTable T1

            WHERE EXISTS (SELECT F1, F2, F3, Count(*)

            FROM TheTable T2

            WHERE T2.F1 = T1.F1

            AND T2.F2 = T1.F2

            AND T2.F3 = T1.F3

            GROUP BY F1, F2, F3

            HAVING Count(*) > 1)

     

            /* DELETE the Duplicate Rows */

            DELETE T1

            FROM TheTable T1

            WHERE EXISTS (SELECT F1, F2, F3, Count(*)

            FROM TheTable T2

            WHERE T2.F1 = T1.F1

            AND T2.F2 = T1.F2

            AND T2.F3 = T1.F3

            GROUP BY F1, F2, F3

            HAVING Count(*) > 1)

     

            /* Restore one the saved Rows */

            INSERT INTO TheTable (F1, F2, F3)

            SELECT F1, F2, F3

            FROM #TheTable

     

            DROP TABLE #TheTable

    COMMIT TRAN

     

    --Hope this helps

    --Robert

  • See what I mean?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    I tried your example on table which had three fields "F1", "F2" and "F3" with no primary or unique key. I could not get it to work. Whereas I did with mine. I must be doing something wrong. Could you please enpart some more of your wisdom and suggest where I might be using your simpler approach incorrectly.

    Cheers

    Robert

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply