I need to delete a duplicate line

  • Hi everybody, I need help to find and delete a duplicate line in a table.  The row I am trying to delete is actually a duplicate because there are some columns that makes them different.

    Please note in the table below.  Rows with line_No of 1, 2, 3, and 5.  They have the same code, dept_cd, ID, and version.

    I need to delete the rows where status = 1  I have many rows with this scenario.  The rows that had a status of 1 should have been deleted when the new row with status = 3 was aggregated.

    CD DEPT ID              LN_NO        VERSION          STATUS

    PO HH 16362135     1                     1                              3

    PO HH 16362135     1                      1                              1

    PO HH 16362135    2                      1                              3

    PO HH 16362135    2                      1                               1

    PO HH 16362135    3                     1                                1

    PO HH 16362135    3                     1                                3

    PO HH 16362135   5                      1                                  1

    PO HH 16362135   5                     1                                 3

    PO HH 16362135   6                       1                              3

    here is the sql to create the table and insert test rows.

    CREATE TABLE [dbo].[TESTPO](

    [CD] [varchar](8) NOT NULL,

    [DEPT] [varchar](4) NOT NULL,

    [ID] [varchar](20) NOT NULL,

    [LN_NO] [numeric](10, 0) NOT NULL,

    [VERSION] [numeric](10, 0) NOT NULL,

    [STATUS] [numeric](10, 0) NOT NULL)

    insert into TESTPO

    ([CD], [DEPT], [ID], [LN_NO], [VERSION], [STATUS])

    SELECT 'PO', 'HH', '16362135', 1, 1, 3 UNION ALL

    SELECT 'PO', 'HH', '16362135', 1, 1, 1 UNION ALL

    SELECT 'PO', 'HH', '16362135', 2, 1, 3 UNION ALL

    SELECT 'PO', 'HH', '16362135', 2, 1, 1 UNION ALL

    SELECT 'PO', 'HH', '16362135', 3, 1, 1 UNION ALL

    SELECT 'PO', 'HH', '16362135', 3, 1, 3 UNION ALL

    SELECT 'PO', 'HH', '16362135', 5, 1, 1 UNION ALL

    SELECT 'PO', 'HH', '16362135', 5, 1, 3 UNION ALL

    SELECT 'PO', 'HH', '16362135', 6, 1, 3

    I have tried queries to find duplicate rows but have not worked

    Please help.

    Thanks

  • I need to delete the rows where status = 1

    DELETE

    FROM TestPO

    WHERE Status=1;

    ?

    From your description, it's not at all clear what records you really want deleted. Which record(s) are duplicates? How do we tell?

  • pietlinden wrote:

    I need to delete the rows where status = 1

    DELETE

    FROM TestPO

    WHERE Status=1;

    ?

    From your description, it's not at all clear what records you really want deleted. Which record(s) are duplicates? How do we tell?

    Your code works perfectly for the data given but... if there's ever a row where there's only a status of 1 that needs to be kept, your code would delete it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I think this will be a bit more safe... details are in the comments in the code

     WITH cteEnumerate AS
    (--==== Number each set of rows in descending order according to status.
    SELECT Dupe# = ROW_NUMBER()OVER (PARTITION BY CD,DEPT,ID,LN_NO,VERSION ORDER BY [STATUS] DESC)
    ,[STATUS]
    FROM TESTPO
    )--==== Only delete rows from each group that have more than one row and have a status of "1"
    DELETE FROM cteEnumerate
    WHERE Dupe# > 1
    AND Status = 1
    ;

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, thank you.

    I ran the query you provided and it does delete the intended rows.

    I will run it against a test table with a lot more data to select the duplicate rows, validate them and then delete.

    Thank you!!!!

  • Sounds like a plan... especially the part about validation before deletion.   Thanks for the feedback.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • DELETE tp
    FROM TESTPO tp
    WHERE STATUS = 1
    AND EXISTS(SELECT *
    FROM TESTPO tp2
    WHERE tp2.CD = tp.CD
    AND tp2.DEPT = tp.DEPT
    AND tp2.ID = tp.ID
    AND tp2.LN_NO = tp.LN_NO
    AND tp2.STATUS = 3)

    If you want to just keep the row with the highest status then

    DELETE tp
    FROM TESTPO tp
    WHERE EXISTS(SELECT *
    FROM TESTPO tp2
    WHERE tp2.CD = tp.CD
    AND tp2.DEPT = tp.DEPT
    AND tp2.ID = tp.ID
    AND tp2.LN_NO = tp.LN_NO
    AND tp2.STATUS > tp.STATUS)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply