I need to delete a duplicate line

Question

Post reply

I need to delete a duplicate line

josetur12

Default port

Points: 1424
More actions
May 27, 2020 at 1:24 am

#3757588

Hi everybody, I need help to find and delete a duplicate line in a table. The row I am trying to delete is actually a duplicate because there are some columns that makes them different.
Please note in the table below. Rows with line_No of 1, 2, 3, and 5. They have the same code, dept_cd, ID, and version.
I need to delete the rows where status = 1 I have many rows with this scenario. The rows that had a status of 1 should have been deleted when the new row with status = 3 was aggregated.
CD DEPT ID LN_NO VERSION STATUS
PO HH 16362135 1 1 3
PO HH 16362135 1 1 1
PO HH 16362135 2 1 3
PO HH 16362135 2 1 1
PO HH 16362135 3 1 1
PO HH 16362135 3 1 3
PO HH 16362135 5 1 1
PO HH 16362135 5 1 3
PO HH 16362135 6 1 3
here is the sql to create the table and insert test rows.
CREATE TABLE [dbo].[TESTPO](
[CD] [varchar](8) NOT NULL,
[DEPT] [varchar](4) NOT NULL,
[ID] [varchar](20) NOT NULL,
[LN_NO] [numeric](10, 0) NOT NULL,
[VERSION] [numeric](10, 0) NOT NULL,
[STATUS] [numeric](10, 0) NOT NULL)
insert into TESTPO
([CD], [DEPT], [ID], [LN_NO], [VERSION], [STATUS])
SELECT 'PO', 'HH', '16362135', 1, 1, 3 UNION ALL
SELECT 'PO', 'HH', '16362135', 1, 1, 1 UNION ALL
SELECT 'PO', 'HH', '16362135', 2, 1, 3 UNION ALL
SELECT 'PO', 'HH', '16362135', 2, 1, 1 UNION ALL
SELECT 'PO', 'HH', '16362135', 3, 1, 1 UNION ALL
SELECT 'PO', 'HH', '16362135', 3, 1, 3 UNION ALL
SELECT 'PO', 'HH', '16362135', 5, 1, 1 UNION ALL
SELECT 'PO', 'HH', '16362135', 5, 1, 3 UNION ALL
SELECT 'PO', 'HH', '16362135', 6, 1, 3
I have tried queries to find duplicate rows but have not worked
Please help.
Thanks

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

pietlinden SSC Guru Points: 63428 More actions · Answer 1

I need to delete the rows where status = 1

DELETE

FROM TestPO

WHERE Status=1;

?

From your description, it's not at all clear what records you really want deleted. Which record(s) are duplicates? How do we tell?

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 2

pietlinden wrote:

I need to delete the rows where status = 1
DELETE
FROM TestPO
WHERE Status=1;
?
From your description, it's not at all clear what records you really want deleted. Which record(s) are duplicates? How do we tell?

Your code works perfectly for the data given but... if there's ever a row where there's only a status of 1 that needs to be kept, your code would delete it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 3

I think this will be a bit more safe... details are in the comments in the code

 WITH cteEnumerate AS
(--==== Number each set of rows in descending order according to status.
 SELECT  Dupe# = ROW_NUMBER()OVER (PARTITION BY CD,DEPT,ID,LN_NO,VERSION ORDER BY [STATUS] DESC)
        ,[STATUS]
   FROM TESTPO
)--==== Only delete rows from each group that have more than one row and have a status of "1"
 DELETE FROM cteEnumerate 
  WHERE Dupe#  > 1
    AND Status = 1
;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

josetur12 Default port Points: 1424 More actions · Answer 4

Jeff, thank you.

I ran the query you provided and it does delete the intended rows.

I will run it against a test table with a lot more data to select the duplicate rows, validate them and then delete.

Thank you!!!!

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 5

Sounds like a plan... especially the part about validation before deletion. Thanks for the feedback.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jonathan AC Roberts SSCoach Points: 18263 More actions · Answer 6

DELETE tp
  FROM TESTPO tp
 WHERE STATUS = 1
   AND EXISTS(SELECT * 
                FROM TESTPO tp2
               WHERE tp2.CD = tp.CD
                 AND tp2.DEPT = tp.DEPT
                 AND tp2.ID = tp.ID
                 AND tp2.LN_NO = tp.LN_NO
                 AND tp2.STATUS = 3)

If you want to just keep the row with the highest status then

DELETE tp
  FROM TESTPO tp
 WHERE EXISTS(SELECT * 
                FROM TESTPO tp2
               WHERE tp2.CD = tp.CD
                 AND tp2.DEPT = tp.DEPT
                 AND tp2.ID = tp.ID
                 AND tp2.LN_NO = tp.LN_NO
                 AND tp2.STATUS > tp.STATUS)