deleting duplicate rows based on a condition

  • I've cobbled together other answers from similar posts but can't quite get over the finish line.
    I'm trying to remove duplicate records. With the definition of "duplicate" being all fields except for one particular field.
    The duplicate rows to delete will have that field set to NULL.
    Within a set of duplicate records there will always be one (and only one) record where that field is not NULL.
    Other records may have that field set to NULL, but they have no respective "duplicates". So they need to remain.
    The field to check is named 'checkMe'.
    USE [Test]
    GO
    IF OBJECT_ID('[dbo].[PinheadTest]', 'U') IS NOT NULL
    DROP TABLE [dbo].[PinheadTest]
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[PinheadTest](
        [ID] [int] IDENTITY(1,1) NOT NULL,
        [name] [nchar](10) NULL,
         [nchar](10) NULL,
        [checkMe] [nchar](10) NULL
    ) ON [PRIMARY]

    GO
    SET IDENTITY_INSERT [dbo].[PinheadTest] ON

    INSERT [dbo].[PinheadTest] ([ID], [name], , [checkMe]) VALUES (3, N'John  ', N'asd@asd.cm', N'julie  ')
    INSERT [dbo].[PinheadTest] ([ID], [name], , [checkMe]) VALUES (8, N'Sam   ', N'asd@asd.cm', N'sally  ')
    INSERT [dbo].[PinheadTest] ([ID], [name], , [checkMe]) VALUES (5, N'Tom   ', N'asd@asd.cm', NULL)
    INSERT [dbo].[PinheadTest] ([ID], [name], , [checkMe]) VALUES (6, N'Bob   ', N'bob@asd.cm', N'betsy  ')
    INSERT [dbo].[PinheadTest] ([ID], [name], , [checkMe]) VALUES (5, N'Tom   ', N'asd@asd.cm', N'teresa ')
    INSERT [dbo].[PinheadTest] ([ID], [name], , [checkMe]) VALUES (8, N'Sam   ', N'asd@asd.cm', NULL)
    INSERT [dbo].[PinheadTest] ([ID], [name], , [checkMe]) VALUES (8, N'Sam   ', N'asd@asd.cm', NULL)
    INSERT [dbo].[PinheadTest] ([ID], [name], , [checkMe]) VALUES (9, N'Fred  ', N'asd@asd.cm', NULL)
    SET IDENTITY_INSERT [dbo].[PinheadTest] OFF

    select *
    from dbo.PinheadTest

    SELECT *
    INTO #Temp
    FROM
    (SELECT
      y.id,y.name,y.email, y.checkMe
      FROM dbo.PinheadTest y
       INNER JOIN (SELECT
            name,email, COUNT(*) AS CountOf
            FROM dbo.PinheadTest
            GROUP BY name,email
            HAVING COUNT(*)>1
           ) dt ON y.name=dt.name and y.email=dt.email
    ) as x

    select *
    from #Temp

    drop Table #Temp

    In the picture, the rows to delete from the table are marked with the poorly drawn lines from the temp table output at the bottom, to the original table output above it.
    Suggestions?
    Thanks in advance.

  • Try this query, this should work. You can use select clause to check the data to be deleted


    DELETE FROM A
    --SELECT *
    FROM dbo.PinheadTest a
    where exists
    (SELECT
       COUNT(*) AS CountOf
       FROM dbo.PinheadTest b
            where a.name = b.name
            and a.email = b.email
       HAVING COUNT(*)>1
    )
    and a.checkMe is null

  • Awesome. Thank you!
    The first 2 lines
    DELETE FROM A
    FROM dbo.PinheadTest a

    I didn't imagine that could be done like that.
    DELETE FROM someTable FROM someTable WHERE <conditions>
    Good to know!

  • Avi1 - Wednesday, September 13, 2017 4:15 PM

    Try this query, this should work. You can use select clause to check the data to be deleted


    DELETE FROM A
    --SELECT *
    FROM dbo.PinheadTest a
    where exists
    (SELECT
       COUNT(*) AS CountOf
       FROM dbo.PinheadTest b
            where a.name = b.name
            and a.email = b.email
       HAVING COUNT(*)>1
    )
    and a.checkMe is null

    This requires that you read the table twice: once for the delete and once for the count.  The following only requires one read of the table.
    WITH Pinhead_rn AS
    (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY checkMe DESC) AS rn
        FROM Pinhead
    )
    DELETE Pinhead_rn
    WHERE rn > 1

    Since NULLs sort before non-NULLs, sorting in DESCending order ensures that you retain the non-NULL value.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • @drew, I got the error "incorrect syntax near the keyword 'with'.
    WITH Pinhead_rn AS
    (
      SELECT *, ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY checkMe DESC) AS rn
      FROM dbo.PinheadTest
    )
    DELETE Pinhead_rn
    WHERE rn > 1

  • skippyV - Thursday, September 14, 2017 3:36 PM

    @drew, I got the error "incorrect syntax near the keyword 'with'.
    WITH Pinhead_rn AS
    (
      SELECT *, ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY checkMe DESC) AS rn
      FROM dbo.PinheadTest
    )
    DELETE Pinhead_rn
    WHERE rn > 1

    A CTE requires that you terminate the previous statement with a semicolon.  This terminator is obviously missing.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Ah yes. The missing semicolon was the problem. Thanks!

  • ALERT -  Deleting a value from a table should always be TRY CATCHED and ROLLED back on error. Below is the full proof sol in case of any error occurs while deletion.
    Adding an output clause will give you the sense of confirmation of what rows u actually deleted. 

    create table #temp (checkme char(1) )
    begin tran
            ;WITH Pinhead_rn AS
            (
             SELECT *, ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY checkMe DESC) AS rn
             FROM dbo.PinheadTest
            )
            DELETE Pinhead_rn
            output deleted.checkMe into #temp
            WHERE rn > 1

            begin try
                    select * from #temp where checkme is not null
                    if (@@rowcount <>0)
                    raiserror('error',16,1)
    commit
            end try
            begin catch
                    rollback;
            end catch

     

    drew.allen - Friday, September 15, 2017 2:41 PM

    skippyV - Thursday, September 14, 2017 3:36 PM

    @drew, I got the error "incorrect syntax near the keyword 'with'.
    WITH Pinhead_rn AS
    (
      SELECT *, ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY checkMe DESC) AS rn
      FROM dbo.PinheadTest
    )
    DELETE Pinhead_rn
    WHERE rn > 1

    [/quote-0]

    A CTE requires that you terminate the previous statement with a semicolon.  This terminator is obviously missing.

    Drew

    [/quote-1]

    First solve the problem then write the code !

  • TheCTEGuy - Saturday, September 30, 2017 6:20 AM

    ALERT -  Deleting a value from a table should always be TRY CATCHED and ROLLED back on error. Below is the full proof sol in case of any error occurs while deletion.
    Adding an output clause will give you the sense of confirmation of what rows u actually deleted. 

    create table #temp (checkme char(1) )
    begin tran
            ;WITH Pinhead_rn AS
            (
             SELECT *, ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY checkMe DESC) AS rn
             FROM dbo.PinheadTest
            )
            DELETE Pinhead_rn
            output deleted.checkMe into #temp
            WHERE rn > 1

            begin try
                    select * from #temp where checkme is not null
                    if (@@rowcount <>0)
                    raiserror('error',16,1)
    commit
            end try
            begin catch
                    rollback;
            end catch

     

    drew.allen - Friday, September 15, 2017 2:41 PM

    skippyV - Thursday, September 14, 2017 3:36 PM

    @drew, I got the error "incorrect syntax near the keyword 'with'.
    WITH Pinhead_rn AS
    (
      SELECT *, ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY checkMe DESC) AS rn
      FROM dbo.PinheadTest
    )
    DELETE Pinhead_rn
    WHERE rn > 1

    [/quote-0]

    A CTE requires that you terminate the previous statement with a semicolon.  This terminator is obviously missing.

    Drew

    [/quote-1]

    I disagree.  A DELETE is its own transaction and will rollback of it's own accord if an error occurs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Sunday, October 1, 2017 5:30 PM

    TheCTEGuy - Saturday, September 30, 2017 6:20 AM

    ALERT -  Deleting a value from a table should always be TRY CATCHED and ROLLED back on error. Below is the full proof sol in case of any error occurs while deletion.
    Adding an output clause will give you the sense of confirmation of what rows u actually deleted. 

    create table #temp (checkme char(1) )
    begin tran
            ;WITH Pinhead_rn AS
            (
             SELECT *, ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY checkMe DESC) AS rn
             FROM dbo.PinheadTest
            )
            DELETE Pinhead_rn
            output deleted.checkMe into #temp
            WHERE rn > 1

            begin try
                    select * from #temp where checkme is not null
                    if (@@rowcount <>0)
                    raiserror('error',16,1)
    commit
            end try
            begin catch
                    rollback;
            end catch

     

    drew.allen - Friday, September 15, 2017 2:41 PM

    skippyV - Thursday, September 14, 2017 3:36 PM

    @drew, I got the error "incorrect syntax near the keyword 'with'.
    WITH Pinhead_rn AS
    (
      SELECT *, ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY checkMe DESC) AS rn
      FROM dbo.PinheadTest
    )
    DELETE Pinhead_rn
    WHERE rn > 1

    [/quote-0]

    A CTE requires that you terminate the previous statement with a semicolon.  This terminator is obviously missing.

    Drew

    [/quote-1]

    I disagree.  A DELETE is its own transaction and will rollback of it's own accord if an error occurs.

    Yes you are correct,  but what if I am setting Implicit transactions ON or any other Isolation settings on the session(connection) or i want my sol not to fail in any circumstances and the subsequent code runs. The solution itself should be full proof and standard as per my thoughts. 
    Thanks 🙂

    First solve the problem then write the code !

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply