Remove a Simple Duplicate

  • Hello Everyone

    I found some duplicate data as I was going thru the logic of a data pump. The entire row is not duplicated however.

    I would like to delete only the one row.

    This is a sample of the data:

    DECLARE @SomeData TABLE

    (

    FirstName varchar(25)

    , MiddleName varchar(25)

    , LastName varchar(25)

    , StreetAddress varchar(25)

    , Suite varchar(25)

    , City varchar(25)

    , [State] varchar(25)

    , PostalCode varchar(10)

    , AreaCode varchar(5)

    , PhoneNumber varchar(8)

    )

    INSERT INTO @SomeData

    SELECT 'Joe','Al','Smith','123 Main Street',NULL,'Dallas', 'North Carolina', '45789', NULL, NULL UNION ALL

    SELECT 'Joe','Al','Smith','123 Main Street',NULL,'Dallas', 'North Carolina', '45789', '256', '124-5879' UNION ALL

    SELECT 'Sally','Ann','Smith','123 Main Street',NULL,'Dallas', 'North Carolina', '45789', '256', '124-5879' UNION ALL

    SELECT 'Steven','David','Smith','123 Main Street',NULL,'Dallas', 'North Carolina', '45789', '256', '124-5879'

    SELECT * FROM

    @SomeData

    As you can see, Joe Smith has two rows, but only one of the rows is complete. I would like to delete only the row that has a NULL value in the phone and area code for Joe Smith. There are a few thousand rows that are like this. They have duplicates all but the area code and phone number.

    I am used to using a CTE to remove duplicates, but I am a little lost on this one. The things that I have tried, have not worked exactly as I planned.

    I hope that someone has a better way.

    Thank you in advance for your time, assistance, and suggestions

    Andrew SQLDBA

  • I'd stage into a temporary table with an identity column and delete from the temp table where

    1. The phone number or area code are null

    2. And the name and address columns exist elsewhere with a different identity than the current row.

    Did I capture the conditions correctly?

    Thanks

    John.

  • Thank You John

    That is what I tried using a CTE, but I am not getting something correct. I am having trouble with select the correct row to delete.

    Would you offer a sample of code?

    Thanks

    Andrew

  • If you only see nulls in areacode and phonenumber you can get rid of them with temp table like john mentioned

    select into #t1

    from @somedata where AreaCode is not null and PhoneNumber is not null

    Alex S
  • something like this... may be you need to modify it accordingly..

    DECLARE @SomeData TABLE

    (

    FirstName varchar(25)

    , MiddleName varchar(25)

    , LastName varchar(25)

    , StreetAddress varchar(25)

    , Suite varchar(25)

    , City varchar(25)

    , [State] varchar(25)

    , PostalCode varchar(10)

    , AreaCode varchar(5)

    , PhoneNumber varchar(8)

    )

    INSERT INTO @SomeData

    SELECT 'Joe','Al','Smith','123 Main Street',NULL,'Dallas', 'North Carolina', '45789', NULL, NULL UNION ALL

    SELECT 'Joe','Al','Smith','123 Main Street',NULL,'Dallas', 'North Carolina', '45789', '256', '124-5879' UNION ALL

    SELECT 'Sally','Ann','Smith','123 Main Street',NULL,'Dallas', 'North Carolina', '45789', '256', '124-5879' UNION ALL

    SELECT 'Steven','David','Smith','123 Main Street',NULL,'Dallas', 'North Carolina', '45789', '256', '124-5879'

    ;WITH CTE AS

    (

    SELECT *, ROW_NUMBER() OVER(PARTITION BY FirstName, MiddleName, LastName Order by AreaCode DESC, PhoneNumber DESC) AS Rnum FROM @SomeData

    )

    --SELECT * FROM CTE

    DELETE FROM CTE

    --WHERE Rnum = 1

    WHERE Rnum <> 1

    SELECT * FROM @SomeData

    http://sqlsaga.com/sql-server/how-to-remove-duplicates-from-a-table-in-sql-server/[/url]

    Good Luck 🙂 .. Visit www.sqlsaga.com for more t-sql code snippets and BI related how to articles.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply