Identifying records that are duplicates with the exception of one column

  • Hi,

    We are trying to clean up some data in an old database which unfortunately contains various duplicate records.

    In this particular instance, I'm looking to identify any employee record where there is another employee record that is identical with the exception of the Phone field. For these employees, we want to retrieve the employee with the blank Phone value.

    Here is a simplified version of what I'm working with. Yes, even the EmployeePk is duplicated in some records. Its a mess.

    CREATE TABLE

    #Employee

    (

    EmployeePk INT,

    FirstName VARCHAR(100),

    LastName VARCHAR(100),

    BranchFk INT,

    Phone VARCHAR(15)

    )

    INSERT INTO #Employee (EmployeePk, FirstName, LastName, BranchFk, Phone) VALUES (105, 'Bob','Smith',1,'(123) 456-7890')

    INSERT INTO #Employee (EmployeePk, FirstName, LastName, BranchFk, Phone) VALUES (105, 'Bob','Smith',1,'')

    INSERT INTO #Employee (EmployeePk, FirstName, LastName, BranchFk, Phone) VALUES (106, 'Jane','Jones',2,'(434) 434-4343')

    INSERT INTO #Employee (EmployeePk, FirstName, LastName, BranchFk, Phone) VALUES (106, 'Jane','Jones',2,'(434) 434-4343')

    INSERT INTO #Employee (EmployeePk, FirstName, LastName, BranchFk, Phone) VALUES (107, 'John','Brown',NULL,'(222) 222-2222')

    INSERT INTO #Employee (EmployeePk, FirstName, LastName, BranchFk, Phone) VALUES (107, 'John','Brown',NULL,'(222) 222-2222')

    INSERT INTO #Employee (EmployeePk, FirstName, LastName, BranchFk, Phone) VALUES (107, 'John','Brown',NULL,'')

    --All records

    SELECT * FROM #Employee

    --Records I'd like to retrieve because it is a duplicate with a blank Phone value

    SELECT EmployeePk = 105, FirstName = 'Bob',LastName = 'Smith',BranchFk = 1,Phone = ''

    UNION

    SELECT EmployeePk = 107, FirstName = 'John',LastName = 'Brown',BranchFk = NULL,Phone = ''

    DROP TABLE #Employee

    I greatly appreciate any assistance!

  • Not sure I understand the requirement completely, but even if this is not exactly what you need, it should get you started:

    SELECT EmployeePk,

    FirstName,

    LastName,

    BranchFk,

    CASE WHEN MIN(Phone) = MAX(Phone) THEN MIN(Phone) ELSE '' END AS Phone

    FROM #Employee

    GROUP BY EmployeePk,

    FirstName,

    LastName,

    BranchFk;

    Good luck sorting out that data mess... Don't forget to add constraints to the table as soon as th rubbish is gone, otherwise it'll all start over again.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • If I understand the requirements correctly you want all the instances with blank Phone where the same FirstName,LastName,BranchFk exists with a non-blank phone.

    😎

    Quick suggestion

    USE tempdb;

    SET NOCOUNT ON;

    GO

    CREATE TABLE

    #Employee

    (

    EmployeePk INT,

    FirstName VARCHAR(100),

    LastName VARCHAR(100),

    BranchFk INT,

    Phone VARCHAR(15)

    )

    INSERT INTO #Employee (EmployeePk, FirstName, LastName, BranchFk, Phone)

    VALUES (105, 'Bob','Smith',1,'(123) 456-7890')

    ,(105, 'Bob','Smith',1,'')

    ,(106, 'Jane','Jones',2,'(434) 434-4343')

    ,(106, 'Jane','Jones',2,'(434) 434-4343')

    ,(107, 'John','Brown',NULL,'(222) 222-2222')

    ,(107, 'John','Brown',NULL,'(222) 222-2222')

    ,(107, 'John','Brown',NULL,'')

    ;

    ;WITH BASE_DATA AS

    (

    SELECT

    EMP.EmployeePk

    ,COUNT(*) OVER

    (

    PARTITION BY EMP.FirstName

    ,EMP.LastName

    ,EMP.BranchFk

    ) AS INST_COUNT

    ,MAX(LEN(EMP.Phone)) OVER

    (

    PARTITION BY EMP.FirstName

    ,EMP.LastName

    ,EMP.BranchFk

    ) AS MX_LEN_PHONE

    ,EMP.FirstName

    ,EMP.LastName

    ,EMP.BranchFk

    ,EMP.Phone

    FROM #Employee EMP

    )

    SELECT

    BD.EmployeePk

    ,BD.FirstName

    ,BD.LastName

    ,BD.BranchFk

    ,BD.Phone

    FROM BASE_DATA BD

    WHERE BD.Phone = ''

    AND BD.INST_COUNT > 1

    AND BD.MX_LEN_PHONE > 0;

    DROP TABLE #Employee;

    Output

    EmployeePk FirstName LastName BranchFk Phone

    ----------- ----------- ---------- ----------- ---------------

    105 Bob Smith 1

    107 John Brown NULL

  • Thank you for the responses! I'll give these a try.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply