Identify Duplicate Records according to Multiple Criteria

  • Hi There!

    I have an Employee table with Columns:

    EmpID | FirstName | MiddleName | LastName | SSN | BirthDate | HireDate | City | Zip

    Requirement: I have to identify the duplicate records based on below criteria:

    Criteria1: SSN

    Criteria2: FirstName, LastName, BirthDate

    Criteria2: FirstName, LastName, City, Zip

    Please tell how to accomplish this task........Thanks in Advance

  • Hi,

    do you want to accomplish all three conditions in single query or you need three different queries.



    Praveen D'sa
    MCITP - Database Administrator 2008
    http://sqlerrors.wordpress.com

  • Sy-1148362 (11/30/2013)


    Hi There!

    I have an Employee table with Columns:

    EmpID | FirstName | MiddleName | LastName | SSN | BirthDate | HireDate | City | Zip

    Requirement: I have to identify the duplicate records based on below criteria:

    Criteria1: SSN

    Criteria2: FirstName, LastName, BirthDate

    Criteria2: FirstName, LastName, City, Zip

    Please tell how to accomplish this task........Thanks in Advance

    You'll get much more help if you show what you've tried.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sy-1148362 (11/30/2013)


    Hi There!

    I have an Employee table with Columns:

    EmpID | FirstName | MiddleName | LastName | SSN | BirthDate | HireDate | City | Zip

    Requirement: I have to identify the duplicate records based on below criteria:

    Criteria1: SSN

    Criteria2: FirstName, LastName, BirthDate

    Criteria2: FirstName, LastName, City, Zip

    Please tell how to accomplish this task........Thanks in Advance

    A search on this site would show you quite a few ways to identify duplicates.

    The use of ROW_NUMBER or GROUP BY being a couple that come to mind.

    With your criteria specified multiple times though you're not quite clear on how you would like to handle this.

    Maybe posting what you've tried as has been suggested would give an idea of what you are actually trying to do.

  • Here's the simplest way. Without further information it's unlikely to be the best.

    -- I have an Employee table with Columns:

    DROP TABLE #Employee

    CREATE TABLE #Employee (EmpID INT IDENTITY(1,1), FirstName VARCHAR(100), MiddleName VARCHAR(100), LastName VARCHAR(100),

    SSN VARCHAR(25), BirthDate DATE, HireDate DATE, City VARCHAR(100), Zip VARCHAR(10))

    INSERT INTO #Employee (FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip)

    SELECT 'John', NULL, 'Smith', 'Smooth bitter', GETDATE(), GETDATE(), 'London', 'W1' UNION ALL

    SELECT 'Timothy', NULL, 'Taylor', 'Landlord', GETDATE(), GETDATE(), 'London', 'W1' UNION ALL

    SELECT 'Timothy', NULL, 'Taylor', 'Y', GETDATE()+1, GETDATE()+1, 'London', 'W1' UNION ALL

    SELECT 'Timothy', NULL, 'Taylor', 'X', GETDATE(), GETDATE(), 'P', 'Q' UNION ALL

    SELECT 'X', NULL, 'Y', 'Landlord', GETDATE(), GETDATE(), 'Z', 'A'

    -- Requirement: I have to identify the duplicate records based on below criteria:

    SELECT

    EmpID, FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip,

    Criteria1, Criteria2, Criteria3

    FROM (

    SELECT

    EmpID, FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip,

    Criteria1 = COUNT(*) OVER(PARTITION BY SSN),

    Criteria2 = COUNT(*) OVER(PARTITION BY FirstName, LastName, BirthDate),

    Criteria3 = COUNT(*) OVER(PARTITION BY FirstName, LastName, City, Zip)

    FROM #Employee

    ) d

    WHERE Criteria1+Criteria2+Criteria3 > 3

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (12/2/2013)


    Here's the simplest way. Without further information it's unlikely to be the best.

    -- I have an Employee table with Columns:

    DROP TABLE #Employee

    CREATE TABLE #Employee (EmpID INT IDENTITY(1,1), FirstName VARCHAR(100), MiddleName VARCHAR(100), LastName VARCHAR(100),

    SSN VARCHAR(25), BirthDate DATE, HireDate DATE, City VARCHAR(100), Zip VARCHAR(10))

    INSERT INTO #Employee (FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip)

    SELECT 'John', NULL, 'Smith', 'Smooth bitter', GETDATE(), GETDATE(), 'London', 'W1' UNION ALL

    SELECT 'Timothy', NULL, 'Taylor', 'Landlord', GETDATE(), GETDATE(), 'London', 'W1' UNION ALL

    SELECT 'Timothy', NULL, 'Taylor', 'Y', GETDATE()+1, GETDATE()+1, 'London', 'W1' UNION ALL

    SELECT 'Timothy', NULL, 'Taylor', 'X', GETDATE(), GETDATE(), 'P', 'Q' UNION ALL

    SELECT 'X', NULL, 'Y', 'Landlord', GETDATE(), GETDATE(), 'Z', 'A'

    -- Requirement: I have to identify the duplicate records based on below criteria:

    SELECT

    EmpID, FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip,

    Criteria1, Criteria2, Criteria3

    FROM (

    SELECT

    EmpID, FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip,

    Criteria1 = COUNT(*) OVER(PARTITION BY SSN),

    Criteria2 = COUNT(*) OVER(PARTITION BY FirstName, LastName, BirthDate),

    Criteria3 = COUNT(*) OVER(PARTITION BY FirstName, LastName, City, Zip)

    FROM #Employee

    ) d

    WHERE Criteria1+Criteria2+Criteria3 > 3

    I could certainly be wrong but this seemed to be a school assignment which is why I wanted the OP to show what has been tried. ๐Ÿ˜‰

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply