Home Forums SQL Server 2008 T-SQL (SS2K8) Identify Duplicate Records according to Multiple Criteria RE: Identify Duplicate Records according to Multiple Criteria

  • Here's the simplest way. Without further information it's unlikely to be the best.

    -- I have an Employee table with Columns:

    DROP TABLE #Employee

    CREATE TABLE #Employee (EmpID INT IDENTITY(1,1), FirstName VARCHAR(100), MiddleName VARCHAR(100), LastName VARCHAR(100),

    SSN VARCHAR(25), BirthDate DATE, HireDate DATE, City VARCHAR(100), Zip VARCHAR(10))

    INSERT INTO #Employee (FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip)

    SELECT 'John', NULL, 'Smith', 'Smooth bitter', GETDATE(), GETDATE(), 'London', 'W1' UNION ALL

    SELECT 'Timothy', NULL, 'Taylor', 'Landlord', GETDATE(), GETDATE(), 'London', 'W1' UNION ALL

    SELECT 'Timothy', NULL, 'Taylor', 'Y', GETDATE()+1, GETDATE()+1, 'London', 'W1' UNION ALL

    SELECT 'Timothy', NULL, 'Taylor', 'X', GETDATE(), GETDATE(), 'P', 'Q' UNION ALL

    SELECT 'X', NULL, 'Y', 'Landlord', GETDATE(), GETDATE(), 'Z', 'A'

    -- Requirement: I have to identify the duplicate records based on below criteria:

    SELECT

    EmpID, FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip,

    Criteria1, Criteria2, Criteria3

    FROM (

    SELECT

    EmpID, FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip,

    Criteria1 = COUNT(*) OVER(PARTITION BY SSN),

    Criteria2 = COUNT(*) OVER(PARTITION BY FirstName, LastName, BirthDate),

    Criteria3 = COUNT(*) OVER(PARTITION BY FirstName, LastName, City, Zip)

    FROM #Employee

    ) d

    WHERE Criteria1+Criteria2+Criteria3 > 3

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden