SQLServerCentral is supported by Redgate
 
Identify Duplicate Records according to Multiple Criteria


Sy-1148362 (Grasshopper, 21 reputation)
Hi there!
I have an Employee table with these columns:

EmpID | FirstName | MiddleName | LastName | SSN | BirthDate | HireDate | City | Zip

Requirement: I have to identify duplicate records based on the following criteria:
Criteria1: SSN
Criteria2: FirstName, LastName, BirthDate
Criteria3: FirstName, LastName, City, Zip

Please tell me how to accomplish this task. Thanks in advance!
PravB4u (SSC-Enthusiastic, 147 reputation)
Hi,
Do you want to check all three conditions in a single query, or do you need three separate queries?



Praveen D'sa
MCITP - Database Administrator 2008
http://sqlerrors.wordpress.com
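PravB4u's question (one query or three) can be illustrated with a minimal sketch based on GROUP BY ... HAVING. It assumes made-up sample rows supplied through a CTE named Employee in place of the real table; the CTE names DupSSN, DupNameBirth and DupNamePlace are hypothetical. Each of those three CTEs is one of the "three different queries", and the final SELECT joins their keys back to the rows to give a single result with one flag per criterion. Dates are kept as ISO strings only to keep the sketch portable; it should run on SQL Server 2005 or later (and on SQLite 3.25 or later).

```sql
-- Sketch only: made-up rows standing in for the real Employee table.
WITH Employee (EmpID, FirstName, LastName, SSN, BirthDate, City, Zip) AS (
    SELECT 1, 'John',    'Smith',  '111-11-1111', '1980-01-01', 'London', 'W1'    UNION ALL
    SELECT 2, 'Timothy', 'Taylor', '222-22-2222', '1975-06-15', 'London', 'W1'    UNION ALL
    SELECT 3, 'Timothy', 'Taylor', '333-33-3333', '1975-06-15', 'Leeds',  'LS1'   UNION ALL
    SELECT 4, 'Timothy', 'Taylor', '444-44-4444', '1969-03-02', 'London', 'W1'    UNION ALL
    SELECT 5, 'Jane',    'Doe',    '111-11-1111', '1990-12-31', 'Paris',  '75001' UNION ALL
    SELECT 6, 'Mary',    'Jones',  '555-55-5555', '1985-07-07', 'Leeds',  'LS2'
),
-- Query 1: SSNs that occur more than once
DupSSN AS (
    SELECT SSN
    FROM Employee
    GROUP BY SSN
    HAVING COUNT(*) > 1
),
-- Query 2: FirstName + LastName + BirthDate combinations that occur more than once
DupNameBirth AS (
    SELECT FirstName, LastName, BirthDate
    FROM Employee
    GROUP BY FirstName, LastName, BirthDate
    HAVING COUNT(*) > 1
),
-- Query 3: FirstName + LastName + City + Zip combinations that occur more than once
DupNamePlace AS (
    SELECT FirstName, LastName, City, Zip
    FROM Employee
    GROUP BY FirstName, LastName, City, Zip
    HAVING COUNT(*) > 1
)
-- Single result: join each set of duplicate keys back to the rows.
-- Grouped keys are unique, so the LEFT JOINs cannot multiply rows.
SELECT e.EmpID, e.FirstName, e.LastName, e.SSN, e.BirthDate, e.City, e.Zip,
       CASE WHEN s.SSN        IS NOT NULL THEN 1 ELSE 0 END AS DupBySSN,
       CASE WHEN nb.FirstName IS NOT NULL THEN 1 ELSE 0 END AS DupByNameBirth,
       CASE WHEN np.FirstName IS NOT NULL THEN 1 ELSE 0 END AS DupByNamePlace
FROM Employee AS e
LEFT JOIN DupSSN AS s
       ON s.SSN = e.SSN
LEFT JOIN DupNameBirth AS nb
       ON nb.FirstName = e.FirstName AND nb.LastName = e.LastName
      AND nb.BirthDate = e.BirthDate
LEFT JOIN DupNamePlace AS np
       ON np.FirstName = e.FirstName AND np.LastName = e.LastName
      AND np.City = e.City AND np.Zip = e.Zip
WHERE s.SSN IS NOT NULL OR nb.FirstName IS NOT NULL OR np.FirstName IS NOT NULL
ORDER BY e.EmpID;
```

With these rows, EmpIDs 1 to 5 come back and EmpID 6 does not. One difference from a COUNT(*) OVER (PARTITION BY ...) approach: equality joins never match NULL keys, so rows with a NULL SSN are not flagged as duplicates of each other here.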
Jeff Moden (SSC-Forever, 45K reputation)
Sy-1148362 (11/30/2013)
You'll get much more help if you show what you've tried.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we don't actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell doesn't mean you should. ;-)

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
CKX (SSC Journeyman, 99 reputation)
Sy-1148362 (11/30/2013)
A search on this site would show you quite a few ways to identify duplicates; ROW_NUMBER and GROUP BY are a couple that come to mind.
Since you have listed several sets of criteria, though, it isn't clear how you would like to handle them.
Posting what you've tried, as has been suggested, would give an idea of what you are actually trying to do.
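A minimal sketch of the ROW_NUMBER idea, assuming made-up rows supplied through a CTE named Employee in place of the real table, and treating the lowest EmpID in each group as the "original" (the RnSSN, RnNameBirth and RnNamePlace column names are illustrative). ROW_NUMBER restarts at 1 for every distinct key, so any number above 1 marks a later copy under that criterion. It should run on SQL Server 2005 or later (and on SQLite 3.25 or later).

```sql
-- Sketch only: made-up rows standing in for the real Employee table.
WITH Employee (EmpID, FirstName, LastName, SSN, BirthDate, City, Zip) AS (
    SELECT 1, 'John',    'Smith',  '111-11-1111', '1980-01-01', 'London', 'W1'    UNION ALL
    SELECT 2, 'Timothy', 'Taylor', '222-22-2222', '1975-06-15', 'London', 'W1'    UNION ALL
    SELECT 3, 'Timothy', 'Taylor', '333-33-3333', '1975-06-15', 'Leeds',  'LS1'   UNION ALL
    SELECT 4, 'Timothy', 'Taylor', '444-44-4444', '1969-03-02', 'London', 'W1'    UNION ALL
    SELECT 5, 'Jane',    'Doe',    '111-11-1111', '1990-12-31', 'Paris',  '75001' UNION ALL
    SELECT 6, 'Mary',    'Jones',  '555-55-5555', '1985-07-07', 'Leeds',  'LS2'
)
SELECT EmpID, FirstName, LastName, SSN, BirthDate, City, Zip,
       RnSSN, RnNameBirth, RnNamePlace
FROM (
    SELECT EmpID, FirstName, LastName, SSN, BirthDate, City, Zip,
           -- Numbering restarts at 1 for each distinct key; the lowest EmpID gets 1
           ROW_NUMBER() OVER (PARTITION BY SSN ORDER BY EmpID)                            AS RnSSN,
           ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, BirthDate ORDER BY EmpID) AS RnNameBirth,
           ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, City, Zip ORDER BY EmpID) AS RnNamePlace
    FROM Employee
) AS d
-- Window functions cannot appear in WHERE directly, hence the derived table.
-- Keep only rows that are a later copy under at least one criterion.
WHERE RnSSN > 1 OR RnNameBirth > 1 OR RnNamePlace > 1
ORDER BY EmpID;
```

Here EmpIDs 3, 4 and 5 come back; the first row of each group (EmpIDs 1 and 2) and the non-duplicate EmpID 6 are left out. Unlike a count, this lists only the extra copies, which suits a clean-up, but a row can be the first under one criterion and a later copy under another, so which row counts as the original still has to be decided.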
ChrisM@Work (SSCrazy Eights, 9K reputation)
Here's the simplest way. Without further information it's unlikely to be the best.
-- I have an Employee table with Columns:
-- Drop the temp table only if it already exists, so the script can be rerun
IF OBJECT_ID('tempdb..#Employee') IS NOT NULL DROP TABLE #Employee;

CREATE TABLE #Employee (
    EmpID INT IDENTITY(1,1), FirstName VARCHAR(100), MiddleName VARCHAR(100), LastName VARCHAR(100),
    SSN VARCHAR(25), BirthDate DATE, HireDate DATE, City VARCHAR(100), Zip VARCHAR(10)
);

-- Sample rows (GETDATE() values are implicitly converted to DATE)
INSERT INTO #Employee (FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip)
SELECT 'John',    NULL, 'Smith',  'Smooth bitter', GETDATE(),   GETDATE(),   'London', 'W1' UNION ALL
SELECT 'Timothy', NULL, 'Taylor', 'Landlord',      GETDATE(),   GETDATE(),   'London', 'W1' UNION ALL
SELECT 'Timothy', NULL, 'Taylor', 'Y',             GETDATE()+1, GETDATE()+1, 'London', 'W1' UNION ALL
SELECT 'Timothy', NULL, 'Taylor', 'X',             GETDATE(),   GETDATE(),   'P',      'Q'  UNION ALL
SELECT 'X',       NULL, 'Y',      'Landlord',      GETDATE(),   GETDATE(),   'Z',      'A';

-- Requirement: I have to identify the duplicate records based on below criteria:
SELECT
    EmpID, FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip,
    Criteria1, Criteria2, Criteria3
FROM (
    SELECT
        EmpID, FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip,
        -- Number of rows sharing this row's value(s) for each criterion.
        -- PARTITION BY puts all NULLs in one partition, so NULL keys are counted together.
        Criteria1 = COUNT(*) OVER(PARTITION BY SSN),
        Criteria2 = COUNT(*) OVER(PARTITION BY FirstName, LastName, BirthDate),
        Criteria3 = COUNT(*) OVER(PARTITION BY FirstName, LastName, City, Zip)
    FROM #Employee
) d
-- Each count is at least 1 (the row itself), so a total above 3 means
-- at least one criterion matched another row
WHERE Criteria1 + Criteria2 + Criteria3 > 3;




“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY (I) and (II), by Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop, by Jeff Moden
Exploring Recursive CTEs by Example, by Dwain Camps
Jeff Moden (SSC-Forever, 45K reputation)
ChrisM@Work (12/2/2013)
I could certainly be wrong, but this seemed to be a school assignment, which is why I wanted the OP to show what has been tried. ;-)

--Jeff Moden
