SQLServerCentral is supported by Redgate
 
Identify Duplicate Records according to Multiple Criteria


Sy-1148362
Hi There!
I have an Employee table with Columns:

EmpID | FirstName | MiddleName | LastName | SSN | BirthDate | HireDate | City | Zip


Requirement: I have to identify the duplicate records based on the criteria below:
Criteria1: SSN
Criteria2: FirstName, LastName, BirthDate
Criteria3: FirstName, LastName, City, Zip

Please tell me how to accomplish this task. Thanks in advance.
PravB4u
Hi,
do you want to handle all three conditions in a single query, or do you need three separate queries?



Praveen D'sa
MCITP - Database Administrator 2008
http://sqlerrors.wordpress.com
Jeff Moden
Sy-1148362 (11/30/2013)
Hi There!
I have an Employee table with Columns:

EmpID | FirstName | MiddleName | LastName | SSN | BirthDate | HireDate | City | Zip


Requirement: I have to identify the duplicate records based on the criteria below:
Criteria1: SSN
Criteria2: FirstName, LastName, BirthDate
Criteria3: FirstName, LastName, City, Zip

Please tell me how to accomplish this task. Thanks in advance.


You'll get much more help if you show what you've tried.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
CKX
Sy-1148362 (11/30/2013)
Hi There!
I have an Employee table with Columns:

EmpID | FirstName | MiddleName | LastName | SSN | BirthDate | HireDate | City | Zip


Requirement: I have to identify the duplicate records based on the criteria below:
Criteria1: SSN
Criteria2: FirstName, LastName, BirthDate
Criteria3: FirstName, LastName, City, Zip

Please tell me how to accomplish this task. Thanks in advance.


A search on this site would show you quite a few ways to identify duplicates.
ROW_NUMBER and GROUP BY are a couple that come to mind.
With several criteria listed, though, it's not quite clear how you want them combined.
Posting what you've tried, as has been suggested, would give a better idea of what you are actually trying to do.
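For readers who have not used either technique, here is a minimal sketch of both. It assumes the table is called dbo.Employee (a hypothetical name; the question only says "Employee table") and uses the columns listed in the question. GROUP BY reports which key values repeat, while ROW_NUMBER returns the repeated rows themselves.

```sql
-- Assumption: the table is named dbo.Employee; columns are those from the question.

-- GROUP BY: list each SSN that appears on more than one row (Criteria1).
SELECT SSN, COUNT(*) AS DupCount
FROM dbo.Employee
GROUP BY SSN
HAVING COUNT(*) > 1;

-- ROW_NUMBER: number the rows inside each FirstName/LastName/BirthDate group (Criteria2).
-- Rows numbered 2 or higher repeat an earlier row in the same group.
SELECT EmpID, FirstName, LastName, BirthDate
FROM (
    SELECT EmpID, FirstName, LastName, BirthDate,
           ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, BirthDate
                              ORDER BY EmpID) AS rn
    FROM dbo.Employee
) AS numbered
WHERE rn > 1;
```

Note that the ROW_NUMBER form leaves out the first row of each group, which is usually what you want when the next step is deleting the extras.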
ChrisM@Work
Here's the simplest way. Without further information it's unlikely to be the best.
-- I have an Employee table with Columns:
-- Drop the temp table only if it already exists, so the script also runs the first time.
IF OBJECT_ID('tempdb..#Employee') IS NOT NULL DROP TABLE #Employee;

CREATE TABLE #Employee (
    EmpID INT IDENTITY(1,1),
    FirstName VARCHAR(100), MiddleName VARCHAR(100), LastName VARCHAR(100),
    SSN VARCHAR(25), BirthDate DATE, HireDate DATE, City VARCHAR(100), Zip VARCHAR(10)
);

-- Sample rows: the two 'Landlord' rows share an SSN; the Timothy Taylor rows share
-- name plus birth date, or name plus city and zip; John Smith matches nothing.
INSERT INTO #Employee (FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip)
SELECT 'John', NULL, 'Smith', 'Smooth bitter', GETDATE(), GETDATE(), 'London', 'W1' UNION ALL
SELECT 'Timothy', NULL, 'Taylor', 'Landlord', GETDATE(), GETDATE(), 'London', 'W1' UNION ALL
SELECT 'Timothy', NULL, 'Taylor', 'Y', GETDATE()+1, GETDATE()+1, 'London', 'W1' UNION ALL
SELECT 'Timothy', NULL, 'Taylor', 'X', GETDATE(), GETDATE(), 'P', 'Q' UNION ALL
SELECT 'X', NULL, 'Y', 'Landlord', GETDATE(), GETDATE(), 'Z', 'A';

-- Requirement: I have to identify the duplicate records based on below criteria:
SELECT
    EmpID, FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip,
    Criteria1, Criteria2, Criteria3
FROM (
    SELECT
        EmpID, FirstName, MiddleName, LastName, SSN, BirthDate, HireDate, City, Zip,
        -- Each count is the number of rows (including this one) sharing the partition key.
        Criteria1 = COUNT(*) OVER (PARTITION BY SSN),
        Criteria2 = COUNT(*) OVER (PARTITION BY FirstName, LastName, BirthDate),
        Criteria3 = COUNT(*) OVER (PARTITION BY FirstName, LastName, City, Zip)
    FROM #Employee
) d
-- Every count is at least 1, so a sum above 3 means at least one criterion matched another row.
WHERE Criteria1 + Criteria2 + Criteria3 > 3;
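
One thing to watch with the query above: PARTITION BY puts all NULLs in the same group, so rows with a missing SSN would be counted as duplicates of each other. The variant below is only a sketch under the assumption that a NULL SSN should not count as a match (the question does not say either way). It runs against the same #Employee temp table and uses an explicit flag per criterion instead of the summed counts; the flag and count column names are made up for illustration.

```sql
-- Assumption: rows with a NULL SSN should NOT be treated as duplicates of each other.
-- DupBySSN, DupByNameBirth, DupByNameAddr and the *Count columns are illustrative names.
SELECT
    EmpID, FirstName, LastName, SSN, BirthDate, City, Zip,
    DupBySSN       = CASE WHEN SSN IS NOT NULL AND SsnCount > 1 THEN 1 ELSE 0 END,
    DupByNameBirth = CASE WHEN NameBirthCount > 1 THEN 1 ELSE 0 END,
    DupByNameAddr  = CASE WHEN NameAddrCount > 1 THEN 1 ELSE 0 END
FROM (
    SELECT
        EmpID, FirstName, LastName, SSN, BirthDate, City, Zip,
        SsnCount       = COUNT(*) OVER (PARTITION BY SSN),
        NameBirthCount = COUNT(*) OVER (PARTITION BY FirstName, LastName, BirthDate),
        NameAddrCount  = COUNT(*) OVER (PARTITION BY FirstName, LastName, City, Zip)
    FROM #Employee
) d
-- Keep a row if it matches another row on at least one criterion.
WHERE (SSN IS NOT NULL AND SsnCount > 1)
   OR NameBirthCount > 1
   OR NameAddrCount > 1
ORDER BY EmpID;
```

With the sample rows from the script above, this keeps every row except John Smith, the same as the summed-count filter, because none of the sample SSNs are NULL. The per-criterion flags just make it easier to see why each row was picked.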




“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Jeff Moden
ChrisM@Work (12/2/2013)
Here's the simplest way. Without further information it's unlikely to be the best.

I could certainly be wrong, but this seemed to be a school assignment, which is why I wanted the OP to show what had been tried. ;-)

--Jeff Moden
