Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


filter duplicate students via T-SQL


filter duplicate students via T-SQL

Author
Message
kevin_nikolai
kevin_nikolai
SSC-Enthusiastic
SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)

Group: General Forum Members
Points: 191 Visits: 511
/*

I need to filter for duplicate students:
---------------------------------------
For same student, if one or both instance/s of IDNo is null then do the following:

1.
if both instances of Tel3 are identical for same Student (Alison), if they are
use it to populate field Sort with the value, 1111.

2.
if both instances of Tel3 are not identical for same Student (Carmen), check if both instances of Tel1 are identical for same Student, if they are use it to populate field Sort with the value, 8888.

3.
if both instances of Tel3 and Tel1 are not identical for same Student (Amber), check if an instance of Tel3 & Tel1 are identical for same Student, if they are use it to populate field Sort with the value, 7777.

Since there is no match between 1st row and 2nd row (except for the name) we can't say it's the
same user for sure, so don't populate the field Sort with any value where row = ( 'Amber' as Student, NULL as IDNo, '5555' as Tel3, '6666' as Tel1 ).

*/

SELECT 'Alison' as Student, NULL as IDNo, '1111' as Tel3, '4444' as Tel1, NULL as Sort UNION ALL
SELECT 'Alison' as Student, '80405088' as IDNo, '1111' as Tel3, '4444' as Tel1, NULL as Sort UNION ALL

SELECT 'Amber' as Student, NULL as IDNo, '5555' as Tel3, '6666' as Tel1, NULL as Sort UNION ALL
SELECT 'Amber' as Student, NULL as IDNo, '7777' as Tel3, '7777' as Tel1, NULL as Sort UNION ALL

SELECT 'Carmen' as Student, NULL as IDNo, '8888' as Tel3, '8888' as Tel1, NULL as Sort UNION ALL
SELECT 'Carmen' as Student, '20083085' as IDNo, '9999' as Tel3, '8888' as Tel1, NULL as Sort



***
Apologies for duplicate posting, when trying to post initially I got DNS errors from Internet Explorer, so re-posted, again DNS errors, etc. After the 3rd attempt my post was successful, only to notice more entries posted.
***
Sean Lange
Sean Lange
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16624 Visits: 17024
This would be a lot easier with ddl (create table statements) and sample data (insert statements) along with the desired output based on the sample data. Please take a few minutes and read the first article in my signature for best practices when posting questions.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
sdhanpaul
sdhanpaul
SSC Rookie
SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)

Group: General Forum Members
Points: 43 Visits: 566
SELECT     EmpPin, trxdate, trxtime, COUNT(*) AS dupes  
FROM dbo.rawtrx
GROUP BY EmpPin, trxdate, trxtime
HAVING (COUNT(*) > 1)



play around with this query
sdhanpaul
sdhanpaul
SSC Rookie
SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)

Group: General Forum Members
Points: 43 Visits: 566
okay, after reading further your questions and scenarios... i doubt that my last post would work...

how many times do you have the same student recurring? do you ever have a situation like this?


kyle null 1111 1111
kyle null 1111 1112
kyle 111 1111 1111



kevin_nikolai
kevin_nikolai
SSC-Enthusiastic
SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)

Group: General Forum Members
Points: 191 Visits: 511
sdhanpaul,

Prior to removing duplicate instance of user, I first want to get list of duplicate users, to do that I need to filter them and make sure they really are duplicates.
I can filter them by IDNo - unfortunately more than half of the users with same name (firstname + surname) don't have IDNo, so I am forced to check TelephoneNo3 and TelephoneNo1 fields to try and identify if they really are the same user/s. As in my example above, the first row for user Alison is not identical to the second row for user Alison, same applied to the other users. So using COUNT(*) and HAVING (COUNT(*) > 1) clause won't do the job.
sdhanpaul
sdhanpaul
SSC Rookie
SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)

Group: General Forum Members
Points: 43 Visits: 566
how about if you join the fields?

Try this:

WITH CTE (emppin,trxdate, trxtime, DuplicateCount)
AS
(
SELECT emppin,trxdate, trxtime,
ROW_NUMBER() OVER(PARTITION BY emppin,trxdate, trxtime ORDER BY emppin) AS DuplicateCount
FROM dbo.rawtrx
)
DELETE
FROM CTE
WHERE DuplicateCount > 1



you need to replace the fields etc

this however deletes the duplicates... im really not much of a bottle feeder so you need to do the modifications.
KoldCoffee
KoldCoffee
SSC Eights!
SSC Eights! (839 reputation)SSC Eights! (839 reputation)SSC Eights! (839 reputation)SSC Eights! (839 reputation)SSC Eights! (839 reputation)SSC Eights! (839 reputation)SSC Eights! (839 reputation)SSC Eights! (839 reputation)

Group: General Forum Members
Points: 839 Visits: 1905
I would like to follow this, but I don't understand the sentence "if they are use it to populate field Sort with the value". I don't see the Sort column populated at all for any of the examples.

Did the replies here already help you?
Sean Lange
Sean Lange
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16624 Visits: 17024
Notice how people just keep taking random attempts at helping you. If you would take a few minutes and post ddl, sample data and desired output you would have a tested answer quickly. This really isn't too difficult but without something to work with it is just guessing.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Jan Van der Eecken
Jan Van der Eecken
SSCrazy
SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)

Group: General Forum Members
Points: 2214 Visits: 6490
Apart from what Sean said (i.e. the missing DDL, which is basically table create statements plus some INSERT statements of sample data to create your scenario), I assume you also have a Tel2 column in that table? Not a good idea, since it makes your database design incredibly inflexible. Read up on Normalization (start with this Wikipedia article or read through Tom Thomson's really great series on Normalization on this site. By the way, is this an exam question?

--------------------------------------------------------------------------
A little knowledge is a dangerous thing (Alexander Pope)

In order for us to help you as efficiently as possible, please read this before posting (courtesy of Jeff Moden)
Sergiy
Sergiy
SSCertifiable
SSCertifiable (5.8K reputation)SSCertifiable (5.8K reputation)SSCertifiable (5.8K reputation)SSCertifiable (5.8K reputation)SSCertifiable (5.8K reputation)SSCertifiable (5.8K reputation)SSCertifiable (5.8K reputation)SSCertifiable (5.8K reputation)

Group: General Forum Members
Points: 5846 Visits: 11406
First, you need to have some kind of ID on the records in the table, otherwise each record will be always matched to itself.

Assuming, you have such ID here is a test table (which should be really provided by you in the initial post):

--   DROP TABLE #Student
SELECT IDENTITY(int, 1,1) RowID, *
INTO #Student
FROM (
SELECT 'Alison' as Student, NULL as IDNo, '1111' as Tel3, '4444' as Tel1, NULL as Sort UNION ALL
SELECT 'Alison' as Student, '80405088' as IDNo, '1111' as Tel3, '4444' as Tel1, NULL as Sort UNION ALL

SELECT 'Amber' as Student, NULL as IDNo, '5555' as Tel3, '6666' as Tel1, NULL as Sort UNION ALL
SELECT 'Amber' as Student, NULL as IDNo, '7777' as Tel3, '7777' as Tel1, NULL as Sort UNION ALL

SELECT 'Carmen' as Student, NULL as IDNo, '8888' as Tel3, '8888' as Tel1, NULL as Sort UNION ALL
SELECT 'Carmen' as Student, '20083085' as IDNo, '9999' as Tel3, '8888' as Tel1, NULL as Sort
) DT (Student, IDNo, Tel3, Tel1, Sort)



And here is the query returning matching pairs with appropriate Sort value:

SELECT  T1.Student, T1.RowID, T1.IDNo, T1.Tel3, T1.Tel1, 
CASE
WHEN T2.Tel3 = T1.Tel3 THEN 1111
WHEN T2.Tel1 = T1.Tel1 THEN 8888
WHEN T2.Tel1 = T1.Tel3 OR T2.Tel3 = T1.Tel1 THEN 7777
END Sort
, T2.RowID,T2.IDNo, T2.Tel3, T2.Tel1
FROM #Student T1
INNER JOIN #Student T2 ON T2.Student = T1.Student AND T2.RowID <> T1.RowID AND T2.IDNo IS NULL AND (
T2.Tel3 = T1.Tel3
OR T2.Tel1 = T1.Tel1
OR T2.Tel1 = T1.Tel3 OR T2.Tel3 = T1.Tel1
)



Assuming that it's the lines with IDNo which must be marked as duplicates, I apply the Sort status to T2 entries:

UPDATE T2
SET Sort = CASE
WHEN T2.Tel3 = T1.Tel3 THEN 1111
WHEN T2.Tel1 = T1.Tel1 THEN 8888
WHEN T2.Tel1 = T1.Tel3 OR T2.Tel3 = T1.Tel1 THEN 7777
END
FROM ..



Is it close to what you need?
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search