Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 123»»»

filter duplicate students via T-SQL Expand / Collapse
Author
Message
Posted Friday, April 19, 2013 6:52 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Yesterday @ 3:31 PM
Points: 161, Visits: 461
/*

I need to filter for duplicate students:
---------------------------------------
For same student, if one or both instance/s of IDNo is null then do the following:

1.
if both instances of Tel3 are identical for same Student (Alison), if they are
use it to populate field Sort with the value, 1111.

2.
if both instances of Tel3 are not identical for same Student (Carmen), check if both instances of Tel1 are identical for same Student, if they are use it to populate field Sort with the value, 8888.

3.
if both instances of Tel3 and Tel1 are not identical for same Student (Amber), check if an instance of Tel3 & Tel1 are identical for same Student, if they are use it to populate field Sort with the value, 7777.

Since there is no match between 1st row and 2nd row (except for the name) we can't say it's the
same user for sure, so don't populate the field Sort with any value where row = ( 'Amber' as Student, NULL as IDNo, '5555' as Tel3, '6666' as Tel1 ).

*/

SELECT 'Alison' as Student, NULL as IDNo, '1111' as Tel3, '4444' as Tel1, NULL as Sort UNION ALL
SELECT 'Alison' as Student, '80405088' as IDNo, '1111' as Tel3, '4444' as Tel1, NULL as Sort UNION ALL

SELECT 'Amber' as Student, NULL as IDNo, '5555' as Tel3, '6666' as Tel1, NULL as Sort UNION ALL
SELECT 'Amber' as Student, NULL as IDNo, '7777' as Tel3, '7777' as Tel1, NULL as Sort UNION ALL

SELECT 'Carmen' as Student, NULL as IDNo, '8888' as Tel3, '8888' as Tel1, NULL as Sort UNION ALL
SELECT 'Carmen' as Student, '20083085' as IDNo, '9999' as Tel3, '8888' as Tel1, NULL as Sort



***
Apologies for duplicate posting, when trying to post initially I got DNS errors from Internet Explorer, so re-posted, again DNS errors, etc. After the 3rd attempt my post was successful, only to notice more entries posted.
***
Post #1444358
Posted Friday, April 19, 2013 7:23 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 9:26 PM
Points: 12,990, Visits: 12,398
This would be a lot easier with ddl (create table statements) and sample data (insert statements) along with the desired output based on the sample data. Please take a few minutes and read the first article in my signature for best practices when posting questions.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1444382
Posted Friday, April 19, 2013 7:49 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, August 19, 2013 8:08 PM
Points: 43, Visits: 566
SELECT     EmpPin, trxdate, trxtime, COUNT(*) AS dupes  
FROM dbo.rawtrx
GROUP BY EmpPin, trxdate, trxtime
HAVING (COUNT(*) > 1)

play around with this query
Post #1444662
Posted Friday, April 19, 2013 7:55 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, August 19, 2013 8:08 PM
Points: 43, Visits: 566
okay, after reading further your questions and scenarios... i doubt that my last post would work...

how many times do you have the same student recurring? do you ever have a situation like this?

kyle       null     1111     1111
kyle null 1111 1112
kyle 111 1111 1111


Post #1444663
Posted Saturday, April 20, 2013 4:41 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Yesterday @ 3:31 PM
Points: 161, Visits: 461
sdhanpaul,

Prior to removing duplicate instance of user, I first want to get list of duplicate users, to do that I need to filter them and make sure they really are duplicates.
I can filter them by IDNo - unfortunately more than half of the users with same name (firstname + surname) don't have IDNo, so I am forced to check TelephoneNo3 and TelephoneNo1 fields to try and identify if they really are the same user/s. As in my example above, the first row for user Alison is not identical to the second row for user Alison, same applied to the other users. So using COUNT(*) and HAVING (COUNT(*) > 1) clause won't do the job.
Post #1444700
Posted Sunday, April 21, 2013 5:26 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, August 19, 2013 8:08 PM
Points: 43, Visits: 566
how about if you join the fields?

Try this:

WITH CTE (emppin,trxdate, trxtime, DuplicateCount)
AS
(
SELECT emppin,trxdate, trxtime,
ROW_NUMBER() OVER(PARTITION BY emppin,trxdate, trxtime ORDER BY emppin) AS DuplicateCount
FROM dbo.rawtrx
)
DELETE
FROM CTE
WHERE DuplicateCount > 1

you need to replace the fields etc

this however deletes the duplicates... im really not much of a bottle feeder so you need to do the modifications.
Post #1444829
Posted Sunday, April 21, 2013 10:06 PM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Tuesday, September 23, 2014 9:06 PM
Points: 658, Visits: 1,541
I would like to follow this, but I don't understand the sentence "if they are use it to populate field Sort with the value". I don't see the Sort column populated at all for any of the examples.

Did the replies here already help you?
Post #1444845
Posted Monday, April 22, 2013 7:40 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 9:26 PM
Points: 12,990, Visits: 12,398
Notice how people just keep taking random attempts at helping you. If you would take a few minutes and post ddl, sample data and desired output you would have a tested answer quickly. This really isn't too difficult but without something to work with it is just guessing.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1444988
Posted Monday, April 22, 2013 3:26 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, September 26, 2014 3:23 PM
Points: 2,116, Visits: 6,439
Apart from what Sean said (i.e. the missing DDL, which is basically table create statements plus some INSERT statements of sample data to create your scenario), I assume you also have a Tel2 column in that table? Not a good idea, since it makes your database design incredibly inflexible. Read up on Normalization (start with this Wikipedia article or read through Tom Thomson's really great series on Normalization on this site. By the way, is this an exam question?


--------------------------------------------------------------------------
A little knowledge is a dangerous thing (Alexander Pope)

In order for us to help you as efficiently as possible, please read this before posting (courtesy of Jeff Moden)
Post #1445207
Posted Monday, April 22, 2013 9:10 PM
SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Thursday, September 18, 2014 11:10 PM
Points: 4,573, Visits: 8,351
First, you need to have some kind of ID on the records in the table, otherwise each record will be always matched to itself.

Assuming, you have such ID here is a test table (which should be really provided by you in the initial post):

--	DROP TABLE #Student
SELECT IDENTITY(int, 1,1) RowID, *
INTO #Student
FROM (
SELECT 'Alison' as Student, NULL as IDNo, '1111' as Tel3, '4444' as Tel1, NULL as Sort UNION ALL
SELECT 'Alison' as Student, '80405088' as IDNo, '1111' as Tel3, '4444' as Tel1, NULL as Sort UNION ALL

SELECT 'Amber' as Student, NULL as IDNo, '5555' as Tel3, '6666' as Tel1, NULL as Sort UNION ALL
SELECT 'Amber' as Student, NULL as IDNo, '7777' as Tel3, '7777' as Tel1, NULL as Sort UNION ALL

SELECT 'Carmen' as Student, NULL as IDNo, '8888' as Tel3, '8888' as Tel1, NULL as Sort UNION ALL
SELECT 'Carmen' as Student, '20083085' as IDNo, '9999' as Tel3, '8888' as Tel1, NULL as Sort
) DT (Student, IDNo, Tel3, Tel1, Sort)

And here is the query returning matching pairs with appropriate Sort value:

SELECT  T1.Student, T1.RowID, T1.IDNo, T1.Tel3, T1.Tel1, 
CASE
WHEN T2.Tel3 = T1.Tel3 THEN 1111
WHEN T2.Tel1 = T1.Tel1 THEN 8888
WHEN T2.Tel1 = T1.Tel3 OR T2.Tel3 = T1.Tel1 THEN 7777
END Sort
, T2.RowID,T2.IDNo, T2.Tel3, T2.Tel1
FROM #Student T1
INNER JOIN #Student T2 ON T2.Student = T1.Student AND T2.RowID <> T1.RowID AND T2.IDNo IS NULL AND (
T2.Tel3 = T1.Tel3
OR T2.Tel1 = T1.Tel1
OR T2.Tel1 = T1.Tel3 OR T2.Tel3 = T1.Tel1
)

Assuming that it's the lines with IDNo which must be marked as duplicates, I apply the Sort status to T2 entries:

UPDATE T2
SET Sort = CASE
WHEN T2.Tel3 = T1.Tel3 THEN 1111
WHEN T2.Tel1 = T1.Tel1 THEN 8888
WHEN T2.Tel1 = T1.Tel3 OR T2.Tel3 = T1.Tel1 THEN 7777
END
FROM ..

Is it close to what you need?
Post #1445253
« Prev Topic | Next Topic »

Add to briefcase 123»»»

Permissions Expand / Collapse