filter duplicate students via T-SQL
Posted Wednesday, April 24, 2013 7:43 PM
SSC Rookie

Group: General Forum Members
Last Login: Monday, August 19, 2013 8:08 PM
Points: 43, Visits: 566
First of all, giving a column and the table the same name ("Students") is no good!
That aside, this gave me the duplicates and counted the number of copies of each:

SELECT Student, IDNo, Tel3, Tel1, COUNT(*) AS dupes
FROM dbo.Student
GROUP BY Student, IDNo, Tel3, Tel1
HAVING COUNT(*) > 1;

Now, if you need to find the duplicates and delete all but one copy of each:

/* Delete duplicate records, keeping one row from each group */
WITH CTE (Student, IDNo, Tel3, Tel1, DuplicateCount)
AS
(
    SELECT Student, IDNo, Tel3, Tel1,
           ROW_NUMBER() OVER (PARTITION BY Student, IDNo, Tel3, Tel1 ORDER BY Student) AS DuplicateCount
    FROM dbo.Student
)
DELETE
FROM CTE
WHERE DuplicateCount > 1;

Problem solved...


I may be off on a tangent here, and I can't seem to understand your problem well, so I'm not sure.
Post #1446270
Posted Wednesday, April 24, 2013 7:48 PM
SSC Rookie

I have a problem with using numbers like 1111, 8888, 7777, etc. in that sort field. What's the point of limiting a "sort" field to these numbers? I don't get it.
Post #1446271
Posted Wednesday, April 24, 2013 8:05 PM
SSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, August 21, 2014 7:49 PM
Points: 171, Visits: 501
sdhanpaul,

The problem with the GROUP BY approach you propose is that it defines a duplicate row as one that shares values in all four columns: Student AND IDNo AND Tel3 AND Tel1. This does not meet the definition of a duplicate row stated in the original post, which is that a duplicate row is one that shares values in Student AND (IDNo OR Tel3 OR Tel1). Handling the latter definition of "duplicate" is much more complex than a basic GROUP BY clause can address.
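
As a rough illustration of that OR-based definition (a sketch only, not the solution posted earlier in the thread), a self-join can list pairs of rows that share Student and at least one of the other three columns. It assumes a unique key column, here called RowID, which is hypothetical and does not appear in the thread.

```sql
-- Sketch only: RowID is a hypothetical unique key, not a column from the thread.
-- Lists each pair of rows that share Student AND (IDNo OR Tel3 OR Tel1).
SELECT a.RowID AS RowA,
       b.RowID AS RowB,
       a.Student
FROM dbo.Student AS a
JOIN dbo.Student AS b
  ON  b.Student = a.Student
  AND b.RowID   > a.RowID      -- report each pair once, never a row with itself
WHERE b.IDNo = a.IDNo
   OR b.Tel3 = a.Tel3
   OR b.Tel1 = a.Tel1;
```

NULL values never compare equal here, so rows with missing phone numbers are not paired on that column. Note that this only finds direct pairs; a chain where row A matches B on IDNo and B matches C on Tel1 would still need extra work to be treated as one group.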

The situation is further complicated by the fact that the value in Tel1 in one row (which I presume to be a telephone number in the actual real-world data) could be stored as Tel3 in another row and vice versa. So the test of "duplicate" requires checking not just the values within each of these columns but also between these two columns on different rows.
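
One way to sketch the cross-column check is to unpivot both phone columns into a single column first, so a number matches no matter which column holds it. This is only an illustration under assumptions: RowID is a hypothetical unique key, and Tel1 and Tel3 are assumed to have compatible data types.

```sql
-- Sketch only: RowID is a hypothetical unique key, not a column from the thread.
-- Each row contributes up to two (RowID, Student, Tel) entries, one per phone column.
WITH Phones AS
(
    SELECT s.RowID, s.Student, p.Tel
    FROM dbo.Student AS s
    CROSS APPLY (VALUES (s.Tel1), (s.Tel3)) AS p (Tel)
    WHERE p.Tel IS NOT NULL
)
-- Pairs of rows with the same Student and any phone number in common,
-- regardless of whether it was stored in Tel1 or Tel3.
SELECT DISTINCT a.RowID AS RowA,
                b.RowID AS RowB,
                a.Student
FROM Phones AS a
JOIN Phones AS b
  ON  b.Student = a.Student
  AND b.Tel     = a.Tel
  AND b.RowID   > a.RowID;
```

DISTINCT is there because two rows that share both numbers would otherwise be listed twice. Matching on IDNo is left out of this sketch and would need to be combined with it.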

I think the solution I posted earlier addresses these concerns. I welcome your feedback, if you test it and find that it does not.
Post #1446272
Posted Monday, April 29, 2013 9:38 PM
SSCarpal Tunnel

Group: General Forum Members
Last Login: Yesterday @ 3:25 PM
Points: 4,573, Visits: 8,354
geoff5 (4/24/2013)
I think the solution I posted earlier addresses these concerns. I welcome your feedback, if you test it and find that it does not.

That would be for Kevin to answer...
But he's gone and we're not getting to a solution of this riddle.
Such a pity...
Post #1447858