RE: filter duplicate students via T-SQL

April 24, 2013 at 8:05 pm

sdhanpaul,

The problem with the GROUP BY approach you propose is that it defines a duplicate row as one that shares values in all four columns: Student AND IDNo AND Tel3 AND Tel1. That does not match the definition stated in the original post, in which a duplicate row is one that shares values in Student AND (IDNo OR Tel3 OR Tel1). Handling the latter definition of "duplicate" is much more complex than a basic GROUP BY clause can address.

The situation is further complicated by the fact that the value in Tel1 in one row (which I presume to be a telephone number in the actual real-world data) could be stored as Tel3 in another row and vice versa. So the test for a duplicate requires comparing not just the values within each of these columns but also across the two columns on different rows.
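
To make the difference concrete, here is a minimal sketch. It is not the solution I posted earlier, only an illustration. The table variable @Students, the RowID column, the column types, and the sample values are all assumptions I made up for the example; only the column names Student, IDNo, Tel1 and Tel3 come from the original post. The multi-row VALUES syntax needs SQL Server 2008 or later.

```sql
-- Hypothetical sample data: table name, RowID, types and values are invented.
DECLARE @Students TABLE
(
    RowID   int PRIMARY KEY,
    Student varchar(50),
    IDNo    varchar(20),
    Tel1    varchar(20),
    Tel3    varchar(20)
);

INSERT INTO @Students (RowID, Student, IDNo, Tel1, Tel3)
VALUES (1, 'Ann', 'A1', '555-0101', NULL),
       (2, 'Ann', NULL, NULL,       '555-0101'),  -- Tel3 matches row 1's Tel1
       (3, 'Ann', 'A1', '555-0199', NULL),        -- IDNo matches row 1
       (4, 'Bob', 'B7', '555-0101', NULL);        -- same phone, different student

-- GROUP BY on all four columns: every row is distinct, so nothing is returned.
SELECT Student, IDNo, Tel1, Tel3, COUNT(*) AS Cnt
FROM @Students
GROUP BY Student, IDNo, Tel1, Tel3
HAVING COUNT(*) > 1;

-- Pairwise test: same Student AND at least one match on IDNo or on a phone
-- number, including Tel1 in one row against Tel3 in the other.
SELECT a.RowID AS FirstRow, b.RowID AS MatchingRow
FROM @Students AS a
JOIN @Students AS b
  ON  a.Student = b.Student
  AND a.RowID < b.RowID            -- report each pair only once
  AND (   a.IDNo = b.IDNo
       OR a.Tel1 = b.Tel1
       OR a.Tel3 = b.Tel3
       OR a.Tel1 = b.Tel3          -- cross-column comparisons
       OR a.Tel3 = b.Tel1 );
```

With this sample data the second query pairs row 1 with row 2 (through the Tel1/Tel3 cross check) and row 1 with row 3 (through IDNo), while the GROUP BY query finds nothing. NULLs never compare as equal here, so two rows with missing phone numbers are not treated as duplicates on that basis. Note that rows 2 and 3 do not match each other directly; grouping such chains into one set of duplicates takes more work than this pairwise join.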

I think the solution I posted earlier addresses these concerns. If you test it and find that it does not, I welcome your feedback.