RE: Get Rid of Duplicates! – SQLServerCentral

SSC Journeyman

Points: 95

December 1, 2009 at 2:42 am

I prefer to use GROUP BY when I want to see the number of duplicates for each group of attributes used to check uniqueness, while ROW_NUMBER is a nicer solution and I use it especially when I want to use the latest or earliest entered version of the same record.

The check for duplicates might be required when merging data from two different sources or when breaking a not normalized table into multiple tables, for example a headers/lines set of tables. Another situation in which I had to check for duplicates is when importing data from non-relational sources (e.g. text files, Excel sheets, etc.) in which the chances of having duplicates are quite high.

As already stressed, it's preferable to reduce upfront the possibility of entering duplicates, unfortunately that's not always possible.

It's not always required to add unique indexes/constraints, though that was a good tip.