Deleting Duplicate Records

Question

Post reply

Deleting Duplicate Records

Shankar Walvekar

SSC Veteran

Points: 278
More actions
November 19, 2012 at 8:08 pm

#266303

Comments posted to this topic are about the item Deleting Duplicate Records

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Robert-378556 SSCertifiable Points: 5917 More actions · Answer 1

It's a lot more complex problem.

In a well designed database there should be no duplicates.

The simplest solution is to have proper constraints (primary, unique key) and have trigger validation on at least natural keys

Insert into #Employee VALUES (1, 'Mark Dunn', 'HR');

Insert into #Employee VALUES (2, 'Mark.Dunn', 'HR');

Insert into #Employee VALUES (3, 'Mark Dunn', 'HR');

Based on name column, these are duplicates as well, but would pass index constraints, so you have to check on normalized form of name in insert/update trigger.

It can still happen a user (curse them!) finds a pattern not covered in your trigger logic and inserts a sort-of-duplicate.

Second part of the problem is how to merge such duplicates as they may have a lot of references.

Just a bit of food for thought. 🙂

Arnold Lieberman Valued Member Points: 57 More actions · Answer 2

Right, it shouldn't happen in a production database but will have to be dealt with during ETL processes, especially if you want to be able to report on such errors.

akljfhnlaflkj SSC Guru Points: 76202 More actions · Answer 3

We would have an identity column so wouldn't need to be using RowNumber in the query. Also we would have a unique natural key to not allow duplicates in the first place. But thanks for the interesting read.