Deleting Duplicate Records

  • Comments posted to this topic are about the item Deleting Duplicate Records

  • It's a lot more complex problem.

    In a well designed database there should be no duplicates.

    The simplest solution is to have proper constraints (primary, unique key) and have trigger validation on at least natural keys

    Insert into #Employee VALUES (1, 'Mark Dunn', 'HR');

    Insert into #Employee VALUES (2, 'Mark.Dunn', 'HR');

    Insert into #Employee VALUES (3, 'Mark Dunn', 'HR');

    Based on name column, these are duplicates as well, but would pass index constraints, so you have to check on normalized form of name in insert/update trigger.

    It can still happen a user (curse them!) finds a pattern not covered in your trigger logic and inserts a sort-of-duplicate.

    Second part of the problem is how to merge such duplicates as they may have a lot of references.

    Just a bit of food for thought. 🙂

  • Right, it shouldn't happen in a production database but will have to be dealt with during ETL processes, especially if you want to be able to report on such errors.

  • We would have an identity column so wouldn't need to be using RowNumber in the query. Also we would have a unique natural key to not allow duplicates in the first place. But thanks for the interesting read.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply