hans.reidt (1/28/2015)


    I do this on tables with more than 1,000,000 records.

    Something to keep in mind, of course, is the number and complexity of the indexes.

    A possibility is to disable all indexes, remove the duplicates, and rebuild the indexes (a rough sketch of this follows after the quote).

    Of course, when you have duplicates in your data, the problem is usually the import.

    I try to solve that with a LEFT OUTER JOIN on the unique columns, as sketched below.

    This way I get idempotence, which is nice.
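
A minimal sketch of that idempotent-import pattern, assuming a hypothetical staging table dbo.StagingImport feeding dbo.Customer, where CustomerCode is the unique column (all of these names are placeholders, not anything from the original post):

    -- Insert only staging rows whose unique key is not already present,
    -- so re-running the same import inserts nothing the second time.
    INSERT INTO dbo.Customer (CustomerCode, CustomerName)
    SELECT s.CustomerCode, s.CustomerName
    FROM dbo.StagingImport AS s
    LEFT OUTER JOIN dbo.Customer AS c
        ON c.CustomerCode = s.CustomerCode   -- join on the unique column(s)
    WHERE c.CustomerCode IS NULL;            -- keep unmatched rows only

Run once, it loads the new rows; run again, the join matches every row and nothing is inserted, which is the idempotence being described.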
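And the disable / dedupe / rebuild sequence could look roughly like the following, again with placeholder names (IX_Customer_Email, CustomerId). The dedupe step here uses the common ROW_NUMBER pattern, which the quoted post doesn't specify; and only nonclustered indexes should be disabled, since disabling the clustered index makes the table unreadable:

    -- 1. Disable a nonclustered index so the mass delete doesn't maintain it.
    ALTER INDEX IX_Customer_Email ON dbo.Customer DISABLE;

    -- 2. Remove duplicates: number the rows within each unique key
    --    and delete everything past the first.
    WITH numbered AS (
        SELECT ROW_NUMBER() OVER (
                   PARTITION BY CustomerCode   -- the unique column(s)
                   ORDER BY CustomerId         -- decides which copy survives
               ) AS rn
        FROM dbo.Customer
    )
    DELETE FROM numbered
    WHERE rn > 1;

    -- 3. Rebuilding the index also re-enables it.
    ALTER INDEX IX_Customer_Email ON dbo.Customer REBUILD;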

    Sure, I'm not saying it can't work. I'm saying that, depending on your server's power and resources, it might not be a great option. Especially when you get to really big tables: 10 million, 100 million rows, or more. Or if you're doing this during a data import. And I don't think I'd want that code in any automated deduping procedures.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams