Duplicate Records using SQLCMD

  • Comments posted to this topic are about the item Duplicate Records using SQLCMD

  • Dear Sir/Ma'm,

    Hope this is the easiest method for deleting the duplicate records. Please go through this query:

    SET ROWCOUNT 1

    DELETE temp1 FROM temp1 a WHERE (SELECT COUNT(*) FROM temp1 b WHERE b.n = a.n) > 1

    WHILE @@rowcount > 0

    DELETE temp1 FROM temp1 a WHERE (SELECT COUNT(*) FROM temp1 b WHERE b.n = a.n ) > 1

    SET ROWCOUNT 0

  • If they are true duplicates from col 1 to the end of the record, how about a simple union to a temporary table and then replacing the table with duplicates with the temporary table? I use this method all the time. Quick and easy.:-) This won't work with a table having a unique key. In that case:

    select column into #Temp from tablewithduplicates

    group by column

    having count(*) > 1.

    select uniquekey, a.* into #Temp2 from tablewithduplicates a, #Temp b

    where a.column = b.column

    Then you can identify which records you want to delete in #Temp2.

  • First, let me take nothing from the article. It's well written and clear. Well done.

    However, the following statement seems to destroy whatever utility was intended in the code...

    Note that SETVAR cannot replace dynamic SQL in all situations because SETVAR can only accept a constant value, not a variable or expression on the right hand side,

    Further, viewing the duplicates before deleting them implies that this is going to be a manual process anyway.

    If you really want something generic, what's so bad about dynamic SQL? It can't be just the fact the everything shows up as red text especially after the code is put into production in the form of a stored procedure.

    The other problem indicated is a possible major design flaw. Unless it's a staging table for data, there should be no duplicates in any table. If it's for a staging table, then chances are that the process will be repeated in the future. What's wrong with a hard coded proc to handle the deletes for that table?

    Like I said, good article. I just don't see the need for such a thing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/6/2009)


    [...] Unless it's a staging table for data, there should be no duplicates in any table. If it's for a staging table, then chances are that the process will be repeated in the future. What's wrong with a hard coded proc to handle the deletes for that table?

    Like I said, good article. I just don't see the need for such a thing.

    Alas, wish you were right... but reality has a cruel way to bite. Just found gazillions of duplicates in production... so I have a clear and immediate need !

    Yes, a hard-coded stored proc is what I'll use to clean this mess up. But I'm thankful for the pre-cooked code that so far has allowed me to devote my time to investigating the errors and assessing the extent of the cleanup.

    Many thanks Renato!

  • Thanks for your comments folks.

    I had forgotten about the SET ROWCOUNT and #tempTable methods . Those worked for all the older versions of SQL and still work now. I just like the new method because it is set based and does not need any extra objects like temp tables to housekeep. I find its easier to read and modify than dynamic sql. Simplicity is in the eye of the beholder, and most people will find the technique they already know to be the "simpler".

    I must also acknowledge Tom Huneke's script in http://www.sqlservercentral.com/scripts/CTE/62599/.

    I read his script last year and started using the technique, but forgot where the idea came from when I wrote the article.

    Jeff Moden's comment is an interesting one. Jeff must have the pleasure of always working with well designed databases.:-P. I do support and troubleshooting of custom developed systems and find the duplicates situation crops up a few times a year. Databases are often designed by application programmers and sometimes by accountants or stock market analysts. The programmers usually know that every table should have a primary key, and dutifully add a unqueidentier or identity column to every table. This does not actually solve all the duplicate problems. I recently found about 6000 duplicate expense records in a mortgage application database even though the table had a primary key on the uniqueidentier. The records should also have been unique on LoanApplicationID and ExpenseTypeID, but a program bug and a lack of another unique constraint allowed the duplicates in.

    Cheers, Renato.

  • Renato Buda (4/7/2009)


    Jeff Moden's comment is an interesting one. Jeff must have the pleasure of always working with well designed databases.:-P.

    Alas... if that were only true. I'm the one that usually has to do the cleanups and make the necessary changes to prevent future dupes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply