Duplicate Records using SQLCMD

Question

Post reply

Duplicate Records using SQLCMD

R M Buda

SSCrazy

Points: 2225
More actions
April 6, 2009 at 12:30 am

#203210

Comments posted to this topic are about the item Duplicate Records using SQLCMD

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

pradeepsripada SSC Rookie Points: 35 More actions · Answer 1

Dear Sir/Ma'm,

Hope this is the easiest method for deleting the duplicate records. Please go through this query:

SET ROWCOUNT 1

DELETE temp1 FROM temp1 a WHERE (SELECT COUNT(*) FROM temp1 b WHERE b.n = a.n) > 1

WHILE @@rowcount > 0

DELETE temp1 FROM temp1 a WHERE (SELECT COUNT(*) FROM temp1 b WHERE b.n = a.n ) > 1

SET ROWCOUNT 0

jnichols-797753 Ten Centuries Points: 1085 More actions · Answer 2

If they are true duplicates from col 1 to the end of the record, how about a simple union to a temporary table and then replacing the table with duplicates with the temporary table? I use this method all the time. Quick and easy.:-) This won't work with a table having a unique key. In that case:

select column into #Temp from tablewithduplicates

group by column

having count(*) > 1.

select uniquekey, a.* into #Temp2 from tablewithduplicates a, #Temp b

where a.column = b.column

Then you can identify which records you want to delete in #Temp2.

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 3

First, let me take nothing from the article. It's well written and clear. Well done.

However, the following statement seems to destroy whatever utility was intended in the code...

Note that SETVAR cannot replace dynamic SQL in all situations because SETVAR can only accept a constant value, not a variable or expression on the right hand side,

Further, viewing the duplicates before deleting them implies that this is going to be a manual process anyway.

If you really want something generic, what's so bad about dynamic SQL? It can't be just the fact the everything shows up as red text especially after the code is put into production in the form of a stored procedure.

The other problem indicated is a possible major design flaw. Unless it's a staging table for data, there should be no duplicates in any table. If it's for a staging table, then chances are that the process will be repeated in the future. What's wrong with a hard coded proc to handle the deletes for that table?

Like I said, good article. I just don't see the need for such a thing.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Ol'SureHand SSCrazy Points: 2744 More actions · Answer 4

Jeff Moden (4/6/2009)
[...] Unless it's a staging table for data, there should be no duplicates in any table. If it's for a staging table, then chances are that the process will be repeated in the future. What's wrong with a hard coded proc to handle the deletes for that table?
Like I said, good article. I just don't see the need for such a thing.

Alas, wish you were right... but reality has a cruel way to bite. Just found gazillions of duplicates in production... so I have a clear and immediate need !

Yes, a hard-coded stored proc is what I'll use to clean this mess up. But I'm thankful for the pre-cooked code that so far has allowed me to devote my time to investigating the errors and assessing the extent of the cleanup.

Many thanks Renato!

R M Buda SSCrazy Points: 2225 More actions · Answer 5

Thanks for your comments folks.

I had forgotten about the SET ROWCOUNT and #tempTable methods . Those worked for all the older versions of SQL and still work now. I just like the new method because it is set based and does not need any extra objects like temp tables to housekeep. I find its easier to read and modify than dynamic sql. Simplicity is in the eye of the beholder, and most people will find the technique they already know to be the "simpler".

I must also acknowledge Tom Huneke's script in http://www.sqlservercentral.com/scripts/CTE/62599/.

I read his script last year and started using the technique, but forgot where the idea came from when I wrote the article.

Jeff Moden's comment is an interesting one. Jeff must have the pleasure of always working with well designed databases.:-P. I do support and troubleshooting of custom developed systems and find the duplicates situation crops up a few times a year. Databases are often designed by application programmers and sometimes by accountants or stock market analysts. The programmers usually know that every table should have a primary key, and dutifully add a unqueidentier or identity column to every table. This does not actually solve all the duplicate problems. I recently found about 6000 duplicate expense records in a mortgage application database even though the table had a primary key on the uniqueidentier. The records should also have been unique on LoanApplicationID and ExpenseTypeID, but a program bug and a lack of another unique constraint allowed the duplicates in.

Cheers, Renato.

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 6

Renato Buda (4/7/2009)
Jeff Moden's comment is an interesting one. Jeff must have the pleasure of always working with well designed databases.:-P.

Alas... if that were only true. I'm the one that usually has to do the cleanups and make the necessary changes to prevent future dupes.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)