SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Duplicate Records using SQLCMD


Duplicate Records using SQLCMD

Author
Message
R M Buda
R M Buda
Mr or Mrs. 500
Mr or Mrs. 500 (567 reputation)Mr or Mrs. 500 (567 reputation)Mr or Mrs. 500 (567 reputation)Mr or Mrs. 500 (567 reputation)Mr or Mrs. 500 (567 reputation)Mr or Mrs. 500 (567 reputation)Mr or Mrs. 500 (567 reputation)Mr or Mrs. 500 (567 reputation)

Group: General Forum Members
Points: 567 Visits: 303
Comments posted to this topic are about the item Duplicate Records using SQLCMD
pradeepsripada
pradeepsripada
Forum Newbie
Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)

Group: General Forum Members
Points: 7 Visits: 7
Dear Sir/Ma'm,
Hope this is the easiest method for deleting the duplicate records. Please go through this query:
SET ROWCOUNT 1
DELETE temp1 FROM temp1 a WHERE (SELECT COUNT(*) FROM temp1 b WHERE b.n = a.n) > 1
WHILE @@rowcount > 0
DELETE temp1 FROM temp1 a WHERE (SELECT COUNT(*) FROM temp1 b WHERE b.n = a.n ) > 1
SET ROWCOUNT 0
jnichols-797753
jnichols-797753
SSC-Enthusiastic
SSC-Enthusiastic (193 reputation)SSC-Enthusiastic (193 reputation)SSC-Enthusiastic (193 reputation)SSC-Enthusiastic (193 reputation)SSC-Enthusiastic (193 reputation)SSC-Enthusiastic (193 reputation)SSC-Enthusiastic (193 reputation)SSC-Enthusiastic (193 reputation)

Group: General Forum Members
Points: 193 Visits: 158
If they are true duplicates from col 1 to the end of the record, how about a simple union to a temporary table and then replacing the table with duplicates with the temporary table? I use this method all the time. Quick and easy.:-) This won't work with a table having a unique key. In that case:

select column into #Temp from tablewithduplicates
group by column
having count(*) > 1.

select uniquekey, a.* into #Temp2 from tablewithduplicates a, #Temp b
where a.column = b.column

Then you can identify which records you want to delete in #Temp2.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (210K reputation)SSC Guru (210K reputation)SSC Guru (210K reputation)SSC Guru (210K reputation)SSC Guru (210K reputation)SSC Guru (210K reputation)SSC Guru (210K reputation)SSC Guru (210K reputation)

Group: General Forum Members
Points: 210499 Visits: 41973
First, let me take nothing from the article. It's well written and clear. Well done.

However, the following statement seems to destroy whatever utility was intended in the code...

Note that SETVAR cannot replace dynamic SQL in all situations because SETVAR can only accept a constant value, not a variable or expression on the right hand side,


Further, viewing the duplicates before deleting them implies that this is going to be a manual process anyway.

If you really want something generic, what's so bad about dynamic SQL? It can't be just the fact the everything shows up as red text especially after the code is put into production in the form of a stored procedure.

The other problem indicated is a possible major design flaw. Unless it's a staging table for data, there should be no duplicates in any table. If it's for a staging table, then chances are that the process will be repeated in the future. What's wrong with a hard coded proc to handle the deletes for that table?

Like I said, good article. I just don't see the need for such a thing.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Ol'SureHand
Ol'SureHand
SSC Eights!
SSC Eights! (829 reputation)SSC Eights! (829 reputation)SSC Eights! (829 reputation)SSC Eights! (829 reputation)SSC Eights! (829 reputation)SSC Eights! (829 reputation)SSC Eights! (829 reputation)SSC Eights! (829 reputation)

Group: General Forum Members
Points: 829 Visits: 720
Jeff Moden (4/6/2009)
[...] Unless it's a staging table for data, there should be no duplicates in any table. If it's for a staging table, then chances are that the process will be repeated in the future. What's wrong with a hard coded proc to handle the deletes for that table?

Like I said, good article. I just don't see the need for such a thing.

Alas, wish you were right... but reality has a cruel way to bite. Just found gazillions of duplicates in production... so I have a clear and immediate need !
Yes, a hard-coded stored proc is what I'll use to clean this mess up. But I'm thankful for the pre-cooked code that so far has allowed me to devote my time to investigating the errors and assessing the extent of the cleanup.
Many thanks Renato!
R M Buda
R M Buda
Mr or Mrs. 500
Mr or Mrs. 500 (567 reputation)Mr or Mrs. 500 (567 reputation)Mr or Mrs. 500 (567 reputation)Mr or Mrs. 500 (567 reputation)Mr or Mrs. 500 (567 reputation)Mr or Mrs. 500 (567 reputation)Mr or Mrs. 500 (567 reputation)Mr or Mrs. 500 (567 reputation)

Group: General Forum Members
Points: 567 Visits: 303
Thanks for your comments folks.

I had forgotten about the SET ROWCOUNT and #tempTable methods . Those worked for all the older versions of SQL and still work now. I just like the new method because it is set based and does not need any extra objects like temp tables to housekeep. I find its easier to read and modify than dynamic sql. Simplicity is in the eye of the beholder, and most people will find the technique they already know to be the "simpler".

I must also acknowledge Tom Huneke's script in http://www.sqlservercentral.com/scripts/CTE/62599/.
I read his script last year and started using the technique, but forgot where the idea came from when I wrote the article.

Jeff Moden's comment is an interesting one. Jeff must have the pleasure of always working with well designed databases.:-P. I do support and troubleshooting of custom developed systems and find the duplicates situation crops up a few times a year. Databases are often designed by application programmers and sometimes by accountants or stock market analysts. The programmers usually know that every table should have a primary key, and dutifully add a unqueidentier or identity column to every table. This does not actually solve all the duplicate problems. I recently found about 6000 duplicate expense records in a mortgage application database even though the table had a primary key on the uniqueidentier. The records should also have been unique on LoanApplicationID and ExpenseTypeID, but a program bug and a lack of another unique constraint allowed the duplicates in.

Cheers, Renato.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (210K reputation)SSC Guru (210K reputation)SSC Guru (210K reputation)SSC Guru (210K reputation)SSC Guru (210K reputation)SSC Guru (210K reputation)SSC Guru (210K reputation)SSC Guru (210K reputation)

Group: General Forum Members
Points: 210499 Visits: 41973
Renato Buda (4/7/2009)
Jeff Moden's comment is an interesting one. Jeff must have the pleasure of always working with well designed databases.:-P.


Alas... if that were only true. I'm the one that usually has to do the cleanups and make the necessary changes to prevent future dupes.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search