SQLServerCentral is supported by Red Gate Software Ltd.

Duplicate Records using SQLCMD
Posted Monday, April 6, 2009 12:29 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Sunday, October 12, 2014 3:39 AM
Points: 146, Visits: 244
Comments posted to this topic are about the item Duplicate Records using SQLCMD
Post #690747
Posted Monday, April 6, 2009 6:44 AM
Forum Newbie


Group: General Forum Members
Last Login: Wednesday, February 10, 2010 4:24 AM
Points: 1, Visits: 7
Dear Sir/Ma'am,
I hope this is the easiest method for deleting the duplicate records. Please go through this query:

SET ROWCOUNT 1
DELETE a FROM temp1 a WHERE (SELECT COUNT(*) FROM temp1 b WHERE b.n = a.n) > 1
WHILE @@ROWCOUNT > 0
    DELETE a FROM temp1 a WHERE (SELECT COUNT(*) FROM temp1 b WHERE b.n = a.n) > 1
SET ROWCOUNT 0

(Note that the DELETE must target the alias a; targeting temp1 by name while the FROM clause aliases it would make SQL Server treat the target as a second, unjoined instance of the table.)
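A side note: Microsoft has deprecated SET ROWCOUNT's effect on DELETE/INSERT/UPDATE in newer versions of SQL Server, recommending TOP instead. An equivalent loop (a sketch, reusing the temp1/n names from the snippet above) would be:

```sql
-- Same one-row-at-a-time loop, using TOP instead of SET ROWCOUNT
-- (SET ROWCOUNT's effect on DML statements is deprecated).
DELETE TOP (1) FROM temp1
WHERE (SELECT COUNT(*) FROM temp1 b WHERE b.n = temp1.n) > 1

WHILE @@ROWCOUNT > 0
    DELETE TOP (1) FROM temp1
    WHERE (SELECT COUNT(*) FROM temp1 b WHERE b.n = temp1.n) > 1
```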
Post #690954
Posted Monday, April 6, 2009 6:52 AM
Valued Member


Group: General Forum Members
Last Login: Monday, January 7, 2013 2:28 PM
Points: 51, Visits: 158
If they are true duplicates from column 1 to the end of the record, how about a simple UNION (which removes duplicates) into a temporary table, and then replacing the table that has duplicates with the temporary table? I use this method all the time; quick and easy. It won't work with a table that has a unique key, though. In that case:

select [column] into #Temp from tablewithduplicates
group by [column]
having count(*) > 1

select a.* into #Temp2
from tablewithduplicates a
join #Temp b on a.[column] = b.[column]

Then you can identify which records you want to delete in #Temp2.
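The delete step left open above might look something like this (a sketch only; tablewithduplicates, uniquekey, and [column] are the placeholder names from the snippets, and keeping the MIN(uniquekey) row is just one possible choice):

```sql
-- Delete every duplicate except the one with the lowest uniquekey.
-- All names are the placeholders from the snippets above.
DELETE a
FROM tablewithduplicates a
JOIN #Temp2 d ON a.uniquekey = d.uniquekey
WHERE a.uniquekey NOT IN
      (SELECT MIN(uniquekey) FROM #Temp2 GROUP BY [column]);
```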
Post #690961
Posted Monday, April 6, 2009 8:01 AM


SSC-Dedicated


Group: General Forum Members
Last Login: Yesterday @ 1:53 PM
Points: 35,366, Visits: 31,905
First, let me take nothing from the article. It's well written and clear. Well done.

However, the following statement seems to destroy whatever utility was intended in the code...

Note that SETVAR cannot replace dynamic SQL in all situations because SETVAR can only accept a constant value, not a variable or expression on the right hand side.


Further, viewing the duplicates before deleting them implies that this is going to be a manual process anyway.

If you really want something generic, what's so bad about dynamic SQL? It can't just be the fact that everything shows up as red text, especially after the code is put into production in the form of a stored procedure.

The other problem indicated is a possible major design flaw. Unless it's a staging table for data, there should be no duplicates in any table. If it's for a staging table, then chances are that the process will be repeated in the future. What's wrong with a hard coded proc to handle the deletes for that table?
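For what it's worth, such a hard-coded proc is only a few lines (a sketch; dbo.StagingTable, its surrogate key id, and naturalkey are illustrative names, not from the article):

```sql
-- Hard-coded cleanup proc for one known table.
-- Keeps the row with the lowest surrogate id per natural key.
-- Table and column names are illustrative assumptions.
CREATE PROCEDURE dbo.CleanStagingDupes
AS
BEGIN
    SET NOCOUNT ON;
    DELETE s
    FROM dbo.StagingTable s
    WHERE s.id NOT IN (SELECT MIN(id)
                       FROM dbo.StagingTable
                       GROUP BY naturalkey);
END
```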

Like I said, good article. I just don't see the need for such a thing.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #691029
Posted Monday, April 6, 2009 11:26 PM
Old Hand


Group: General Forum Members
Last Login: Wednesday, September 3, 2014 1:12 AM
Points: 371, Visits: 716
Jeff Moden (4/6/2009)
[...] Unless it's a staging table for data, there should be no duplicates in any table. If it's for a staging table, then chances are that the process will be repeated in the future. What's wrong with a hard coded proc to handle the deletes for that table?

Like I said, good article. I just don't see the need for such a thing.

Alas, I wish you were right... but reality has a cruel way of biting back. I just found gazillions of duplicates in production... so I have a clear and immediate need!
Yes, a hard-coded stored proc is what I'll use to clean this mess up. But I'm thankful for the pre-cooked code that, so far, has allowed me to devote my time to investigating the errors and assessing the extent of the cleanup.
Many thanks Renato!
Post #691693
Posted Tuesday, April 7, 2009 6:51 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Sunday, October 12, 2014 3:39 AM
Points: 146, Visits: 244
Thanks for your comments folks.

I had forgotten about the SET ROWCOUNT and #tempTable methods. Those worked for all the older versions of SQL Server and still work now. I just like the new method because it is set-based and does not need any extra objects, like temp tables, to housekeep. I find it easier to read and modify than dynamic SQL. Simplicity is in the eye of the beholder, and most people will find the technique they already know to be the "simpler" one.

I must also acknowledge Tom Huneke's script in http://www.sqlservercentral.com/scripts/CTE/62599/.
I read his script last year and started using the technique, but forgot where the idea came from when I wrote the article.

Jeff Moden's comment is an interesting one. Jeff must have the pleasure of always working with well-designed databases. I do support and troubleshooting of custom-developed systems and find the duplicates situation crops up a few times a year. Databases are often designed by application programmers and sometimes by accountants or stock market analysts. The programmers usually know that every table should have a primary key, and dutifully add a uniqueidentifier or identity column to every table. This does not actually solve all the duplicate problems. I recently found about 6,000 duplicate expense records in a mortgage application database even though the table had a primary key on the uniqueidentifier. The records should also have been unique on LoanApplicationID and ExpenseTypeID, but a program bug and the lack of another unique constraint allowed the duplicates in.
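The set-based cleanup for a case like that is a short ROW_NUMBER CTE, something like this (the partition columns are the real ones from the scenario; the table name dbo.LoanExpense and the ExpenseID tie-breaker are illustrative assumptions):

```sql
-- Delete all but one row per (LoanApplicationID, ExpenseTypeID).
-- dbo.LoanExpense and ExpenseID are illustrative names.
;WITH Dupes AS
(
    SELECT ROW_NUMBER() OVER
           (PARTITION BY LoanApplicationID, ExpenseTypeID
            ORDER BY ExpenseID) AS rn
    FROM dbo.LoanExpense
)
DELETE FROM Dupes
WHERE rn > 1;
```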

Cheers, Renato.
Post #691930
Posted Tuesday, April 7, 2009 9:31 AM


SSC-Dedicated


Group: General Forum Members
Last Login: Yesterday @ 1:53 PM
Points: 35,366, Visits: 31,905
Renato Buda (4/7/2009)
Jeff Moden's comment is an interesting one. Jeff must have the pleasure of always working with well-designed databases.


Alas... if that were only true. I'm the one that usually has to do the cleanups and make the necessary changes to prevent future dupes.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #692182