Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Removing Duplicates


Removing Duplicates

Author
Message
Greg Edwards-268690
Greg Edwards-268690
SSC Eights!
SSC Eights! (821 reputation)SSC Eights! (821 reputation)SSC Eights! (821 reputation)SSC Eights! (821 reputation)SSC Eights! (821 reputation)SSC Eights! (821 reputation)SSC Eights! (821 reputation)SSC Eights! (821 reputation)

Group: General Forum Members
Points: 821 Visits: 8267
I can see the need to understand how to do this, although I tend to go back to the basics.
If you find yourself needing to do this, question if it be designed into the the table in the first place to prevent this?
After all, I think this leads into the initial issue - without deleting all, the engine needs a way to discern which one(s) to delete.
I realize sometimes you have no input / control into this, but a few words about this concept might be a worthwhile addition.

If you wanted to expand this, age (at least to me) should be calculated, not stored in most cases.
And the changing job title also drives me towards separating out to different tables and having effectivity dates.
But that is way beyond your intended scope.

Just trying to spark a thought or two, not to make a big deal about any of this.
Sioban Krzywicki
Sioban Krzywicki
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1140 Visits: 8067
Jeff Moden (10/8/2013)
Short, sweet, to the point with great examples and good simple explanations. Very nicely done, Mr. Krzywicki.


Thank you!

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
Jozef Moravcik
Jozef Moravcik
Forum Newbie
Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)

Group: General Forum Members
Points: 3 Visits: 10
Hi Guys,

I see there is a wrong DB structure (table structure). Ussualy each master table should have at least a primary key. Using proper primary key you avoid the problem...
Sioban Krzywicki
Sioban Krzywicki
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1140 Visits: 8067
Greg Edwards-268690 (10/8/2013)
I can see the need to understand how to do this, although I tend to go back to the basics.
If you find yourself needing to do this, question if it be designed into the the table in the first place to prevent this?
After all, I think this leads into the initial issue - without deleting all, the engine needs a way to discern which one(s) to delete.
I realize sometimes you have no input / control into this, but a few words about this concept might be a worthwhile addition.

If you wanted to expand this, age (at least to me) should be calculated, not stored in most cases.
And the changing job title also drives me towards separating out to different tables and having effectivity dates.
But that is way beyond your intended scope.

Just trying to spark a thought or two, not to make a big deal about any of this.



It is funny, I agonized a bit over the example table. As I was putting in the columns I kept thinking "Well, if this were a real database I'd put this in another table so it could have multiple values or historical data or maybe this should be calculated" then I reminded myself that was outside the scope of this article and I just needed something to use as an example. : -)

Same thing with table design, there are situations where you have no way to prevent this ahead of time, whether it is because the duplicates are in the data coming in or because the table is already in production and the powers that be won't approve a structural change. I wanted to keep the focus on this one task as dealing with every possibility would make the article far longer.

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
Sioban Krzywicki
Sioban Krzywicki
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1140 Visits: 8067
Jozef Moravcik (10/8/2013)
Hi Guys,

I see there is a wrong DB structure (table structure). Ussualy each master table should have at least a primary key. Using proper primary key you avoid the problem...


Sure, and if duplicate data gets into your table, you can use the primary key to remove duplicates, but this gives you a way to find what those duplicates are.
But what do you do if you don't have the ability to add the primary key? Or if you need to remove duplicates on large batches of data that are coming into your system from outside sources? This should help with any of those situations.

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
PHYData DBA
PHYData DBA
Mr or Mrs. 500
Mr or Mrs. 500 (560 reputation)Mr or Mrs. 500 (560 reputation)Mr or Mrs. 500 (560 reputation)Mr or Mrs. 500 (560 reputation)Mr or Mrs. 500 (560 reputation)Mr or Mrs. 500 (560 reputation)Mr or Mrs. 500 (560 reputation)Mr or Mrs. 500 (560 reputation)

Group: General Forum Members
Points: 560 Visits: 533
Jozef Moravcik (10/8/2013)
Hi Guys,

I see there is a wrong DB structure (table structure). Ussualy each master table should have at least a primary key. Using proper primary key you avoid the problem...

Jozef,
I agree with your point that proper Database table structure will reduce the need for this code. However that does not apply to the real world. Everyday we find table structures that can not be changed in third party products databases that need to be cleaned of duplicate records.
There are MANY databases out there that have tables that are built this way for performance and other reasons. If you do not use one of the many methods to remove the duplicates and instead update the table schema, the product could stop working, the support staff could stop working, and you as a DBA might even stop working.

In short, knowing how to remove rows is something you do as a professional DBA. Updating the SCHEMA for databases used by live applications is something done by that applications development and support staff. Cool
PHYData DBA
PHYData DBA
Mr or Mrs. 500
Mr or Mrs. 500 (560 reputation)Mr or Mrs. 500 (560 reputation)Mr or Mrs. 500 (560 reputation)Mr or Mrs. 500 (560 reputation)Mr or Mrs. 500 (560 reputation)Mr or Mrs. 500 (560 reputation)Mr or Mrs. 500 (560 reputation)Mr or Mrs. 500 (560 reputation)

Group: General Forum Members
Points: 560 Visits: 533
Stefan Krzywicki (10/8/2013)
It is funny, I agonized a bit over the example table. As I was putting in the columns I kept thinking "Well, if this were a real database I'd put this in another table so it could have multiple values or historical data or maybe this should be calculated"


Stefan, your example table is perfect for your article. So perfect that if you Google "Removing Duplicate RowNumber()" similar table structure is in most of the other articles on how to do this. Cool
lptech
lptech
Old Hand
Old Hand (347 reputation)Old Hand (347 reputation)Old Hand (347 reputation)Old Hand (347 reputation)Old Hand (347 reputation)Old Hand (347 reputation)Old Hand (347 reputation)Old Hand (347 reputation)

Group: General Forum Members
Points: 347 Visits: 3276
My current shop gets a lot of data from the mainframe, using a variety of methods to load the data into SQL Server. As such, we have many table without primary keys. Since there are occasionally situations where duplicates slip past the existing processing, with article is extremely useful.
kit-1143032
kit-1143032
Forum Newbie
Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)

Group: General Forum Members
Points: 7 Visits: 44
I am wondering if anyone has used this in a CTE? I can't delete the duplicates from my actual tables but I would like to eliminate them within my stored procedure.

Thanks,
Kate
peterzeke
peterzeke
Old Hand
Old Hand (342 reputation)Old Hand (342 reputation)Old Hand (342 reputation)Old Hand (342 reputation)Old Hand (342 reputation)Old Hand (342 reputation)Old Hand (342 reputation)Old Hand (342 reputation)

Group: General Forum Members
Points: 342 Visits: 1766
Very good article, Stefan.

Found an odd thing -- Your article displays the date of "2013/10/08" next to your name, but when selecting the "printable" version the date shows "2013/09/23"?

-- Pete



Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search