Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««1234»»»

Removing Duplicates Expand / Collapse
Author
Message
Posted Tuesday, October 8, 2013 8:41 AM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Friday, September 26, 2014 1:39 PM
Points: 670, Visits: 6,720
I can see the need to understand how to do this, although I tend to go back to the basics.
If you find yourself needing to do this, question if it be designed into the the table in the first place to prevent this?
After all, I think this leads into the initial issue - without deleting all, the engine needs a way to discern which one(s) to delete.
I realize sometimes you have no input / control into this, but a few words about this concept might be a worthwhile addition.

If you wanted to expand this, age (at least to me) should be calculated, not stored in most cases.
And the changing job title also drives me towards separating out to different tables and having effectivity dates.
But that is way beyond your intended scope.

Just trying to spark a thought or two, not to make a big deal about any of this.

Post #1502654
Posted Tuesday, October 8, 2013 8:48 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 1:01 PM
Points: 830, Visits: 7,325
Jeff Moden (10/8/2013)
Short, sweet, to the point with great examples and good simple explanations. Very nicely done, Mr. Krzywicki.


Thank you!


--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
Post #1502660
Posted Tuesday, October 8, 2013 8:53 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, February 16, 2014 11:16 PM
Points: 1, Visits: 10
Hi Guys,

I see there is a wrong DB structure (table structure). Ussualy each master table should have at least a primary key. Using proper primary key you avoid the problem...
Post #1502663
Posted Tuesday, October 8, 2013 8:53 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 1:01 PM
Points: 830, Visits: 7,325
Greg Edwards-268690 (10/8/2013)
I can see the need to understand how to do this, although I tend to go back to the basics.
If you find yourself needing to do this, question if it be designed into the the table in the first place to prevent this?
After all, I think this leads into the initial issue - without deleting all, the engine needs a way to discern which one(s) to delete.
I realize sometimes you have no input / control into this, but a few words about this concept might be a worthwhile addition.

If you wanted to expand this, age (at least to me) should be calculated, not stored in most cases.
And the changing job title also drives me towards separating out to different tables and having effectivity dates.
But that is way beyond your intended scope.

Just trying to spark a thought or two, not to make a big deal about any of this.



It is funny, I agonized a bit over the example table. As I was putting in the columns I kept thinking "Well, if this were a real database I'd put this in another table so it could have multiple values or historical data or maybe this should be calculated" then I reminded myself that was outside the scope of this article and I just needed something to use as an example. : -)

Same thing with table design, there are situations where you have no way to prevent this ahead of time, whether it is because the duplicates are in the data coming in or because the table is already in production and the powers that be won't approve a structural change. I wanted to keep the focus on this one task as dealing with every possibility would make the article far longer.


--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
Post #1502664
Posted Tuesday, October 8, 2013 8:56 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 1:01 PM
Points: 830, Visits: 7,325
Jozef Moravcik (10/8/2013)
Hi Guys,

I see there is a wrong DB structure (table structure). Ussualy each master table should have at least a primary key. Using proper primary key you avoid the problem...


Sure, and if duplicate data gets into your table, you can use the primary key to remove duplicates, but this gives you a way to find what those duplicates are.
But what do you do if you don't have the ability to add the primary key? Or if you need to remove duplicates on large batches of data that are coming into your system from outside sources? This should help with any of those situations.


--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
Post #1502666
Posted Tuesday, October 8, 2013 9:16 AM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Friday, September 26, 2014 8:14 AM
Points: 340, Visits: 305
Jozef Moravcik (10/8/2013)
Hi Guys,

I see there is a wrong DB structure (table structure). Ussualy each master table should have at least a primary key. Using proper primary key you avoid the problem...

Jozef,
I agree with your point that proper Database table structure will reduce the need for this code. However that does not apply to the real world. Everyday we find table structures that can not be changed in third party products databases that need to be cleaned of duplicate records.
There are MANY databases out there that have tables that are built this way for performance and other reasons. If you do not use one of the many methods to remove the duplicates and instead update the table schema, the product could stop working, the support staff could stop working, and you as a DBA might even stop working.

In short, knowing how to remove rows is something you do as a professional DBA. Updating the SCHEMA for databases used by live applications is something done by that applications development and support staff.
Post #1502677
Posted Tuesday, October 8, 2013 9:19 AM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Friday, September 26, 2014 8:14 AM
Points: 340, Visits: 305
Stefan Krzywicki (10/8/2013)
It is funny, I agonized a bit over the example table. As I was putting in the columns I kept thinking "Well, if this were a real database I'd put this in another table so it could have multiple values or historical data or maybe this should be calculated"


Stefan, your example table is perfect for your article. So perfect that if you Google "Removing Duplicate RowNumber()" similar table structure is in most of the other articles on how to do this.
Post #1502679
Posted Tuesday, October 8, 2013 9:31 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Yesterday @ 1:35 PM
Points: 226, Visits: 2,142
My current shop gets a lot of data from the mainframe, using a variety of methods to load the data into SQL Server. As such, we have many table without primary keys. Since there are occasionally situations where duplicates slip past the existing processing, with article is extremely useful.
Post #1502684
Posted Tuesday, October 8, 2013 9:33 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, September 24, 2014 10:57 AM
Points: 2, Visits: 31
I am wondering if anyone has used this in a CTE? I can't delete the duplicates from my actual tables but I would like to eliminate them within my stored procedure.

Thanks,
Kate
Post #1502685
Posted Tuesday, October 8, 2013 9:33 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Friday, September 19, 2014 7:51 AM
Points: 316, Visits: 1,488
Very good article, Stefan.

Found an odd thing -- Your article displays the date of "2013/10/08" next to your name, but when selecting the "printable" version the date shows "2013/09/23"?

-- Pete



Post #1502686
« Prev Topic | Next Topic »

Add to briefcase ««1234»»»

Permissions Expand / Collapse