Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««23456»»

Eliminating Duplicate Rows using The PARTITION BY clause Expand / Collapse
Author
Message
Posted Wednesday, September 22, 2010 7:53 PM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 5:59 AM
Points: 897, Visits: 1,488
Earlier today before this article was posted I was making an example and showing it to a workmate that it was almost identical to the provided example. I used CTEs though instead of a subquery though. This article being posted today was really a coincidence.

Very good and useful. Thanks for article. :)


Best regards,

Andre Guerreiro Neto

Database Analyst
http://www.softplan.com.br
MCITPx1/MCTSx2
Post #991673
Posted Friday, September 24, 2010 8:09 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, July 9, 2014 6:59 AM
Points: 4, Visits: 83
This would be a very nice technique during our ETL process of cleansing incoming data before merging into our tables.
Post #992811
Posted Sunday, September 26, 2010 10:04 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, June 27, 2013 9:51 AM
Points: 141, Visits: 113
First of all, I must thank everybody for their valuable comments and insights on this article. During the past several years, I had come across several situations where duplications came up in some or the other tables. Cases were due to either inadequate database design or programming. Everytime I wrote fresh code to solve the problem. The code in this article is a specific part of the overall solution. I have tried to concentrate specifically on a simple and fast method of finding and eliminating duplicate rows. For the sake of concentrating on the specific problem and clarifying it, I created a situation in the article to get the point across. The reason why I did not put in primary keys and foreign keys in there is to concentrate on the specific problem only. Had I put in foreign keys, then a case of consedering the characteristics of FK like NULL, CASCADE, DEFAULT, etc on UPDATE/DELETE would have come up which would have prolonged the article. Probably it could be part of another article.

The code could be enhanced to include OUTPUT clause for storing deleted rows in an audit table. Similarly, my aim was to solve the problem in one step. So, I did not use temporary tables.

Again, I refrain from using TRUNCATE TABLE in production unless the entire data set in the table has to be removed. If it is a live table, I would rather use DELETE so as not to affect a high volume of readers adversely.

Again since I wanted to scan the table only once, I did not use the IN clause with MIN function in the SUBQUERY.

Once again, I really appreciate your comments and would look forward to more.



Kindest Regards,

M Suresh Kumar

Post #993376
Posted Sunday, September 26, 2010 10:13 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Sunday, September 26, 2010 10:23 PM
Points: 21, Visits: 44
I didn't get too far because the code block that loads the data generate a syntax error I can't seem to get past.
insert into Emp_Details (Emp_Name, Company, Join_Date, Resigned_Date)
values ('John', 'Software', '20060101', '20061231')
,('John', 'Software', '20060101', '20061231')
,('John', 'SuperSoft', '20070101', '20071231')
,('John', 'UltraSoft', '20070201', '20080131')
...

generates:
Msg 102, Level 15, State 1, Line 3
Incorrect syntax near ','.

This on SQL 2005 SP3. Either this only works on 2008, or there is some environment condition assumed by the author that isn't the default.

Ok, loaded the data by an alternate method, then the RowNumber instance at the end of the order by line also scores a syntax error. Cut and paste isn't that difficult. Is there something else I'm missing?

I'm trying to apply this to a situation where I need the first duplicate record, but I need to return the name and address, while partitioning only on the address. (the duplicated part). If I considered the name too, of course the records are different, but I don't want to mail twice (or more) to the same address.
Post #993421
Posted Monday, September 27, 2010 7:02 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 6:02 AM
Points: 2,668, Visits: 19,229
hugh.hemington (9/26/2010)
I didn't get too far because the code block that loads the data generate a syntax error I can't seem to get past.
insert into Emp_Details (Emp_Name, Company, Join_Date, Resigned_Date)
values ('John', 'Software', '20060101', '20061231')
,('John', 'Software', '20060101', '20061231')
,('John', 'SuperSoft', '20070101', '20071231')
,('John', 'UltraSoft', '20070201', '20080131')
...

generates:
Msg 102, Level 15, State 1, Line 3
Incorrect syntax near ','.

This on SQL 2005 SP3. Either this only works on 2008, or there is some environment condition assumed by the author that isn't the default.

Ok, loaded the data by an alternate method, then the RowNumber instance at the end of the order by line also scores a syntax error. Cut and paste isn't that difficult. Is there something else I'm missing?

I'm trying to apply this to a situation where I need the first duplicate record, but I need to return the name and address, while partitioning only on the address. (the duplicated part). If I considered the name too, of course the records are different, but I don't want to mail twice (or more) to the same address.
That's 2008 syntax, don't think you're missing anything.


---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table:What it is and how it replaces a loop

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
Post #993624
Posted Monday, September 27, 2010 7:19 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:50 AM
Points: 7,191, Visits: 13,645
hugh.hemington (9/26/2010)
...I'm trying to apply this to a situation where I need the first duplicate record, but I need to return the name and address, while partitioning only on the address. (the duplicated part). If I considered the name too, of course the records are different, but I don't want to mail twice (or more) to the same address.


Hugh, start a new thread in the 2k5 forum section. Include a table create / populate script to provide some sample data, and a sample of what you would like as your output from it. This is a common problem and will take minutes to solve.


“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #993632
Posted Monday, October 4, 2010 12:47 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: 2 days ago @ 11:56 AM
Points: 13, Visits: 271
I was able to use this approach to identify and delete duplicates very easily. I just had my FROM clause in a form of another query that just gets all duplicate rows - very limited output.

I think with careful consideration, the approach is very helpful.

Stan
Post #997788
Posted Friday, October 8, 2010 10:01 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, August 9, 2013 9:36 AM
Points: 35, Visits: 178
Wow, what a great article - it was really clear to understand & helped introduce some new ideas to a TSQL newbie like myself. The comments on the article are also really useful, explaining a few different ways of doing the same thing but also highlighting how creative you can be with your code and also how different people approach the same problem in different ways.

I've learned something new & useful today - yay!

Doodles
Post #1001423
Posted Monday, June 13, 2011 2:47 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, August 12, 2014 11:44 AM
Points: 46, Visits: 341
Nicely written. Thank you for your contribution!
Post #1124614
Posted Friday, March 2, 2012 12:17 AM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 7:26 AM
Points: 828, Visits: 1,363
I agree with those suggesting using a CTE, as I did in my old blog post: How do I remove duplicate tuples from a relation without any candidate keys?

Now, it is seldom a good idea not to have a table without any candidate keys




Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL
Post #1260644
« Prev Topic | Next Topic »

Add to briefcase «««23456»»

Permissions Expand / Collapse