Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Eliminating Duplicate Rows using The PARTITION BY clause


Eliminating Duplicate Rows using The PARTITION BY clause

Author
Message
Andre Guerreiro
Andre Guerreiro
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1065 Visits: 1514
Earlier today before this article was posted I was making an example and showing it to a workmate that it was almost identical to the provided example. I used CTEs though instead of a subquery though. This article being posted today was really a coincidence.

Very good and useful. Thanks for article. Smile

Best regards,

Andre Guerreiro Neto

Database Analyst
http://www.softplan.com.br
MCITPx1/MCTSx2/MCSE/MCSA
jbeckett 65194
jbeckett 65194
Grasshopper
Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)

Group: General Forum Members
Points: 12 Visits: 87
This would be a very nice technique during our ETL process of cleansing incoming data before merging into our tables.
Suresh Kumar Maganti
Suresh Kumar Maganti
SSC-Enthusiastic
SSC-Enthusiastic (159 reputation)SSC-Enthusiastic (159 reputation)SSC-Enthusiastic (159 reputation)SSC-Enthusiastic (159 reputation)SSC-Enthusiastic (159 reputation)SSC-Enthusiastic (159 reputation)SSC-Enthusiastic (159 reputation)SSC-Enthusiastic (159 reputation)

Group: General Forum Members
Points: 159 Visits: 125
First of all, I must thank everybody for their valuable comments and insights on this article. During the past several years, I had come across several situations where duplications came up in some or the other tables. Cases were due to either inadequate database design or programming. Everytime I wrote fresh code to solve the problem. The code in this article is a specific part of the overall solution. I have tried to concentrate specifically on a simple and fast method of finding and eliminating duplicate rows. For the sake of concentrating on the specific problem and clarifying it, I created a situation in the article to get the point across. The reason why I did not put in primary keys and foreign keys in there is to concentrate on the specific problem only. Had I put in foreign keys, then a case of consedering the characteristics of FK like NULL, CASCADE, DEFAULT, etc on UPDATE/DELETE would have come up which would have prolonged the article. Probably it could be part of another article.

The code could be enhanced to include OUTPUT clause for storing deleted rows in an audit table. Similarly, my aim was to solve the problem in one step. So, I did not use temporary tables.

Again, I refrain from using TRUNCATE TABLE in production unless the entire data set in the table has to be removed. If it is a live table, I would rather use DELETE so as not to affect a high volume of readers adversely.

Again since I wanted to scan the table only once, I did not use the IN clause with MIN function in the SUBQUERY.

Once again, I really appreciate your comments and would look forward to more.


Kindest Regards,

M Suresh Kumar

hugh.hemington
hugh.hemington
Grasshopper
Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)

Group: General Forum Members
Points: 23 Visits: 44
I didn't get too far because the code block that loads the data generate a syntax error I can't seem to get past.
insert into Emp_Details (Emp_Name, Company, Join_Date, Resigned_Date)
values ('John', 'Software', '20060101', '20061231')
,('John', 'Software', '20060101', '20061231')
,('John', 'SuperSoft', '20070101', '20071231')
,('John', 'UltraSoft', '20070201', '20080131')
...

generates:
Msg 102, Level 15, State 1, Line 3
Incorrect syntax near ','.

This on SQL 2005 SP3. Either this only works on 2008, or there is some environment condition assumed by the author that isn't the default.

Ok, loaded the data by an alternate method, then the RowNumber instance at the end of the order by line also scores a syntax error. Cut and paste isn't that difficult. Is there something else I'm missing?

I'm trying to apply this to a situation where I need the first duplicate record, but I need to return the name and address, while partitioning only on the address. (the duplicated part). If I considered the name too, of course the records are different, but I don't want to mail twice (or more) to the same address.
jcrawf02
jcrawf02
Ten Centuries
Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)

Group: General Forum Members
Points: 1424 Visits: 19324
hugh.hemington (9/26/2010)
I didn't get too far because the code block that loads the data generate a syntax error I can't seem to get past.
insert into Emp_Details (Emp_Name, Company, Join_Date, Resigned_Date)
values ('John', 'Software', '20060101', '20061231')
,('John', 'Software', '20060101', '20061231')
,('John', 'SuperSoft', '20070101', '20071231')
,('John', 'UltraSoft', '20070201', '20080131')
...

generates:
Msg 102, Level 15, State 1, Line 3
Incorrect syntax near ','.

This on SQL 2005 SP3. Either this only works on 2008, or there is some environment condition assumed by the author that isn't the default.

Ok, loaded the data by an alternate method, then the RowNumber instance at the end of the order by line also scores a syntax error. Cut and paste isn't that difficult. Is there something else I'm missing?

I'm trying to apply this to a situation where I need the first duplicate record, but I need to return the name and address, while partitioning only on the address. (the duplicated part). If I considered the name too, of course the records are different, but I don't want to mail twice (or more) to the same address.
That's 2008 syntax, don't think you're missing anything.

---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table:What it is and how it replaces a loop

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
ChrisM@Work
ChrisM@Work
SSCrazy Eights
SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)

Group: General Forum Members
Points: 8997 Visits: 19028
hugh.hemington (9/26/2010)
...I'm trying to apply this to a situation where I need the first duplicate record, but I need to return the name and address, while partitioning only on the address. (the duplicated part). If I considered the name too, of course the records are different, but I don't want to mail twice (or more) to the same address.


Hugh, start a new thread in the 2k5 forum section. Include a table create / populate script to provide some sample data, and a sample of what you would like as your output from it. This is a common problem and will take minutes to solve.

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
suslikovich
suslikovich
Grasshopper
Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)

Group: General Forum Members
Points: 22 Visits: 300
I was able to use this approach to identify and delete duplicates very easily. I just had my FROM clause in a form of another query that just gets all duplicate rows - very limited output.

I think with careful consideration, the approach is very helpful.

Stan
doodlingdba
doodlingdba
SSC Rookie
SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)

Group: General Forum Members
Points: 43 Visits: 213
Wow, what a great article - it was really clear to understand & helped introduce some new ideas to a TSQL newbie like myself. The comments on the article are also really useful, explaining a few different ways of doing the same thing but also highlighting how creative you can be with your code and also how different people approach the same problem in different ways.

I've learned something new & useful today - yay!

Doodles :-D
Scott Abrants
Scott Abrants
Valued Member
Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)

Group: General Forum Members
Points: 67 Visits: 405
Nicely written. Thank you for your contribution!
okbangas
okbangas
SSC Eights!
SSC Eights! (935 reputation)SSC Eights! (935 reputation)SSC Eights! (935 reputation)SSC Eights! (935 reputation)SSC Eights! (935 reputation)SSC Eights! (935 reputation)SSC Eights! (935 reputation)SSC Eights! (935 reputation)

Group: General Forum Members
Points: 935 Visits: 1387
I agree with those suggesting using a CTE, as I did in my old blog post: How do I remove duplicate tuples from a relation without any candidate keys?

Now, it is seldom a good idea not to have a table without any candidate keys :-)



Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search