﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Discuss Content Posted by Suresh Maganti / Article Discussions / Article Discussions by Author  / Eliminating Duplicate Rows using The PARTITION BY clause / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Tue, 18 Jun 2013 20:30:35 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>[quote][b]prakashp 84206 (6/1/2012)[/b][hr]Hi,The output of my query contains duplicate records (i.e. column1 value is similar but other column's value different) as well as single record having column2 value blanks(-). now I want to select only one record out of duplicate records for which column2 values are present (eliminating records for which column2 is blanks(-) from duplicate records) and all other single records having column2 values blanks. My query's Present Output column1      column2        column3abc            xyz            pqrabc            -               klmijk              -               uvwplo             -               ujnyhk            ttg            weayhk            -              erfExpected Outputcolumn1      column2        column3abc            xyz              pqrijk              -                 uvwplo             -                ujnyhk           ttg              weaCan someone help me. Thanks in advance.[/quote]Welcome aboard.Read the link in my sig, this will assist you in describing your problem and encouraging others to help. Then start a new thread. It's likely that your problem is sufficiently different to the original on this thread that it's better separated from it.</description><pubDate>Fri, 01 Jun 2012 05:41:05 GMT</pubDate><dc:creator>ChrisM@Work</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>Hi,The output of my query contains duplicate records (i.e. column1 value is similar but other column's value different) as well as single record having column2 value blanks(-). now I want to select only one record out of duplicate records for which column2 values are present (eliminating records for which column2 is blanks(-) from duplicate records) and all other single records having column2 values blanks. My query's Present Output column1      column2        column3abc            xyz            pqrabc            -               klmijk              -               uvwplo             -               ujnyhk            ttg            weayhk            -              erfExpected Outputcolumn1      column2        column3abc            xyz              pqrijk              -                 uvwplo             -                ujnyhk           ttg              weaCan someone help me. Thanks in advance.</description><pubDate>Fri, 01 Jun 2012 05:35:45 GMT</pubDate><dc:creator>prakashp 84206</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>[quote]If a foreign key referenced only one of the rows you are deleting, isn't that by definition not a duplicate? So then in that case can't you just delete with the unique key?[/quote]He is saying if the table has a primary key and many columns but the data is already screwed up. Now they identify four other columns combination as unique but data shows duplicates by these four. (The four columns should have had a design of unique key to begin with)</description><pubDate>Thu, 15 Mar 2012 15:03:02 GMT</pubDate><dc:creator>jswong05</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>[quote][b]David Lean (9/21/2010)[/b][hr]Nice article &amp; Nice solution. But I would caution anyone about to blindly using this in production. It is only the 1st part of the process. [b]Problem 1: need to Fix Declarative Referential Integrity (DRI)  for the rows deleted. [/b]Invariably the duplicate rows you are about to remove will be referenced by the foreign keys in another table. When you delete these rows you must keep a copy of the Keys of the rows you deleted &amp; map them to the key of the “duplicate” row you kept. That way you have some lookup table you can use to correct all the foreign key references. [/quote]If a foreign key referenced only one of the rows you are deleting, isn't that by definition not a duplicate? So then in that case can't you just delete with the unique key?</description><pubDate>Fri, 02 Mar 2012 06:50:04 GMT</pubDate><dc:creator>patrickmcginnis59</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>I agree with those suggesting using a CTE, as I did in my old blog post: [url=http://www.mssql.no/QnA/?id=1]How do I remove duplicate tuples from a relation without any candidate keys?[/url]Now, it is seldom a good idea not to have a table without any candidate keys :-)</description><pubDate>Fri, 02 Mar 2012 00:17:06 GMT</pubDate><dc:creator>okbangas</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>Nicely written.  Thank you for your contribution!</description><pubDate>Mon, 13 Jun 2011 14:47:47 GMT</pubDate><dc:creator>Scott Abrants</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>Wow, what a great article - it was really clear to understand &amp; helped introduce some new ideas to a TSQL newbie like myself. The comments on the article are also really useful, explaining a few different ways of doing the same thing but also highlighting how creative you can be with your code and also how different people approach the same problem in different ways.I've learned something new &amp; useful today  - yay!Doodles :-D</description><pubDate>Fri, 08 Oct 2010 10:01:23 GMT</pubDate><dc:creator>doodlingdba</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>I was able to use this approach to identify and delete duplicates very easily. I just had my FROM clause in a form of another query that just gets all duplicate rows - very limited output.I think with careful consideration, the approach is very helpful.Stan</description><pubDate>Mon, 04 Oct 2010 12:47:00 GMT</pubDate><dc:creator>suslikovich</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>[quote][b]hugh.hemington (9/26/2010)[/b][hr]...I'm trying to apply this to a situation where I need the first duplicate record, but I need to return the name and address, while partitioning only on the address. (the duplicated part).  If I considered the name too, of course the records are different, but I don't want to mail twice (or more) to the same address.[/quote]Hugh, start a new thread in the 2k5 forum section. Include a table create / populate script to provide some sample data, and a sample of what you would like as your output from it. This is a common problem and will take minutes to solve.</description><pubDate>Mon, 27 Sep 2010 07:19:29 GMT</pubDate><dc:creator>ChrisM@Work</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>[quote][b]hugh.hemington (9/26/2010)[/b][hr]I didn't get too far because the code block that loads the data generate a syntax error I can't seem to get past.insert into Emp_Details (Emp_Name, Company, Join_Date, Resigned_Date)values ('John', 'Software', '20060101', '20061231'),('John', 'Software', '20060101', '20061231'),('John', 'SuperSoft', '20070101', '20071231'),('John', 'UltraSoft', '20070201', '20080131')...generates:Msg 102, Level 15, State 1, Line 3Incorrect syntax near ','.This on SQL 2005 SP3.  Either this only works on 2008, or there is some environment condition assumed by the author that isn't the default.Ok, loaded the data by an alternate method, then the RowNumber instance at the end of the order by line also scores a syntax error.  Cut and paste isn't that difficult.  Is there something else I'm missing?I'm trying to apply this to a situation where I need the first duplicate record, but I need to return the name and address, while partitioning only on the address. (the duplicated part).  If I considered the name too, of course the records are different, but I don't want to mail twice (or more) to the same address.[/quote]That's 2008 syntax, don't think you're missing anything.</description><pubDate>Mon, 27 Sep 2010 07:02:40 GMT</pubDate><dc:creator>jcrawf02</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>I didn't get too far because the code block that loads the data generate a syntax error I can't seem to get past.insert into Emp_Details (Emp_Name, Company, Join_Date, Resigned_Date)values ('John', 'Software', '20060101', '20061231'),('John', 'Software', '20060101', '20061231'),('John', 'SuperSoft', '20070101', '20071231'),('John', 'UltraSoft', '20070201', '20080131')...generates:Msg 102, Level 15, State 1, Line 3Incorrect syntax near ','.This on SQL 2005 SP3.  Either this only works on 2008, or there is some environment condition assumed by the author that isn't the default.Ok, loaded the data by an alternate method, then the RowNumber instance at the end of the order by line also scores a syntax error.  Cut and paste isn't that difficult.  Is there something else I'm missing?I'm trying to apply this to a situation where I need the first duplicate record, but I need to return the name and address, while partitioning only on the address. (the duplicated part).  If I considered the name too, of course the records are different, but I don't want to mail twice (or more) to the same address.</description><pubDate>Sun, 26 Sep 2010 22:13:41 GMT</pubDate><dc:creator>hugh.hemington</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>First of all, I must thank everybody for their valuable comments and insights on this article. During the past several years, I had come across several situations where duplications came up in some or the other tables. Cases were due to either inadequate database design or programming. Everytime I wrote fresh code to solve the problem. The code in this article is a specific part of the overall solution. I have tried to concentrate specifically on a simple and fast method of finding and eliminating duplicate rows. For the sake of concentrating on the specific problem and clarifying it, I created a situation in the article to get the point across. The reason why I did not put in primary keys and foreign keys in there is to concentrate on the specific problem only. Had I put in foreign keys, then a case of consedering the characteristics of FK like NULL, CASCADE, DEFAULT, etc on UPDATE/DELETE would have come up which would have prolonged the article. Probably it could be part of another article.The code could be enhanced to include OUTPUT clause for storing deleted rows in an audit table. Similarly, my aim was to solve the problem in one step. So, I did not use temporary tables. Again, I refrain from using TRUNCATE TABLE in production unless the entire data set in the table has to be removed. If it is a live table, I would rather use DELETE so as not to affect a high volume of readers adversely.Again since I wanted to scan the table only once, I did not use the IN clause with MIN function in the SUBQUERY.Once again, I really appreciate your comments and would look forward to more.</description><pubDate>Sun, 26 Sep 2010 10:04:13 GMT</pubDate><dc:creator>Suresh Kumar Maganti</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>This would be a very nice technique during our ETL process of cleansing incoming data before merging into our tables.</description><pubDate>Fri, 24 Sep 2010 08:09:47 GMT</pubDate><dc:creator>jbeckett 65194</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>Earlier today before this article was posted I was making an example and showing it to a workmate that it was almost identical to the provided example. I used CTEs though instead of a subquery though. This article being posted today was really a coincidence.Very good and useful. Thanks for article. :)</description><pubDate>Wed, 22 Sep 2010 19:53:27 GMT</pubDate><dc:creator>codebyo</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>Nice. Very useful. Thanks.</description><pubDate>Wed, 22 Sep 2010 19:21:02 GMT</pubDate><dc:creator>yheon_17 66014</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>The principle is the same, the implementation depends on the nature of your business.If quick fix is applicable and acceptable without incurring much cost, you can quick-fix it everyday.By 80-20 rule, if you have to chase down and remedy all child records and parent records and that is too costly, you will have to determine where to cut off, to leave them along or fix at any cost or in between.First, propose your fix and have business and IT managers sign off. If they don't understand, have them sign off.In general, after quick-fix, it will cost your organization less in the long run if you can fix the root cause. That is not always the case. For example, if you don't have the source code, don't have a PASCAL programer, don't have the original design, don't have business rules.The only time you don't have to fix anything and hand it back to your manager is you already found another job.</description><pubDate>Wed, 22 Sep 2010 18:50:33 GMT</pubDate><dc:creator>jswong05</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>So it seems that we are all in agreement. 1. It is best that the system is designed to avoid these issues in the first place. This is similar to every dog owner removing their dog droppings. 2. Sometimes you find youself in the poo. In this case you need to a) Clean the data &amp; b) Change the system to prevent it happening again. There is often a third part to this issue. One that the "Hey I would just write a perfect database" folks may be ignoring. What happens if your perfect DB now starts to reject these dirty rows? Often systems with poor error handling in the DB also have poor error handling in the App tier. If one insert in a more complex process fails, will it be contained in a transaction to ensure everything fails cleanly? (Most likely not, few developers use transactions, many prefer WITH (NOLOCK) hints).If the app does correctly use transactions, will it check the return code from SQL (often not)If it does check return codes will it present some kind of error to the end-user in a way the User can correct the issue?And sometimes the User is already gone. eg: Batch systems, Real-Time capture ie: RFID, CEP, Process control, Toll Booth &amp; Speeding Cameras etc. In short it may be cheap to fix the DB, but you occasionally open the door to a huge rectification project. One that will take a long time to get resourced &amp; funded. Which is why, you will often hear IT mgrs request you to clean the huge mess their DB is in now, &amp; maybe write some scans they can run periodicly, till they get budget to do it right.Or to go with the dog analogy. If you've just fallen in the sewer. You need to priortise. Perhaps wipe it from your eyes. Shake off the big chunks. And then figure out how to sort out the rest. Having someone nearby tell you, that they wouldn't have fallen in the sewer, is rather redundant. </description><pubDate>Wed, 22 Sep 2010 16:43:27 GMT</pubDate><dc:creator>David Lean</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>Nice article - well written.</description><pubDate>Wed, 22 Sep 2010 16:34:05 GMT</pubDate><dc:creator>SQLRNNR</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>Excellent use of Row_Number.</description><pubDate>Wed, 22 Sep 2010 15:01:15 GMT</pubDate><dc:creator>Andrew Peterson-472853</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>[i]skamath wrote:[/i][quote]So, would you agree that if you encounter *exactly* such instances (and you seem to indicate that you do), that unless you have a time machine to travel back and pre-correct the poor design, your choices are:1. Surrender, stating that the database was poorly designed.2. Try to correct the mistake.If you choose 2., what is wrong with using the technique in this article? I do not recall the author suggesting that you should first design poorly and then use his technique to correct it. [/quote]You are quite correct.  Having encountered such instances, I explain to my client that while I would love to have his business writing a fix for his database, the real problem is the design and my fix will NOT eliminate any problems with the data beyond the moment it is applied.  I may risk a contract doing that, but more often than not the client will ask what exactly is the problem and what is the best correction.There is nothing wrong with the technique in the article, though I say that with some reservation concerning locks, keys, indexes and already published reports that may have relied on the faulty data.  I have also found that most cases involving duplicate rows are not nearly as clean and simple as the sample illustration and need quite a bit of stroking, especially if the offending duplicates are used as foreign keys or indexes.  I'm just saying there is much more to it and in the case of the simple example, there are still pitfalls and still more efficient ways to achieve the same end.</description><pubDate>Wed, 22 Sep 2010 14:16:10 GMT</pubDate><dc:creator>trubolotta</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>but what would u do if u have duplicates row not in one table but after joining to view, i was unable to delete duplicate records then , i got an error messageMsg 4405, Level 16, State 1, Line 1View or function 'a' is not updatable because the modification affects multiple base tables.delete from afrom(select v_rpt_Study_details.StudyId,view1.col1,view1.col2,view2.col1,view2.col2       ,ROW_NUMBER() over (partition by view1.col1,view1.col2,view2.col1,view2.col2 order by view1.col1,view1.col2,view2.col1,view2.col2 							,					) RowNumber from view1 inner  join view2 on view1.col1=view2.col1) awhere a.RowNumber &amp;gt; 1</description><pubDate>Wed, 22 Sep 2010 13:46:25 GMT</pubDate><dc:creator>tanyauskas</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>[quote][b]thisisfutile (9/22/2010)[/b][hr]@trubolotta  Not trying to start an argument here, but hindsight is 20/20 for everyone.  Sometimes we find ourselves in a situation like the OP is describing (or something similar) and we need a solution....This appears to be one of those articles that creates a problem based on poor design and purports to correct it using some bloated functionality of SQL Server. If the table were properly designed with uniqueness constraints, the problem would not exist. Allowing duplicate data into the table in the first place is the problem, not fixing it after the fact. The more likely scenario and the one I have seen most often comes from importing data from poorly designed databases or poorly trained users.[/quote]I agree that such problems are usually caused by poor design.But poor design *is* prevalent in the real world.So, would you agree that if you encounter *exactly* such instances (and you seem to indicate that you do), that unless you have a time machine to travel back and pre-correct the poor design, your choices are:1.  Surrender, stating that the database was poorly designed.2.  Try to correct the mistake.If you choose 2., what is wrong with using the technique in this article?  I do not recall the author suggesting that you should first design poorly and then use his technique to correct it.</description><pubDate>Wed, 22 Sep 2010 13:44:42 GMT</pubDate><dc:creator>skamath</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>[i]trubolatta wrote:[/i][i]My alternative is simply not to insert duplicate records in the production database in the first place [sample code follows] ... [/i]Yes, this is a nice method. Very clean, very elegant.[i]But then again, my solution doesn't offer that delicious complexity some seem to relish or use the whiz-bang new features of SQL Server. To each his own. [/i]As a developer, I always prefer the KISS approach, and your example certainly offers that. Thank you for posting an alternative.</description><pubDate>Wed, 22 Sep 2010 13:39:21 GMT</pubDate><dc:creator>Craig-315134</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>The code in the article works, but is it enough to leave it at that?  If you are stuck with a production database that allows duplicates in tables, you have a problem and should ask how frequently do you need to purge duplicates?  Also ask what prevents the data from being corrupted seconds after the purge, and then used to make a critical business decision?This is the sample table created by the author, renamed ony to illustrate a point:[code="sql"]create table Emp_Details_Raw	(Emp_Name varchar(10)	,Company varchar(15)	,Join_Date datetime	,Resigned_Date datetime)[/code]Of course the table should look more like this to provide unique rows, again the table name was chosen to illustrate a point:[code="sql"]CREATE TABLE [dbo].[Emp_Details_Unique](	[Emp_Name] [varchar](10) NOT NULL,	[Company] [varchar](15) NOT NULL,	[Join_Date] [datetime] NOT NULL,	[Resigned_Date] [datetime] NOT NULL) ON [PRIMARY]GOCREATE UNIQUE NONCLUSTERED INDEX [IX_Emp_Details_Unique] ON [dbo].[Emp_Details_Unique] (	[Emp_Name] ASC,	[Company] ASC,	[Join_Date] ASC,	[Resigned_Date] ASC)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]GO[/code]Using the author's code to populate the table Emp_Details_Raw pops the duplicates in without any problem because of the missing constraint.  This may be similar to the problem you face with third party data in whatever form you receive it:[code="sql"]insert into Emp_Details_Raw (Emp_Name, Company, Join_Date, Resigned_Date)values ('John', 'Software', '20060101', '20061231'),('John', 'Software', '20060101', '20061231'),('John', 'Software', '20060101', '20061231'),('John', 'SuperSoft', '20070101', '20071231'),('John', 'UltraSoft', '20070201', '20080131'),('John', 'ImproSoft', '20080201', '20081231'),('John', 'ImproSoft', '20080201', '20081231'),('Mary', 'Software', '20060101', '20081231'),('Mary', 'SuperSoft', '20090101', '20090531'),('Mary', 'SuperSoft', '20090101', '20090531'),('Mary', 'UltraSoft', '20090601', '20100531'),('Mary', 'UltraSoft', '20090601', '20100531')[/code]I have two choices.  First, I can delete duplicates from the Emp_Details_Raw table, but wouldn't it be a good idea to back that data up first?  I would just for accountability.  Once the duplicates are deleted, I can insert the data into my production table.My alternative is simply not to insert duplicate records in the production database in the first place:[code="sql"]INSERT dbo.Emp_Details_Unique	(Emp_Name, Company, Join_Date, Resigned_Date)SELECT DISTINCT	Emp_Name, Company, Join_Date, Resigned_DateFROM dbo.Emp_Details_Raw[/code]Advantages include less coding, preservation of the original data without creating a duplicate data store and assurance the production table has unique rows at all times.  But then again, my solution doesn't offer that delicious complexity some seem to relish or use the whiz-bang new features of SQL Server.  To each his own.</description><pubDate>Wed, 22 Sep 2010 13:11:09 GMT</pubDate><dc:creator>trubolotta</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>Nice article:DWt a coincidence 2 days back I tried the same method.I was not aware this method of removing duplicated is accepted by lot others as wel:D</description><pubDate>Wed, 22 Sep 2010 12:59:35 GMT</pubDate><dc:creator>samita.sawant</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>If you work for the bank, you wait for import duplicates then delete, you are dead already. If you work for car-wash, it is OK.Row_number() function needs order by.Never use "Not in" on large table. You are killing yourself or the server.http://usa.redirectme.net/repriser/data/WriteTSQLPerformace.htmlThere is one more method people have not bought it up in this long discussion. SSMS Edit Rows, right-mouse drop to Delete command one row at a time. (I am just kidding.) My point is some people chew up words and miss the point of the article. Since there only five rows for delete, you can make this joke.Can we all stop that endless debate design vs implementation, developer vs DBA now? The author put in a lot work to write a nice article.</description><pubDate>Wed, 22 Sep 2010 12:48:21 GMT</pubDate><dc:creator>jswong05</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>Joe,I didn't know that technique - here is your code with the correct columns demonstrating the efficiency.-- List all entries regardless of frequencySELECT *, 1 AS grp_size FROM Emp_DetailsINTERSECTSELECT emp_name, company, join_date, resigned_date,ROW_NUMBER()OVER (PARTITION BY emp_name, company, join_date, resigned_dateORDER BY emp_name, company, join_date, resigned_date) AS grp_sizeFROM Emp_Details;-- List all with 2 or more entriesSELECT *, 2 AS grp_size FROM Emp_DetailsINTERSECTSELECT emp_name, company, join_date, resigned_date,ROW_NUMBER()OVER (PARTITION BY emp_name, company, join_date, resigned_dateORDER BY emp_name, company, join_date, resigned_date) AS grp_sizeFROM Emp_Details;-- List all with 3 or more entriesSELECT *, 3 AS grp_size FROM Emp_DetailsINTERSECTSELECT emp_name, company, join_date, resigned_date,ROW_NUMBER()OVER (PARTITION BY emp_name, company, join_date, resigned_dateORDER BY emp_name, company, join_date, resigned_date) AS grp_sizeFROM Emp_Details;Nice,Doug</description><pubDate>Wed, 22 Sep 2010 12:28:39 GMT</pubDate><dc:creator>Douglas Osborne-456728</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>We can also use the newer set oriented operations to pick subsets from a (non-)table with duplicates:SELECT *, 1 AS grp_size FROM Emp_Details -- or 2 or 3 INTERSECT SELECT emp_name, company_name, membership_date, resignation_date,       ROW_NUMBER()        OVER (PARTITION BY emp_name, company_name, membership_date, resignation_date                  ORDER BY emp_name, company_name, membership_date, resignation_date) AS grp_sizeFROM Emp_Details;GO</description><pubDate>Wed, 22 Sep 2010 10:18:00 GMT</pubDate><dc:creator>CELKO</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>Suresh,I'm a developer, not a DBA; still, I've been working with RDBMSs since 1984 (IBM DB/2 v1), and I can tell you that yours is not a 'solution looking for a problem', but one very much for the 'real world'. Dirty data will always be with us, and we often have little or no control over those providing it - so deal with it we must!It is is, then, with much gratitude that fellows like me read and use articles by fellows like you. Well done, sir.</description><pubDate>Wed, 22 Sep 2010 09:38:01 GMT</pubDate><dc:creator>Craig-315134</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>This is a fantastic post, and as a newer DBA, it really clarified exactly how the ROW_NUMBER/PARTITION BY function works.  The solution is a perfect fit for me on a project that I am currently working on, where we have data with duplicates coming in from a third party that needs to be entered in to the database.  I will be using this function to clean the data.  Thanks!</description><pubDate>Wed, 22 Sep 2010 09:34:15 GMT</pubDate><dc:creator>rlwilde</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>Thanks for the article.  I've used this method a lot since sql 2005 came out and have found it particularily usefull when a) the data comes from a 3rd party and is part of a larger data cleansing that I do.  Usually data dumps like the earlier posted mentioned about surveys.  b) 3rd party databases that are poorly designed without proper constraints and disasters occur where dupicates somehow get inserted. (This is much more rare).</description><pubDate>Wed, 22 Sep 2010 09:15:04 GMT</pubDate><dc:creator>Trey Staker</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>From some of the replies to my post, I'm seeing that old schism of "code guru" and "design guru" is alive and well.  I have never denied duplicates are a problem and require a solution, but that schism gets in the way of offering clients the best solution.  If a "code guru" can't see the design problem or doesn't point it out to the designer, or the "design guru" can't see the code burden or inefficiency caused by his design, neither is doing his client or employer a service.  What can I sell the client appears to drive the answer, and that is where I have a problem.The article is a coding bandaid that indeed may be useful for data cleansing or conditioning "before" the data is placed into production, as one poster with a real life example illustrated with "staging tables".  Not once however, does the article suggest the table design is fundamentally flawed or present a realistic scenario that may require such a solution.  Duplicates were being removed during the cleansing process quite handily long before the featured SQL Server functions came into existence.As to the projects I work on, they are not always well funded but they certainly are well investigated and well thought out before a single table is created or a single line of SQL is written.  And no, I do not have "code gurus" or "design gurus" at my disposal.  What I do have are people motivated to do the best job for our clients, even if it means not getting a project because a better solution lies elsewhere.  Clients don't forget that.</description><pubDate>Wed, 22 Sep 2010 09:09:08 GMT</pubDate><dc:creator>trubolotta</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>Nice content. Thanks. I agree, this method is not a fixall or a substitution for poor design. However, given certain scenarios, it is very useful and a concise way to eliminate duplicates. We don't all have the luxury of inheriting well architected database structures or being involved in the design process.Well done!</description><pubDate>Wed, 22 Sep 2010 08:52:57 GMT</pubDate><dc:creator>jxhopper</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>This is a solution that fits some problem domains. One that might have poor database design, or other issues, but that can be a very large domain of systems.Disregarding this because you think there is a more fundamental problem is ignoring the reality in many situations. Yes, I'd love to have someone redesign things and make them better, but you can't do that.There are things in the db design here at SSC that I'd like changed, or I think are poor design. However I just don't have the resources to deal with that. As a result, I need to find workarounds until such time as I can redesign them.</description><pubDate>Wed, 22 Sep 2010 08:33:02 GMT</pubDate><dc:creator>Steve Jones - SSC Editor</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>[quote][b]trubolotta (9/22/2010)[/b]My own experience indicates that datra cleansing is a highly variable problem and requires ingenuity and coding expertise, but it is also a one-shot deal.[/quote]Roger that.  In my case, just yesterday I created a process to upload new pricing information from one of our vendors.  For whatever reason, they usually have duplicated part numbers on the spreadsheet (fortunately with the same pricing information for each) and I've included a step to delete these dupes.  I haven't had to deal wtih dupes in so long that I found myself looking for "new and improved" ways to handle the problem.  Thought it was very coincidental that in my inbox this morning I got a SqlServerCentral mail discussing the very topic.While my solution is a "one-time" fix, it must be done every time I work with this vendor's spreadsheets (about once-per-month).</description><pubDate>Wed, 22 Sep 2010 08:08:40 GMT</pubDate><dc:creator>thisisfutile</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>[quote][b]trubolotta (9/22/2010)[/b][hr]This appears to be one of those articles that creates a problem based on poor design and purports to correct it using some bloated functionality of SQL Server.  If the table were properly designed with uniqueness constraints, the problem would not exist.  Allowing duplicate data into the table in the first place is the problem, not fixing it after the fact.  The more likely scenario and the one I have seen most often comes from importing data from poorly designed databases or poorly trained users.It seems to me we have entered a phase with SQL Server fuctions in search of problems to justify their cost and footprint, no matter how contrived those problems may be.[/quote]This is not a solution looking for a problem. Almost every OLTP database I see has: lousy performance, dirty data, duplicate rows, poor indexing, inefficient queries, cursors, WITH NOLOCK hints on everything, thin to no security, no integrity checks, &amp; never tested their ability to restore their system.Often created by developers who "know a little TSQL" or a DBA constantly apologising that this was his first project when he joined the company &amp; he's learnt a lot since then. OR they were really short of time &amp; he just didn't get around to fixing it.   Also I've never done a datawarehouse or BI project where the cube didn't highlight a drastic need to fix up the data they thought was perfect. So ... It is possible that the only companies that ask me to help with their database, are those who know they have a problem. And don't have a team of skilled DBA's. So the sample I see is likely to skewed.Or it is possible that you've only worked on well funded projects that benefit from you high level of DBA skills. Either way, trust me. There is a huge line of people with no idea how to fix their data that will benefit from this thread. </description><pubDate>Wed, 22 Sep 2010 08:04:54 GMT</pubDate><dc:creator>David Lean</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>[quote][b]trubolotta (9/22/2010)[/b][hr][quote][b]thisisfutile (9/22/2010)[/b][hr]@trubolotta  Not trying to start an argument here, but hindsight is 20/20 for everyone.  Sometimes we find ourselves in a situation like the OP is describing (or something similar) and we need a solution.[/quote]No mal-intentions assumed, but the idea this "code" is efficient is ludicrous because is suggests it must be run periodically.  So what happens to all the bad results in between?  Don't worry about those erroneous reports, we are going run our very efficient "clean-up" code shortly?  Do you run this with each data entry?You are correct, it is a problem to deal with but it is not a fix nor is it viable for a production scenario.  My own experience indicates that datra cleansing is a highly variable problem and requires ingenuity and coding expertise, but it is also a one-shot deal.[/quote]Not everyone who frequents SSC is a DBA or has the ability to change table structures. Some of us just need to be able to quickly remove dupes from resultsets for reporting, and not all columns are able to apply a DISTINCT to it. This is an easy solution that does the job well.</description><pubDate>Wed, 22 Sep 2010 08:00:08 GMT</pubDate><dc:creator>jcrawf02</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>[quote][b]thisisfutile (9/22/2010)[/b][hr]@trubolotta  Not trying to start an argument here, but hindsight is 20/20 for everyone.  Sometimes we find ourselves in a situation like the OP is describing (or something similar) and we need a solution.[/quote]No mal-intentions assumed, but the idea this "code" is efficient is ludicrous because is suggests it must be run periodically.  So what happens to all the bad results in between?  Don't worry about those erroneous reports, we are going run our very efficient "clean-up" code shortly?  Do you run this with each data entry?You are correct, it is a problem to deal with but it is not a fix nor is it viable for a production scenario.  My own experience indicates that datra cleansing is a highly variable problem and requires ingenuity and coding expertise, but it is also a one-shot deal.</description><pubDate>Wed, 22 Sep 2010 07:51:34 GMT</pubDate><dc:creator>trubolotta</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>We have a fairly standard real life example where this would be useful and not a indication of poor table design.  We get daily data feeds from third party vendors that conduct customer service surveys for us.  Our business rule is that we only contact customers once every 90 days.  Due to the distributed nature of the dialing/survey process, timing delays on the data transfer to/from our vendors, and plain old user error, sometimes a customer will be contacted more than once either on the same day or within the 90 window, and we need to remove the "duplicate" survey from our system and notify the call center that they violated this rule so they can remove the record and reset their quotas.  We load the data into a staging table and evaluate within the for duplicate records based on customer id - which is different from our primary key (survey id).  Customer ID cannot be the primary key because customers can have multiple surveys but just not within 90 days of each other.  We don't use the exact code posted in the article since we can identify unique rows via the primary key.  When a "duplicate" is found in the staging data, the offending row is copied to an audit table and removed from the staging table.  Our loading process then sends out the necessary communications to the vendor and continues on with loading the cleansed data.  In a perfect world our vendors would always follow our business rules - but you have to code for reality.  </description><pubDate>Wed, 22 Sep 2010 07:51:29 GMT</pubDate><dc:creator>MWise</dc:creator></item><item><title>RE: Eliminating Duplicate Rows using The PARTITION BY clause</title><link>http://www.sqlservercentral.com/Forums/Topic990866-260-1.aspx</link><description>@trubolotta  Not trying to start an argument here, but hindsight is 20/20 for everyone.  Sometimes we find ourselves in a situation like the OP is describing (or something similar) and we need a solution.</description><pubDate>Wed, 22 Sep 2010 07:41:41 GMT</pubDate><dc:creator>thisisfutile</dc:creator></item></channel></rss>