﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Article Discussions / Article Discussions by Author / Discuss content posted by Seth Delconte  / Get Rid of Duplicates! / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Wed, 22 May 2013 04:47:06 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>Heh... nah... it's just the human element on a (thankfully) open forum.  Sometimes ya just gotta play the ol' Jedi mind trick on your self... "These are not the droids I want... I'll move along." :-D</description><pubDate>Tue, 01 Dec 2009 14:36:26 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>Sad isn't it when my forum posts offer more challenge and value ..... cheer,</description><pubDate>Tue, 01 Dec 2009 13:54:00 GMT</pubDate><dc:creator>G33kKahuna</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>[quote][b]G33kKahuna (12/1/2009)[/b][hr][quote]Heh... man... you don't need be to be so rude. Most everything that folks write an article having to do with SQL Server are all "age old dilemma's" and yet nothing is truly at rest. A newbie might stumble into the discussion that follows such an article and actually learn something new.  [/quote]Jeff, with all due respects, there is nothing new to learn in the article. It's done, closed and available across  the internet ... 
[url=http://lmgtfy.com/?q=t-sql+get+rid+of+duplicates] here is a simple google search[/url] .. Those were the days when SSC had interesting articles every day; these days anyone with access to internet and oxygen is dumping garbage on the site. I wish SSC moderated articles post more ...[/quote] With the same respect, you don't strike me as an SSC 'old timer' ("Those were the days...") with a whopping big 139 posts. ;-)  Heh... and there are other things to learn about from such an article like the rare alternate method found in the discussions that follow or the human element that causes people to waste their time flaming about such articles.  At least the guy tried... how many articles have you written?  :hehe: Instead of offering a Google link on this subject, take a look at item 6 in the following article... [url]http://www.ehow.com/how_2106033_use-proper-forum-etiquette.html[/url]</description><pubDate>Tue, 01 Dec 2009 13:30:46 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>[quote][b]tpoulsen (12/1/2009)[/b][hr]Well don't read it then - the only thing that's worse than low quality forum posts is people poncing about complaining about how offended their intellect is by these terrible posts that they are having to read. If you know how to find duplicates then yes - you probably should stop reading articles about how to find duplicates. And before you get all huffed up and spend all night drafting your reply - that's my final word on the matter! Cheers mate! :0)[/quote] Ditto to you .... don't like my comments ... 
move on ...</description><pubDate>Tue, 01 Dec 2009 13:28:26 GMT</pubDate><dc:creator>G33kKahuna</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>Well don't read it then - the only thing that's worse than low quality forum posts is people poncing about complaining about how offended their intellect is by these terrible posts that they are having to read. If you know how to find duplicates then yes - you probably should stop reading articles about how to find duplicates. And before you get all huffed up and spend all night drafting your reply - that's my final word on the matter! Cheers mate! :0)</description><pubDate>Tue, 01 Dec 2009 13:12:02 GMT</pubDate><dc:creator>tpoulsen</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>[quote]Heh... man... you don't need to be so rude. Most everything that folks write an article having to do with SQL Server are all "age old dilemma's" and yet nothing is truly at rest. A newbie might stumble into the discussion that follows such an article and actually learn something new.  [/quote] Jeff, with all due respect, there is nothing new to learn in the article. It's done, closed and available across the internet ... [url=http://lmgtfy.com/?q=t-sql+get+rid+of+duplicates] here is a simple google search[/url] .. Those were the days when SSC had interesting articles every day; these days anyone with access to internet and oxygen is dumping garbage on the site. 
I wish SSC moderated articles post more ...</description><pubDate>Tue, 01 Dec 2009 12:36:29 GMT</pubDate><dc:creator>G33kKahuna</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>[quote][b]G33kKahuna (12/1/2009)[/b][hr]Seth, this is an age old dilemma and was put to rest with many variations like JP de Jong-202059 pointed out. In fact this site has articles with script to perform the same task. Stop wasting our time with your eureka moments ....[/quote] Heh... man... you don't need to be so rude.  Most everything that folks write an article having to do with SQL Server are all "age old dilemma's" and yet nothing is truly at rest.  A newbie might stumble into the discussion that follows such an article and actually learn something new. ;-)</description><pubDate>Tue, 01 Dec 2009 11:38:23 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>Hey all - can I firstly just join the "I think this is a good article" camp - obviously there is always more than one way of doing things - but assuming you're not always using 2005+ I think this solution is very nice. One thing I stumbled on in the following discussion was this comment: "It's not always required to add unique indexes/constraints, though that was a good tip." Just out of curiosity - [u]if[/u] I have a table where I use surrogate PK of some sort (Ints or GUIDs or whatever) I always make sure that I also have a natural key in the form of a unique index - so say it's a Person table I might place this on the email, if it's an Order table I might place it on the Customer and TimePlaces etc. I normally go to quite some length to do this, mainly just because I was once taught this was good practise - and also because I find it traps a lot of application logic errors that would otherwise go 
unnoticed. Now I should say that I am an application developer and not a DBA - so I am actually quite interested in hearing your opinion on this.</description><pubDate>Tue, 01 Dec 2009 10:33:50 GMT</pubDate><dc:creator>tpoulsen</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>Seth, Thanks. Oh... so.. simple. Gotcha. Makes sense. Just never did a clean dups by deleting them all from the original table. Yeah, there's tons of ways to deal with dups. Still appreciate your taking the time to write the article. Write on! Richard</description><pubDate>Tue, 01 Dec 2009 10:30:30 GMT</pubDate><dc:creator>rstelma</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>Yes but there is nothing fantastic about your method; it's been long done by many in SQL2K world. Just google and you will find a ton of articles ... cheers.</description><pubDate>Tue, 01 Dec 2009 10:22:18 GMT</pubDate><dc:creator>G33kKahuna</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>[quote]If you found two duplicated item_no's why did four rows get deleted? Wouldn't you want to delete just one of the duplicates so that one unique row would remain? I must be missing something. Thanks for your explanation in advance. Richard[/quote] Richard, In this scenario, I have 2 duplicated records, where every field is identical.  I copied one instance of each record into a temp table, then deleted ALL the records from the original table that had [b]item_no[/b]s that were duplicated (each of the 2 records had 1 duplicate record, so the total was 4 records).  I chose to group by [b]item_no[/b], but could just as easily have used [b]id[/b].  
Then I copied everything from the temp table back to the original table (2 non-duplicate records).  This method just seemed to make sense to me; I'm sure there are other, possibly more efficient ways to do this.</description><pubDate>Tue, 01 Dec 2009 09:45:56 GMT</pubDate><dc:creator>seth delconte</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>Seth, this is an age old dilemma and was put to rest with many variations like JP de Jong-202059 pointed out. In fact this site has articles with script to perform the same task. Stop wasting our time with your eureka moments ....</description><pubDate>Tue, 01 Dec 2009 09:16:03 GMT</pubDate><dc:creator>G33kKahuna</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>I prefer to use GROUP BY when I want to see the number of duplicates for each group of attributes used to check uniqueness, while ROW_NUMBER is a nicer solution and I use it especially when I want to use the latest or earliest entered version of the same record. The check for duplicates might be required when merging data from two different sources or when breaking a not normalized table into multiple tables, for example a headers/lines set of tables. Another situation in which I had to check for duplicates is when importing data from non-relational sources (e.g. text files, Excel sheets, etc.) in which the chances of having duplicates are quite high. As already stressed, it's preferable to reduce upfront the possibility of entering duplicates; unfortunately that's not always possible. 
It's not always required to add unique indexes/constraints, though that was a good tip.</description><pubDate>Tue, 01 Dec 2009 02:42:01 GMT</pubDate><dc:creator>sql-troubles</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>I have to admit that when I read the following, I thought that Seth had simply lost his mind... [quote]My quick resolution in this situation is to: 1.  Remove the unique index temporarily; 2.  Run the application, allowing it to insert duplicate item(s); 3.  then find the duplicate(s) and remove them. Of course, these steps are preceded by performing a good backup of the database and possibly putting the database in single user mode to prevent unexpected query results during my work. As simple as the task of removing a record with a duplicate value sounds, it can get confusing, and I need to proceed with care. To be safe, I follow this rule of thumb: first I perform a SELECT of the record(s) that will be removed, then I convert it to a DELETE statement after I'm sure it will affect only record(s) that I want it to.[/quote] .. because it just wasn't clear that it was a legacy app that shouldn't be changed because of the impending rewrite.  I thought that was an awful lot of work to do a simple conditional insert. Now that Seth has clarified the problem a bit, I can mostly agree with the pain he goes through including that of duplicate elimination.  On that subject and for all of those that made the very good suggestion of using ROW_NUMBER() to isolate duplicates, keep in mind that this is a legacy app on a legacy DB and it might be pre-2k5 where ROW_NUMBER() simply doesn't exist.  
Still, the title of the article is "Get Rid of Duplicates" and not "Get Rid of Duplicates for a Special Case" and I can certainly understand why people may have jumped to the wrong conclusion on this article especially when the wrap-up line in the Conclusion is "Now you can confidently remove duplicate records from your tables!" and there was no mention of version nor ROW_NUMBER(). ;-) That notwithstanding, for what the article was actually about, it was a good, well written article.  Thanks, Seth. As a side bar... I don't know what the app would do with the [font="Courier New"]"Duplicate key was ignored."[/font] message that would show up if you tried to insert any dupes (some apps interpret such messages as an error... same goes with returned row counts), but have you tried changing the unique index to a unique index with the "IGNORE_DUP_KEY = ON" setting?  If the app forgives the warning message(s) about dupes being ignored (1 for each INSERT statement that has dupes no matter how many dupes exist in that INSERT), it could save you a wad of trouble that you're currently going through.</description><pubDate>Mon, 30 Nov 2009 17:41:24 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>If you found two duplicated item_no's why did four rows get deleted? Wouldn't you want to delete just one of the duplicates so that one unique row would remain? I must be missing something. 
Thanks for your explanation in advance. Richard</description><pubDate>Mon, 30 Nov 2009 10:47:08 GMT</pubDate><dc:creator>rstelma</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>I use this:
DELETE tu1
FROM tblUser tu1
WHERE tu1.intUserID &amp;gt; ANY (SELECT intUserID
	FROM tblUser tu2
	WHERE tu2.strUserName = tu1.strUserName
		AND tu2.strFamilyName = tu1.strFamilyName)</description><pubDate>Mon, 30 Nov 2009 10:46:12 GMT</pubDate><dc:creator>dasapito</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>I use this one a lot because it removes multiples (3's, 4's, etc)  - not just duplicates...
WITH dups AS
( SELECT *, ROW_NUMBER() OVER (partition BY USER_NAME, start_date ORDER BY USER_NAME, start_date) AS RowNum
   FROM tbl_users)
Delete from dups where rownum &amp;gt; 1</description><pubDate>Mon, 30 Nov 2009 08:31:35 GMT</pubDate><dc:creator>dbajunior</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>Good explanation, I can understand now why the issue cannot be resolved up front. Thanks,</description><pubDate>Mon, 30 Nov 2009 08:21:19 GMT</pubDate><dc:creator>Logicalman1998</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>Good article, Seth, and a nice explanation of the issue. It's not always easy to do things up front, especially when you have business reasons for not putting resources into those solutions. 
We've all had apps that we would like to re-architect, but could not for some reason.</description><pubDate>Mon, 30 Nov 2009 08:15:00 GMT</pubDate><dc:creator>Steve Jones - SSC Editor</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>To answer the 'Why don't you just use replication/triggers to keep the tables in sync' questions: Our app is being phased out, and was developed by 2 teams of developers that wrote the app to access 2 different databases that were very similar, but not exactly the same.  As we are developing new software to replace the old app, I have to keep it functional for now.  Thus, replication and/or triggers are not a viable solution in this case. :)</description><pubDate>Mon, 30 Nov 2009 08:00:39 GMT</pubDate><dc:creator>seth delconte</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>yes agree with this - if you are replicating two databases then just update one - other than that use unique constraints (and if need be triggers) to make sure that you always have a natural uniqueness on each row in your table.</description><pubDate>Mon, 30 Nov 2009 07:10:12 GMT</pubDate><dc:creator>tpoulsen</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>I think I'm with Tony Scott on this one: why not prevent the issue at insert time, rather than go through all the pain of removing duplicates after the fact?</description><pubDate>Mon, 30 Nov 2009 07:03:30 GMT</pubDate><dc:creator>Dean Cochrane</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>I have had a similar problem in the past, but rather than looking to clean up after the fact, I test for 
duplicates beforehand and eliminate the insert at that time. I feel I have missed something in the original article as to why this might not have been identified as a design issue.</description><pubDate>Mon, 30 Nov 2009 06:51:48 GMT</pubDate><dc:creator>Logicalman1998</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>There's an old article on it on this site... [url]http://www.sqlservercentral.com/articles/SQL+Server+2005+-+TSQL/dedupingdatainsqlserver2005/2260/[/url] This option became available with SQL Server 2005. [quote][b]Jim C-203340 (11/30/2009)[/b][hr]The row_number() method is by far the quickest and cleanest method. If you've never used row_number() before, do yourself a favor and learn it. One modification to JP's code.. "... where RowNumber &amp;gt; 1" will delete all duplicates not just in cases where you only have 1 dup. [quote][b]JP de Jong-202059 (11/30/2009)[/b][hr]Hi I prefer this syntax:
WITH ItemsToBeDeleted
AS (
SELECT *, row_number() over (partition by item_no ORDER BY id) as RowNumber
FROM item_store )
DELETE FROM ItemsToBeDeleted Where RowNumber = 2
Much more efficient.
Regards,
JP[/quote][/quote]</description><pubDate>Mon, 30 Nov 2009 06:42:17 GMT</pubDate><dc:creator>RyanRandall</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>The row_number() method is by far the quickest and cleanest method. If you've never used row_number() before, do yourself a favor and learn it. One modification to JP's code.. "... where RowNumber &amp;gt; 1" will delete all duplicates not just in cases where you only have 1 dup. 
[quote][b]JP de Jong-202059 (11/30/2009)[/b][hr]Hi I prefer this syntax:
WITH ItemsToBeDeleted
AS (
SELECT *, row_number() over (partition by item_no ORDER BY id) as RowNumber
FROM item_store )
DELETE FROM ItemsToBeDeleted Where RowNumber = 2
Much more efficient.
Regards,
JP[/quote]</description><pubDate>Mon, 30 Nov 2009 06:22:18 GMT</pubDate><dc:creator>Jim C-203340</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>It's very neat and all, and maybe I misunderstand, but if the tables are identical, couldn't you avoid the whole duplicates issue by inserting into one table only, and let replication take care of the rest?
Regards
Peter</description><pubDate>Mon, 30 Nov 2009 05:24:21 GMT</pubDate><dc:creator>Peter Pirker</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>dealing with duplicates on a daily basis, i find this approach works well
alter table withdupes add delid int identity(1,1)
delete x
from withdupes x
inner join (select itemno, min(delid) as keepid from withdupes group by itemno) y on x.itemno = y.itemno
where x.delid &amp;lt;&amp;gt; y.keepid
alter table withdupes drop column delid</description><pubDate>Mon, 30 Nov 2009 04:38:04 GMT</pubDate><dc:creator>Trevor.weehuizen</dc:creator></item><item><title>RE: Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>Hi I prefer this syntax:
WITH ItemsToBeDeleted
AS (
SELECT *, row_number() over (partition by item_no ORDER BY id) as RowNumber
FROM item_store )
DELETE FROM ItemsToBeDeleted Where RowNumber = 2
Much more efficient.
Regards,
JP</description><pubDate>Mon, 30 Nov 2009 02:25:53 GMT</pubDate><dc:creator>JP de Jong-202059</dc:creator></item><item><title>RE: Get Rid of 
Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>I have the same issue on some tables that have an auto increment column which isn't an identity column, though luckily it sounds like it doesn't happen as often as it does for you! This is my technique, which doesn't use a temp table: I use set rowcount, set to 1 less than the number of duplicates, and then call delete:
CREATE TABLE dbo.tblDupTest
(
	id		int		not null
)
INSERT INTO dbo.tblDupTest VALUES(1)
INSERT INTO dbo.tblDupTest VALUES(1)
INSERT INTO dbo.tblDupTest VALUES(2)
INSERT INTO dbo.tblDupTest VALUES(3)
INSERT INTO dbo.tblDupTest VALUES(4)
INSERT INTO dbo.tblDupTest VALUES(4)
INSERT INTO dbo.tblDupTest VALUES(4)
-- At this point we should have two 1s and three 4s
SELECT	*
FROM	dbo.tblDupTest
-- This will give us the counts
SELECT	id,
		COUNT(id) AS 'Count'
FROM	dbo.tblDupTest
GROUP BY id
HAVING COUNT(id) &amp;gt; 1
-- Then set the rowcount to one less than the duplicate and call delete
set rowcount 1
DELETE FROM dbo.tblDupTest WHERE id = 1
set rowcount 2
DELETE FROM dbo.tblDupTest WHERE id = 4
set rowcount 0
SELECT	*
FROM	dbo.tblDupTest</description><pubDate>Mon, 30 Nov 2009 02:05:13 GMT</pubDate><dc:creator>int.blue</dc:creator></item><item><title>Get Rid of Duplicates!</title><link>http://www.sqlservercentral.com/Forums/Topic825835-1700-1.aspx</link><description>Comments posted to this topic are about the item [B]&lt;A HREF="/articles/T-SQL/68376/"&gt;Get Rid of Duplicates!&lt;/A&gt;[/B]</description><pubDate>Sat, 28 Nov 2009 11:19:39 GMT</pubDate><dc:creator>seth delconte</dc:creator></item></channel></rss>