﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Discuss Content Posted by Chris Cubley / Article Discussions / Article Discussions by Author  / Finding and Deleting Duplicate Data / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Sun, 26 May 2013 01:38:37 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Finding and Deleting Duplicate Data</title><link>http://www.sqlservercentral.com/Forums/Topic14119-89-1.aspx</link><description>Hi,Great Post. This will definately help me. I tried manipulating your code to work for me and i get an "Msg 207, Level 16, State 1, Line 27 Invalid column name 'iID'." when i want o delete the duplicates and retail one copy of the duplicates. The error is with p2.iid. The column is there and i am not sure why i am getting this error.Can you please help?--deleteing duplicated rows with keeping 1 copy of the duplicated row.DELETE FROM	p1FROMVehicleHistory p1INNER JOIN	(		SELECT			MAX(iID) AS ColumnID,			ivehicleid,			dtdatetime,			iOdometer		FROM			VehicleHistory 		GROUP BY			ivehicleid,			dtdatetime,			iOdometer		HAVING			COUNT(*) &amp;gt; 1) p2		ON 		(p1.ivehicleid = p2.ivehicleid		AND		p1.dtdatetime = p2.dtdatetime		AND		p1.iOdometer = p2.iOdometer		and		p1.iID &amp;lt;&amp;gt; p2.iID)</description><pubDate>Fri, 13 Mar 2009 04:12:36 GMT</pubDate><dc:creator>vishnup</dc:creator></item><item><title>RE: Finding and Deleting Duplicate Data</title><link>http://www.sqlservercentral.com/Forums/Topic14119-89-1.aspx</link><description>To Calvin Lawson - I should hav added the modified code as follows:SET ANSI_NULLS ONGOSET QUOTED_IDENTIFIER ONGOALTER PROCEDURE SP_TEST01    ASBEGIN      SET NOCOUNT ON;   if object_ID('dbo.TBLTEST01') is not null drop table TBLTEST01CREATE TABLE TBLTEST01( ID int identity Not NUll Primary Key, CustNum int  NOT NULL, PostedDate datetime NOT NULL, PaymentAmt money  NOT NULL, PaymentMemo varchar(200) NOT NULL, CCRefNum char(10) NOT NULL)Insert TBLTEST01 Values (1, '01/10/2004', 1, '', '')Insert TBLTEST01 Values (1, '01/11/2004', 1, 'Keep', '')Insert TBLTEST01 Values (2, '01/10/2004', 2, '', '')Insert TBLTEST01 Values (2, '01/11/2004', 2, '', '')Insert TBLTEST01 Values (2, '01/12/2004', 2, 'Keep', '')Insert TBLTEST01 Values (3, '01/10/2004', 3, '', '')Insert TBLTEST01 Values (3, '01/11/2004', 3, '', '')Insert TBLTEST01 Values (3, '01/12/2004', 3, '', '')Insert TBLTEST01 Values (3, '01/13/2004', 3, 'Keep', '')delete p1From TBLTEST01 p1 JOIN TBLTEST01 p2    on  p1.CustNum = p2.CustNum and    p1.PostedDate &amp;lt; p2.PostedDate and     p1.PaymentAmt   = p2.PaymentAmt where p1.ID &amp;lt; p2.IDENDThanks again</description><pubDate>Wed, 31 Oct 2007 15:59:32 GMT</pubDate><dc:creator>Peter Meagher</dc:creator></item><item><title>RE: Finding and Deleting Duplicate Data</title><link>http://www.sqlservercentral.com/Forums/Topic14119-89-1.aspx</link><description>Thank you to Calvin Lawsen, Posted 7/26/2004 6:55 PM.  I tried and it worked first pass. However, I had to modify a bit to make the process delete the records with the earliest date and time stamp, leaving only the most recent record.  However, upon closer scrutiny, I noticed that the second part of your code segment, i.e. beginning with "DELETE FROM p1......"  didn't seem necessary?!  The first code segment worked just fine.Am I missing something or was this an oversight?</description><pubDate>Wed, 31 Oct 2007 15:56:04 GMT</pubDate><dc:creator>Peter Meagher</dc:creator></item><item><title>RE: Finding and Deleting Duplicate Data</title><link>http://www.sqlservercentral.com/Forums/Topic14119-89-1.aspx</link><description>&lt;P&gt;Here's the full script:&lt;/P&gt;&lt;P&gt;if object_ID('tempdb..#Payment') is not null drop table #PaymentCREATE TABLE #Payment( PaymentID int identity Not nUll Primary Key, CustomerNumber int  NOT NULL, PostedDatetime datetime NOT NULL, PaymentAmt money  NOT NULL, PaymentMemo varchar(200) NOT NULL, CCRefNum char(10) NOT NULL)&lt;/P&gt;&lt;P&gt; &lt;/P&gt;&lt;P&gt;Insert #Payment Values (1, '01/01/2004', 1, '', '')Insert #Payment Values (1, '01/01/2004', 1, '', '')Insert #Payment Values (2, '01/02/2004', 2, '', '')Insert #Payment Values (2, '01/02/2004', 2, '', '')Insert #Payment Values (2, '01/02/2004', 2, '', '')Insert #Payment Values (3, '01/03/2004', 3, '', '')Insert #Payment Values (3, '01/03/2004', 3, '', '')Insert #Payment Values (3, '01/03/2004', 3, '', '')Insert #Payment Values (3, '01/03/2004', 3, '', '')&lt;/P&gt;&lt;P&gt; &lt;/P&gt;&lt;P&gt; &lt;/P&gt;&lt;P&gt;delete p1From #Payment p1 JOIN #Payment p2    on  p1.CustomerNumber = p2.CustomerNumber and    p1.PostedDateTime = p2.PostedDateTime and     p1.PaymentAmt   = p2.PaymentAmt where p1.PaymentID &amp;lt; p2.PaymentID&lt;/P&gt;&lt;P&gt; &lt;/P&gt;&lt;P&gt;DELETE FROM p1FROM #Payment p1INNER JOIN (  SELECT   MAX(PaymentID) AS PaymentID,   CustomerNumber,   PostedDatetime,   PaymentAmt  FROM   #Payment  GROUP BY   CustomerNumber,   PostedDatetime,   PaymentAmt  HAVING   COUNT(*) &amp;gt; 1&amp;nbsp&lt;img src='images/emotions/wink.gif' height='20' width='20' border='0' title='Wink' align='absmiddle'&gt; p2 ON(  p1.CustomerNumber = p2.CustomerNumber  AND  p1.PostedDatetime = p2.PostedDatetime  AND  p1.PaymentAmt = p2.PaymentAmt  AND  p1.PaymentID &amp;lt;&amp;gt; p2.PaymentID&amp;nbsp&lt;img src='images/emotions/wink.gif' height='20' width='20' border='0' title='Wink' align='absmiddle'&gt;&lt;/P&gt;&lt;P&gt; &lt;/P&gt;</description><pubDate>Mon, 26 Jul 2004 11:55:00 GMT</pubDate><dc:creator>Calvin Lawson</dc:creator></item><item><title>RE: Finding and Deleting Duplicate Data</title><link>http://www.sqlservercentral.com/Forums/Topic14119-89-1.aspx</link><description>&lt;P&gt;This is a pretty good technique, but there are problems with it.  Using derived tables in a query can be very powerful, but they can also cause excessive use of tempdb (similar to using # tables).  Often, there are other ways of doing the same thing for cheaper.&lt;/P&gt;&lt;P&gt;So, if you don't have to use them, don't.  The query below is functionally equivalent to the delete statement in the article, but it costs almost half.&lt;/P&gt;&lt;P&gt;delete p1From Payment p1 JOIN Payment p2    on  p1.CustomerNumber = p2.CustomerNumber and    p1.PostedDateTime = p2.PostedDateTime and     p1.PaymentAmt   = p2.PaymentAmt where p1.PaymentID &amp;lt; p2.PaymentID&lt;/P&gt;&lt;P&gt; &lt;/P&gt;</description><pubDate>Mon, 26 Jul 2004 11:53:00 GMT</pubDate><dc:creator>Calvin Lawson</dc:creator></item><item><title>RE: Finding and Deleting Duplicate Data</title><link>http://www.sqlservercentral.com/Forums/Topic14119-89-1.aspx</link><description>Just to follow up, I followed all of the sample code and everything worked great the first time.  The article was well written and the logic was clear in describing why the steps were necessary. Keep up the good work !!! </description><pubDate>Tue, 29 Jul 2003 13:18:00 GMT</pubDate><dc:creator>cppwiz</dc:creator></item><item><title>RE: Finding and Deleting Duplicate Data</title><link>http://www.sqlservercentral.com/Forums/Topic14119-89-1.aspx</link><description>I think you can use this approach too--table structureCREATE TABLE [a4] (	[id] [int] IDENTITY (1, 1) NOT NULL ,	[title] [varchar] (50) ,	[name] [varchar] (50) NULL ,	CONSTRAINT [PK_a4] PRIMARY KEY  CLUSTERED 	(		[id]	)  ON [PRIMARY] ) ON [PRIMARY]GO--to have distinct id for the doubles--the first id'sselect distinct a.id, a.title, a.name from a4 a, a4 b where a.title = b.title and a.name = b.name and a.id &amp;lt; b.id --the last id'sselect distinct a.id, a.title, a.name from a4 a, a4 b where a.title = b.title  and a.name = b.name and a.id &amp;gt; b.id --to delete the first id's doubles and get unique rows in the original tabledelete from a4 where id in (select distinct a.id from a4 a, a4 b where a.title = b.title and a.name = b.name and a.id &amp;lt; b.id )--to delete the last id's doubles and get unique rows in the original tabledelete from a4 where id in (select distinct a.id from a4 a, a4 b where a.title = b.title and a.name = b.name and a.id &amp;gt; b.id ) </description><pubDate>Tue, 29 Jul 2003 10:18:00 GMT</pubDate><dc:creator>icata</dc:creator></item><item><title>RE: Finding and Deleting Duplicate Data</title><link>http://www.sqlservercentral.com/Forums/Topic14119-89-1.aspx</link><description>Good Article!  Thanks. </description><pubDate>Sat, 26 Jul 2003 17:53:00 GMT</pubDate><dc:creator>currym</dc:creator></item><item><title>RE: Finding and Deleting Duplicate Data</title><link>http://www.sqlservercentral.com/Forums/Topic14119-89-1.aspx</link><description>Excellent article !  This is something I plan to use on one of my tables later today. </description><pubDate>Fri, 25 Jul 2003 07:49:00 GMT</pubDate><dc:creator>cppwiz</dc:creator></item><item><title>RE: Finding and Deleting Duplicate Data</title><link>http://www.sqlservercentral.com/Forums/Topic14119-89-1.aspx</link><description>This is not as complex as using temporary tables and scrubbing processes and then moving to live tables.That was my first atempt. </description><pubDate>Fri, 25 Jul 2003 05:49:00 GMT</pubDate><dc:creator>whunt</dc:creator></item><item><title>RE: Finding and Deleting Duplicate Data</title><link>http://www.sqlservercentral.com/Forums/Topic14119-89-1.aspx</link><description>Useful topic to cover but isn't the final solution over-complex?Couldn't you just use an inequality of PaymentID's on a self-join as follows..?DELETE PaymentWHERE PaymentID IN     (SELECT P1.PaymentID     FROM Payment P1     INNER JOIN  Payment P2     ON P1.CustomerNumber = p2.CustomerNumber     AND     p1.PostedDatetime = p2.PostedDatetime     AND     p1.PaymentAmt = p2.PaymentAmt     AND     p1.PaymentID &amp;gt; p2.PaymentID)Mark </description><pubDate>Fri, 25 Jul 2003 03:45:00 GMT</pubDate><dc:creator>shepton</dc:creator></item><item><title>RE: Finding and Deleting Duplicate Data</title><link>http://www.sqlservercentral.com/Forums/Topic14119-89-1.aspx</link><description>Neatly encapsulates several techniques in one place. I will save it in my bits of useful code. </description><pubDate>Fri, 25 Jul 2003 02:11:00 GMT</pubDate><dc:creator>Graham Farrow</dc:creator></item><item><title>Finding and Deleting Duplicate Data</title><link>http://www.sqlservercentral.com/Forums/Topic14119-89-1.aspx</link><description>Comments posted to this topic are about the content posted at &lt;A HREF=http://www.sqlservercentral.com/columnists/ccubley/findinganddeletingduplicatedata.asp&gt;http://www.sqlservercentral.com/columnists/ccubley/findinganddeletingduplicatedata.asp&lt;/A&gt;</description><pubDate>Sun, 13 Jul 2003 00:00:00 GMT</pubDate><dc:creator>ccubley@queryplan.com</dc:creator></item></channel></rss>