﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Discuss Content Posted by Charles Hawkins / Article Discussions / Article Discussions by Author  / Deduping Data in SQL Server 2005 / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Sat, 18 May 2013 17:54:56 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Deduping Data in SQL Server 2005</title><link>http://www.sqlservercentral.com/Forums/Topic254584-278-1.aspx</link><description>&lt;P&gt;Another option (that works with SQL 2000) and even in cases of all the columns having the same value (no column to differentiate the rows), is inserting the result set into a new table with an identity column (or adding an identity column to the original table). After that, it's just a matter of keeping the distinct rows as shown in the comments.&lt;/P&gt;&lt;P&gt;I can't remember where I read this solution, but it was either here in SQL Server Central or SQL Team foruns.&lt;/P&gt;&lt;P&gt;André Cardoso&lt;/P&gt;</description><pubDate>Wed, 07 Feb 2007 04:45:00 GMT</pubDate><dc:creator>André Cardoso</dc:creator></item><item><title>RE: Deduping Data in SQL Server 2005</title><link>http://www.sqlservercentral.com/Forums/Topic254584-278-1.aspx</link><description>&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;Interesting demonstration of the ROW_NUMBER() function. Please say it ain’t so, Joe! - that you are not using cursors to remove duplicate rows.&lt;img src='images/emotions/ermm.gif' height='20' width='20' border='0' title='Errmmm...' align='absmiddle'&gt; Even the technique of a SELECT DISTINCT into a temporary table would be a better option. As other readers have commented, there are a number of ways to remove duplicate rows. This would be my approach:&lt;?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt; &lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;DELETE Employee&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;FROM Employee a INNER JOIN (SELECT Empid,&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                                       &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                                   &lt;/SPAN&gt;FName,&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;   &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                                       &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                                &lt;/SPAN&gt;LName,&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;    &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                                      &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                  &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;             &lt;/SPAN&gt;MIN(RefDate) AS 'MinDate'&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;        &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                          &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                        &lt;/SPAN&gt;FROM Employee&lt;SPAN style="mso-spacerun: yes"&gt;    &lt;/SPAN&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                        &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                             &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;    &lt;/SPAN&gt;GROUP BY Empid, FName, LName) b&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;        &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                 &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;         &lt;/SPAN&gt;ON a.Empid = b.Empid&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;    &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;              &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;               &lt;/SPAN&gt;AND a.FName = b.FName &lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;              &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                   &lt;/SPAN&gt;AND a.LName = b.LName&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;              &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                   &lt;/SPAN&gt;AND a.RefDate &amp;gt; b.MinDate&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt; &lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;This would still leave the issue of James verses Jim that would need to be resolved separately. If you didn’t care about spelling variations and wanted to assume that the first entry was the correct one then this would work:&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt; &lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;DELETE Employee&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;FROM Employee a INNER JOIN (SELECT Empid,&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;              &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                                       &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                     &lt;/SPAN&gt;MIN(RefDate) AS 'MinDate'&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;       &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                              &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                     &lt;/SPAN&gt;FROM Employee&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;         &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                             &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                   &lt;/SPAN&gt;GROUP BY Empid) b&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;               &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                   &lt;/SPAN&gt;ON a.Empid = b.Empid&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;              &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;                   &lt;/SPAN&gt;AND a.RefDate &amp;gt; b.MinDate&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt; &lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;I would be interested in the question of performance between the two techniques but I’d put my money on mine which I suspect has a whole lot less overhead even as a cross join than having the engine generate a row position.&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;/FONT&gt;&lt;/SPAN&gt; &lt;/P&gt;&lt;P class=MsoNormal style="MARGIN: 0pt"&gt;&lt;SPAN style="FONT-FAMILY: Arial"&gt;&lt;FONT size=3&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt; &lt;/P&gt;</description><pubDate>Tue, 07 Feb 2006 16:30:00 GMT</pubDate><dc:creator>USKiwi</dc:creator></item><item><title>RE: Deduping Data in SQL Server 2005</title><link>http://www.sqlservercentral.com/Forums/Topic254584-278-1.aspx</link><description>&lt;P&gt;great examples! If only duplicates need to be removed the ROW_NUMBER() may not be needed. &lt;/P&gt;&lt;P&gt;WITH cteEmployeeOrderedByMyRank AS(SELECT RANK() OVER (PARTITION BY EMPID,FNAME,LNAME ORDER BY REFDATE ASC) AS PartitionRank, * FROM EMPLOYEE -- WHERE 1 = 1)DELETE FROM cteEmployeeOrderedByMyRankWHERE PartitionRank &amp;gt; 1 ; It surely seems to be much faster than the cursor based apporach.&lt;/P&gt;</description><pubDate>Tue, 07 Feb 2006 13:14:00 GMT</pubDate><dc:creator>sscbm21</dc:creator></item><item><title>RE: Deduping Data in SQL Server 2005</title><link>http://www.sqlservercentral.com/Forums/Topic254584-278-1.aspx</link><description>Is there a performance gain to using this function and technique?  Or is it just one of those "hey cool function --- let's use it"?  </description><pubDate>Tue, 07 Feb 2006 09:24:00 GMT</pubDate><dc:creator>Aries Manlig</dc:creator></item><item><title>RE: Deduping Data in SQL Server 2005</title><link>http://www.sqlservercentral.com/Forums/Topic254584-278-1.aspx</link><description>&lt;P&gt;Hi...&lt;/P&gt;&lt;P&gt; &lt;/P&gt;&lt;P&gt;I usually clean up my "mess" with "something" like this&lt;/P&gt;&lt;P&gt; &lt;/P&gt;&lt;P&gt;delete from myTable where myID NOT in &lt;/P&gt;&lt;P&gt;(select min(MyID) from myTable group by myUniqueField[s])&lt;/P&gt;&lt;P&gt; &lt;/P&gt;&lt;P&gt;But I think the best application for CTE are recursive querries...&lt;/P&gt;</description><pubDate>Tue, 07 Feb 2006 08:11:00 GMT</pubDate><dc:creator>Heiko Hatzfeld</dc:creator></item><item><title>RE: Deduping Data in SQL Server 2005</title><link>http://www.sqlservercentral.com/Forums/Topic254584-278-1.aspx</link><description>&lt;P&gt;nice example &lt;img src='images/emotions/cool.gif' height='20' width='20' border='0' title='Cool' align='absmiddle'&gt;&lt;/P&gt;&lt;P&gt;Here is another :&lt;/P&gt;&lt;FONT color=#0000ff size=2&gt;&lt;P&gt;WITH&lt;/FONT&gt;&lt;FONT size=2&gt; cteEmployeeOrderedByMyRowNumber &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;AS&lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;(&lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;SELECT&lt;/FONT&gt;&lt;FONT size=2&gt; ROW_NUMBER&lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;()&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;OVER&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;(&lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;ORDER&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;BY&lt;/FONT&gt;&lt;FONT size=2&gt; EMPID &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;ASC&lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;,&lt;/FONT&gt;&lt;FONT size=2&gt; REFDATE &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;ASC&lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;)&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;AS&lt;/FONT&gt;&lt;FONT size=2&gt; MyRowNumber&lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;,&lt;/FONT&gt;&lt;FONT size=2&gt; Row_NUMBER&lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;()&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;Over&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;(&lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;Partition&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;By&lt;/FONT&gt;&lt;FONT size=2&gt; EMPID&lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;,&lt;/FONT&gt;&lt;FONT size=2&gt;FNAME&lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;,&lt;/FONT&gt;&lt;FONT size=2&gt;LNAME &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;Order&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;By&lt;/FONT&gt;&lt;FONT size=2&gt; REFDATE &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;ASC&lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;)&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;as&lt;/FONT&gt;&lt;FONT size=2&gt; PartitionRank&lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;,&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;*&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;FROM&lt;/FONT&gt;&lt;FONT size=2&gt; EMPLOYEE &lt;/FONT&gt;&lt;FONT color=#008000 size=2&gt;-- WHERE 1 = 1&lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;)&lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;DELETE&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;FROM&lt;/FONT&gt;&lt;FONT size=2&gt; cteEmployeeOrderedByMyRowNumber &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;where&lt;/FONT&gt;&lt;FONT size=2&gt; PartitionRank &lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;&amp;gt;&lt;/FONT&gt;&lt;FONT size=2&gt; 1 &lt;/FONT&gt;&lt;FONT color=#808080 size=2&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color=#808080 size=2&gt;&lt;/FONT&gt; &lt;/P&gt;&lt;P&gt;&lt;FONT color=#808080&gt;just to get it in the tips of the fingers &lt;img src='images/emotions/w00t.gif' height='20' width='20' border='0' title='w00t' align='absmiddle'&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color=#808080 size=2&gt; &lt;/P&gt;&lt;/FONT&gt;</description><pubDate>Tue, 07 Feb 2006 04:03:00 GMT</pubDate><dc:creator>ALZDBA</dc:creator></item><item><title>RE: Deduping Data in SQL Server 2005</title><link>http://www.sqlservercentral.com/Forums/Topic254584-278-1.aspx</link><description>When you are populating the record set, take out the GO between the next-to-the-last and the last insert statements. Having this penultimate GO in the set of queries will remove the scope of the @NOW variable and cause the last INSERT to fail.</description><pubDate>Tue, 07 Feb 2006 03:37:00 GMT</pubDate><dc:creator>drnetwork</dc:creator></item><item><title>Deduping Data in SQL Server 2005</title><link>http://www.sqlservercentral.com/Forums/Topic254584-278-1.aspx</link><description>Comments posted to this topic are about the content posted at &lt;A HREF="http://www.sqlservercentral.com/columnists/chawkins/dedupingdatainsqlserver2005.asp"&gt;http://www.sqlservercentral.com/columnists/chawkins/dedupingdatainsqlserver2005.asp&lt;/A&gt;</description><pubDate>Mon, 30 Jan 2006 17:35:00 GMT</pubDate><dc:creator>drnetwork</dc:creator></item></channel></rss>