﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Article Discussions / Article Discussions by Author / Discuss content posted by Syed Iqbal  / Remove Duplicate Records / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Fri, 24 May 2013 16:31:28 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Remove Duplicate Records</title><link>http://www.sqlservercentral.com/Forums/Topic491052-1258-1.aspx</link><description>[quote][b]AmolNaik (6/13/2011)[/b][hr]Smart technique. But beware SET ROWCOUNT is set to be deprecated in the future releases, instead you could use TOP 1. An alternate method to remove duplicates is to use ROW_NUMBER() technique, basically ordering the resultset over the column phone number and then deleting row where row_number &amp;gt; 1, this will make sure that only 1 resultset is retained and any other repeating instances are deleted.Thanks![/quote]It'll also be much faster than the looping method.  :-)</description><pubDate>Mon, 13 Jun 2011 22:13:10 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Remove Duplicate Records</title><link>http://www.sqlservercentral.com/Forums/Topic491052-1258-1.aspx</link><description>Smart technique. But beware SET ROWCOUNT is set to be deprecated in the future releases, instead you could use TOP 1. An alternate method to remove duplicates is to use ROW_NUMBER() technique, basically ordering the resultset over the column phone number and then deleting row where row_number &amp;gt; 1, this will make sure that only 1 resultset is retained and any other repeating instances are deleted.Thanks!</description><pubDate>Mon, 13 Jun 2011 15:24:39 GMT</pubDate><dc:creator>AmolNaik</dc:creator></item><item><title>RE: Remove Duplicate Records</title><link>http://www.sqlservercentral.com/Forums/Topic491052-1258-1.aspx</link><description>Hey guys,  I have a fairly large table in production with almost a million records, and I have duplicate records in that table with about 18,000 duplicate records.  I've been trying some sql suggestions to delete duplicates, and I found your example here I was able to (I think) incorporate my field information into.  Running this statement against my test database which is only half my production database, i executed the query which is running for some 4+ hours.  When i stopped the query from executing, i found i had some 38,000+ records added to my database table... so I'm wondering if i made some error when configuring your sql into my database.  Here is the code i created using your example.SET ROWCOUNT 1SELECT @@rowcountWHILE @@rowcount &amp;gt; 0DELETE pb FROM Expense as pbINNER JOIN(SELECT accountCode, expenseDescription, invoiceDate, invoiceNumber, ledgerCode, openItemNumber, programCode, transactionAmount, vendorName, warrantNumber, lineNumber, collocationCodeIDFROM Expensewhere expenseDescription like 'PP%'GROUP BY accountCode, expenseDescription, invoiceDate, invoiceNumber, ledgerCode, openItemNumber, programCode, transactionAmount, vendorName, warrantNumber, lineNumber, collocationCodeID HAVING count(*) &amp;gt; 1)AS c ON c.accountCode = pb.accountCodeand c.expenseDescription = pb.expenseDescriptionand c.invoiceDate = pb.invoiceDateand c.invoiceNumber = pb.invoiceNumberand c.ledgerCode = pb.ledgerCodeand c.openItemNumber = pb.openItemNumberand c.programCode = pb.programCodeand c.transactionAmount = pb.transactionAmountand c.vendorName = pb.vendorNameand c.warrantNumber = pb.warrantNumberand c.lineNumber = pb.lineNumberand c.collocationCodeID = pb.collocationCodeIDSET ROWCOUNT 0SELECT * FROM ExpenseDROP TABLE ExpenseSo, my questions are... Will this sql actually delete what seems like temp records?  With this size table, should i expect this sql to execute for hours???  I would appreciate any insight, this is my first time performing a task like this.Thank you,Lisa</description><pubDate>Tue, 14 Apr 2009 14:57:35 GMT</pubDate><dc:creator>lisa.siverly</dc:creator></item><item><title>RE: Remove Duplicate Records</title><link>http://www.sqlservercentral.com/Forums/Topic491052-1258-1.aspx</link><description>Delete should be based on business logics, as phone book example if you create unique index on phone number column then SQL Server will not allow duplicate phone number. This mean your phone book can not insert husband and wife who shares the same home phone number.CheckSum and Binary_CheckSum both are not reliable, they were orignally designed to check message integrity when sending secure messages, both parties can detect if message was altered. If you are on SQL Server 2005 then use HashBytes function.check sum and binary check sum can take the whole row and make things easy as exampleSelect Binary_CheckSum(*) from table_namethe hashbytes is limited to varchar,nvarchar and varbinary as input. You have to convert your columns to one of the supported data types. select HashBytes('SHA1', CONVERT(varchar,column_name)) from table_nameuse it with caution even a minor change like varchar to nvarchar will change the hashbytes value.select HashBytes('SHA1', CONVERT(nvarchar,column_name)) from table_nameHashByteshttp://msdn.microsoft.com/en-us/library/ms174415.aspxBinary_CheckSumhttp://msdn.microsoft.com/en-us/library/ms173784.aspxDuplicate rows are common in systems where you are importing data from other systems. There are few methods to delete duplicate rows in SQL server table. [url=http://www.sqldba.org/articles/34-find-duplicate-records-to-delete-or-update-in-sql-server.aspx]http://www.sqldba.org/articles/34-find-duplicate-records-to-delete-or-update-in-sql-server.aspx[/url]I hope it helps.</description><pubDate>Tue, 27 May 2008 19:08:03 GMT</pubDate><dc:creator>Syed-201559</dc:creator></item><item><title>RE: Remove Duplicate Records</title><link>http://www.sqlservercentral.com/Forums/Topic491052-1258-1.aspx</link><description>no problem, youve illustrated a fair point. Id like to see the link that adds some context to your example</description><pubDate>Tue, 27 May 2008 12:27:13 GMT</pubDate><dc:creator>colin Robinson-345240</dc:creator></item><item><title>RE: Remove Duplicate Records</title><link>http://www.sqlservercentral.com/Forums/Topic491052-1258-1.aspx</link><description>I would avoid using Checksum or binary or otherwise for uniqueness.  It is not guaranteed to be unique as shown by this example I took from somewhere out on the internet (apologies to the author who I did not note).[code]select binary_checksum('where myval in (7004054,7004055)') a, binary_checksum('where myval in (7003888,7003889)') b [/code]this gives the same result - 432179860	432179860  for both valuesToni</description><pubDate>Tue, 27 May 2008 10:50:17 GMT</pubDate><dc:creator>toniupstny</dc:creator></item><item><title>RE: Remove Duplicate Records</title><link>http://www.sqlservercentral.com/Forums/Topic491052-1258-1.aspx</link><description>Surely adding a constraint when designing the table is the best option.Stop duplicates at source?</description><pubDate>Tue, 27 May 2008 05:41:26 GMT</pubDate><dc:creator>Craig Sunderland</dc:creator></item><item><title>RE: Remove Duplicate Records</title><link>http://www.sqlservercentral.com/Forums/Topic491052-1258-1.aspx</link><description>clever but on large tables probably inparactical due to lock escalation and the time taken to do disk based delete and reindex operations.alternatives could be a Unique indexed view  based around select Distinct * from PhoneBook.or a select distinct into Phonebook2 combined with  rename Table operations.You could also stop the duplicates getting in there in the first place with an insert trigger. You can use Binary_checksum(*) function on the inserted table to check aginst the Binary_Checksum(*) of existing Rows</description><pubDate>Mon, 26 May 2008 06:47:58 GMT</pubDate><dc:creator>colin Robinson-345240</dc:creator></item><item><title>Remove Duplicate Records</title><link>http://www.sqlservercentral.com/Forums/Topic491052-1258-1.aspx</link><description>Comments posted to this topic are about the item [B]&lt;A HREF="/scripts/TSQL/62866/"&gt;Remove Duplicate Records&lt;/A&gt;[/B]</description><pubDate>Sat, 26 Apr 2008 07:08:12 GMT</pubDate><dc:creator>Syed-201559</dc:creator></item></channel></rss>