RE: Removing Duplicates – SQLServerCentral

SSCertifiable

Points: 7664

October 8, 2013 at 1:56 am

peterswe (10/8/2013)
Since I have worked with Oracle I have removed duplicates using the rowid column. SQL Server has something similar but its an undocumented feature (might be changed without notice).
delete
from table t1
where exists
(select *
from table t2
where t1.column1 = t2.column1
and t1.column2 = t2.column2
...
and t1.rowid > t2. rowid) -- in Oracle
and t1.%%physloc%% > %%physloc%%) -- in SQL Server
This self join can be useful, you dont need to compare all rows for example and it removed more than one row in case of tripple duplicates.

http://www.sqlskills.com/blogs/paul/sql-server-2008-new-undocumented-physical-row-locator-function/

Not so sure how happy I would be using this - but then as Paul says all the cool stuff is undocumented.

Great question, I really like how you have included the ability to retain the latest version.. I usually use a more simple way to remove duplicate rows..

Using the same table...

Declare @FName varchar(30), -- values to look for

@LName varchar (30),-- values to look for

@cnt int -- count

Declare getallrecords cursor local static For

Select count (1), Fname, Lname

from DuplicateRow (nolock)

group by FName, LName having count(1)>1

Open getallrecords

Fetch next from getallrecords into @cnt,@FName,@LName

--Cursor to check with all other records

While @@fetch_status=0

Begin

Set @cnt= @cnt-1

Set rowcount @cnt

-- Deleting the duplicate records. Observe that all fields are mentioned at the where condition

Delete from DuplicateRow where Fname=@FName and LName=@LName

Set rowcount 0

Fetch next from getallrecords into @cnt,@FName,@LName

End

Close getallrecords

Deallocate getallrecords

Obviously depending on how many rows you need to compare to ensure your only removing duplicates would depend on what you declare and select.

Hope this helps...

Ford Fairlane
Rock and Roll Detective