peterswe (10/8/2013)
Since I have worked with Oracle I have removed duplicates using the rowid column. SQL Server has something similar but its an undocumented feature (might be changed without notice).delete
from table t1
where exists
(select *
from table t2
where t1.column1 = t2.column1
and t1.column2 = t2.column2
...
and t1.rowid > t2. rowid) -- in Oracle
and t1.%%physloc%% > %%physloc%%) -- in SQL Server
This self join can be useful, you dont need to compare all rows for example and it removed more than one row in case of tripple duplicates.
http://www.sqlskills.com/blogs/paul/sql-server-2008-new-undocumented-physical-row-locator-function/
Not so sure how happy I would be using this - but then as Paul says all the cool stuff is undocumented.
Great question, I really like how you have included the ability to retain the latest version.. I usually use a more simple way to remove duplicate rows..
Using the same table...
Declare @FName varchar(30), -- values to look for
@LName varchar (30),-- values to look for
@cnt int -- count
Declare getallrecords cursor local static For
Select count (1), Fname, Lname
from DuplicateRow (nolock)
group by FName, LName having count(1)>1
Open getallrecords
Fetch next from getallrecords into @cnt,@FName,@LName
--Cursor to check with all other records
While @@fetch_status=0
Begin
Set @cnt= @cnt-1
Set rowcount @cnt
-- Deleting the duplicate records. Observe that all fields are mentioned at the where condition
Delete from DuplicateRow where Fname=@FName and LName=@LName
Set rowcount 0
Fetch next from getallrecords into @cnt,@FName,@LName
End
Close getallrecords
Deallocate getallrecords
Obviously depending on how many rows you need to compare to ensure your only removing duplicates would depend on what you declare and select.
Hope this helps...
Ford Fairlane
Rock and Roll Detective