• peterswe (10/8/2013)


    Since I have worked with Oracle I have removed duplicates using the rowid column. SQL Server has something similar but its an undocumented feature (might be changed without notice).

    delete

    from table t1

    where exists

    (select *

    from table t2

    where t1.column1 = t2.column1

    and t1.column2 = t2.column2

    ...

    and t1.rowid > t2. rowid) -- in Oracle

    and t1.%%physloc%% > %%physloc%%) -- in SQL Server

    This self join can be useful, you dont need to compare all rows for example and it removed more than one row in case of tripple duplicates.

    http://www.sqlskills.com/blogs/paul/sql-server-2008-new-undocumented-physical-row-locator-function/

    Not so sure how happy I would be using this - but then as Paul says all the cool stuff is undocumented.

    Great question, I really like how you have included the ability to retain the latest version.. I usually use a more simple way to remove duplicate rows..

    Using the same table...

    Declare @FName varchar(30), -- values to look for

    @LName varchar (30),-- values to look for

    @cnt int -- count

    Declare getallrecords cursor local static For

    Select count (1), Fname, Lname

    from DuplicateRow (nolock)

    group by FName, LName having count(1)>1

    Open getallrecords

    Fetch next from getallrecords into @cnt,@FName,@LName

    --Cursor to check with all other records

    While @@fetch_status=0

    Begin

    Set @cnt= @cnt-1

    Set rowcount @cnt

    -- Deleting the duplicate records. Observe that all fields are mentioned at the where condition

    Delete from DuplicateRow where Fname=@FName and LName=@LName

    Set rowcount 0

    Fetch next from getallrecords into @cnt,@FName,@LName

    End

    Close getallrecords

    Deallocate getallrecords

    Obviously depending on how many rows you need to compare to ensure your only removing duplicates would depend on what you declare and select.

    Hope this helps...

    Ford Fairlane
    Rock and Roll Detective