Recent PostsRecent Posts Popular TopicsPopular Topics
 Home Search Members Calendar Who's On

 Using Ranking Functions to Deduplicate Data Rate Topic Display Mode Topic Options
Author
 Message
 Posted Monday, July 26, 2010 8:57 PM
 Grasshopper Group: General Forum Members Last Login: Wednesday, May 23, 2012 3:39 AM Points: 19, Visits: 117
 Comments posted to this topic are about the item Using Ranking Functions to Deduplicate Data
Post #959167
 Posted Monday, July 26, 2010 11:19 PM
 SSChasing Mays Group: General Forum Members Last Login: Tuesday, December 03, 2013 4:40 PM Points: 632, Visits: 375
 Nice article, thanks for sharing. The same thing can be achieved with just the row_number function as well.select AlphaKeyfrom ( select row_number() over(partition by AlphaKey order by AlphaKey) rownum, AlphaKey from @AlphaList ) alwhere rownum = 1 order by 1;
Post #959186
 Posted Tuesday, July 27, 2010 12:50 AM
 SSCommitted Group: General Forum Members Last Login: Wednesday, November 06, 2013 2:53 AM Points: 1,711, Visits: 367
 yep, i too use this query to remove duplicates...was wondering, which query is faster, this one or the one mentioned in the article...scottm30 (7/26/2010)Nice article, thanks for sharing. The same thing can be achieved with just the row_number function as well.select AlphaKeyfrom ( select row_number() over(partition by AlphaKey order by AlphaKey) rownum, AlphaKey from @AlphaList ) alwhere rownum = 1 order by 1;
Post #959217
 Posted Tuesday, July 27, 2010 1:35 AM
 Forum Newbie Group: General Forum Members Last Login: Monday, October 21, 2013 1:38 AM Points: 5, Visits: 99
 with the A,B,C,D,E example, the select distinct syntax would be better imho.
Post #959229
 Posted Tuesday, July 27, 2010 1:55 AM
 Valued Member Group: General Forum Members Last Login: Thursday, December 05, 2013 7:19 AM Points: 69, Visits: 65
 How about good ol' grouping to remove duplicates and supply any of the minimum, maximum, sum, count, average, etc. etc?select ItemNumber, min(UnitCost) as MinCost from MissingItemsgroup by ItemNumber;ItemNumber MinCost777 10.10888 13.13999 16.16
Post #959234
 Posted Tuesday, July 27, 2010 2:31 AM
 SSC Eights! Group: General Forum Members Last Login: Friday, December 06, 2013 8:57 AM Points: 878, Visits: 822
 I was interested to see what the difference in performance between these queries as I would always have used Row_Number with a partition before.On the Northwind database (SQL 2005) I ran the following 2 queriesselect a.ProductID ,a.UnitPricefrom (select ProductID ,UnitPrice ,Rank() over (order by ProductID) as Ranking ,Row_Number() over (order by ProductID,UnitPrice) as RowNumber from [Order Details] ) as awhere a.Ranking=a.RowNumberorder by a.ProductIDselect a.ProductID ,a.UnitPricefrom (select ProductID ,UnitPrice ,Row_Number() over (partition by ProductID order by UnitPrice) as RowNumber from [Order Details] ) as awhere a.RowNumber=1order by a.ProductIDIn the Actual Execution Plan, the first query showed up as 64% and the second as 36%.Far from exhaustive, but I'll probably stick to Row_Number with Partition by.In (partial) answer to Phil Wood, if all I wanted was the lowest price for each product then I would use grouping, I'd probably use this method if I wanted to find the customer or order date as well, changing the code toselect a.ProductID ,a.UnitPrice ,b.CustomerID ,b.OrderDatefrom (select ProductID ,UnitPrice ,OrderID ,Row_Number() over (partition by ProductID order by UnitPrice) as RowNumber from [Order Details] ) as a inner join Orders as b on a.OrderID=b.OrderIDwhere a.RowNumber=1order by a.ProductID
Post #959248
 Posted Tuesday, July 27, 2010 2:54 AM
 Forum Newbie Group: General Forum Members Last Login: Tuesday, July 27, 2010 4:58 AM Points: 2, Visits: 7
 I agree. The simplest way to deduplicate is using grouping and aggregate functions as needed.regardsÁron
Post #959255
 Posted Tuesday, July 27, 2010 4:05 AM
 SSC-Enthusiastic Group: General Forum Members Last Login: Tuesday, October 15, 2013 5:16 AM Points: 147, Visits: 104
 I had to remove unitprice from the orderby to get deduplicated results.declare @TempTable table (itemnumber int, unitprice decimal(18, 2));insert into @TempTable values(777, 10.10), (777, 11.11),(777, 12.12),(888, 13.13),(888, 14.14),(888, 15.15),(999, 16.16),(999, 17.17),(999, 18.18);with GetMissingItems(Rank, RowNumber, ItemNumber, UnitCost) as( select RANK() over (order by ItemNumber) as Rank, ROW_NUMBER() over (order by ItemNumber) as RowNumber, itemnumber, unitprice from @TempTable ) select * from GetMissingItemswhere Rank = RowNumber; Bhavesh.NET and SQL Server Blog
Post #959283
 Posted Tuesday, July 27, 2010 4:32 AM
 Forum Newbie Group: General Forum Members Last Login: Wednesday, July 28, 2010 12:19 PM Points: 1, Visits: 4
 In the example you provided you used one column. Say I have 4 columns and I want to deduplicate based on those. In other words I want to rank against those columns -- how would the SQL statements change?
Post #959294
 Posted Tuesday, July 27, 2010 4:46 AM
 SSC Eights! Group: General Forum Members Last Login: Friday, December 06, 2013 8:57 AM Points: 878, Visits: 822
 messineo,If you have multiple columns that you want to deduplicate on you will need to include all of them in whichever method you use.Here is some sample code using each method that should work (all based on the Northwind database). Note that in all these cases the data will be de-duplicated but which row is selected for each set of duplicates can not be predicted (so if you want the first order for each set you will need different code).select a.ProductID ,a.UnitPrice ,a.Quantity ,a.Discountfrom (select ProductID ,UnitPrice ,Quantity ,Discount ,Rank() over (order by ProductID,UnitPrice,Quantity,Discount) as Ranking ,Row_Number() over (order by ProductID,UnitPrice,Quantity,Discount) as RowNumber from [Order Details] ) as awhere a.Ranking=a.RowNumberselect a.ProductID ,a.UnitPrice ,a.Quantity ,a.Discountfrom (select ProductID ,UnitPrice ,Quantity ,Discount ,Row_Number() over (partition by ProductID,UnitPrice,Quantity,Discount order by ProductID) as RowNumber from [Order Details] ) as awhere a.RowNumber=1select ProductID ,UnitPrice ,Quantity ,Discountfrom [Order Details]group by ProductID ,UnitPrice ,Quantity ,Discount
Post #959300

 Permissions