|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Wednesday, May 23, 2012 3:39 AM
Points: 19,
Visits: 117
|
|
|
|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Wednesday, June 13, 2012 3:26 PM
Points: 566,
Visits: 341
|
|
Nice article, thanks for sharing. The same thing can be achieved with just the row_number function as well.
select AlphaKey from ( select row_number() over(partition by AlphaKey order by AlphaKey) rownum, AlphaKey from @AlphaList ) al where rownum = 1 order by 1;
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Friday, May 17, 2013 2:53 AM
Points: 1,528,
Visits: 359
|
|
yep, i too use this query to remove duplicates...
was wondering, which query is faster, this one or the one mentioned in the article...
scottm30 (7/26/2010)
Nice article, thanks for sharing. The same thing can be achieved with just the row_number function as well. select AlphaKey from ( select row_number() over(partition by AlphaKey order by AlphaKey) rownum, AlphaKey from @AlphaList ) al where rownum = 1 order by 1;
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, March 18, 2013 3:17 AM
Points: 5,
Visits: 92
|
|
| with the A,B,C,D,E example, the select distinct syntax would be better imho.
|
|
|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Thursday, June 07, 2012 3:39 AM
Points: 57,
Visits: 62
|
|
How about good ol' grouping to remove duplicates and supply any of the minimum, maximum, sum, count, average, etc. etc?
select ItemNumber, min(UnitCost) as MinCost from MissingItems group by ItemNumber;
ItemNumber MinCost 777 10.10 888 13.13 999 16.16
|
|
|
|
|
Right there with Babe
      
Group: General Forum Members
Last Login: Tuesday, May 14, 2013 1:05 AM
Points: 752,
Visits: 713
|
|
I was interested to see what the difference in performance between these queries as I would always have used Row_Number with a partition before.
On the Northwind database (SQL 2005) I ran the following 2 queries
select a.ProductID ,a.UnitPrice from (select ProductID ,UnitPrice ,Rank() over (order by ProductID) as Ranking ,Row_Number() over (order by ProductID,UnitPrice) as RowNumber from [Order Details] ) as a where a.Ranking=a.RowNumber order by a.ProductID
select a.ProductID ,a.UnitPrice from (select ProductID ,UnitPrice ,Row_Number() over (partition by ProductID order by UnitPrice) as RowNumber from [Order Details] ) as a where a.RowNumber=1 order by a.ProductID
In the Actual Execution Plan, the first query showed up as 64% and the second as 36%.
Far from exhaustive, but I'll probably stick to Row_Number with Partition by.
In (partial) answer to Phil Wood, if all I wanted was the lowest price for each product then I would use grouping, I'd probably use this method if I wanted to find the customer or order date as well, changing the code to
select a.ProductID ,a.UnitPrice ,b.CustomerID ,b.OrderDate from (select ProductID ,UnitPrice ,OrderID ,Row_Number() over (partition by ProductID order by UnitPrice) as RowNumber from [Order Details] ) as a inner join Orders as b on a.OrderID=b.OrderID where a.RowNumber=1 order by a.ProductID
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Tuesday, July 27, 2010 4:58 AM
Points: 2,
Visits: 7
|
|
I agree. The simplest way to deduplicate is using grouping and aggregate functions as needed. regards Áron
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Wednesday, May 25, 2011 2:41 AM
Points: 145,
Visits: 99
|
|
I had to remove unitprice from the orderby to get deduplicated results.
declare @TempTable table (itemnumber int, unitprice decimal(18, 2));
insert into @TempTable values (777, 10.10), (777, 11.11), (777, 12.12), (888, 13.13), (888, 14.14), (888, 15.15), (999, 16.16), (999, 17.17), (999, 18.18);
with GetMissingItems(Rank, RowNumber, ItemNumber, UnitCost) as ( select RANK() over (order by ItemNumber) as Rank, ROW_NUMBER() over (order by ItemNumber) as RowNumber, itemnumber, unitprice from @TempTable ) select * from GetMissingItems where Rank = RowNumber;
Bhavesh .NET and SQL Server Blog
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Wednesday, July 28, 2010 12:19 PM
Points: 1,
Visits: 4
|
|
| In the example you provided you used one column. Say I have 4 columns and I want to deduplicate based on those. In other words I want to rank against those columns -- how would the SQL statements change?
|
|
|
|
|
Right there with Babe
      
Group: General Forum Members
Last Login: Tuesday, May 14, 2013 1:05 AM
Points: 752,
Visits: 713
|
|
messineo,
If you have multiple columns that you want to deduplicate on you will need to include all of them in whichever method you use.
Here is some sample code using each method that should work (all based on the Northwind database). Note that in all these cases the data will be de-duplicated but which row is selected for each set of duplicates can not be predicted (so if you want the first order for each set you will need different code).
select a.ProductID ,a.UnitPrice ,a.Quantity ,a.Discount from (select ProductID ,UnitPrice ,Quantity ,Discount ,Rank() over (order by ProductID,UnitPrice,Quantity,Discount) as Ranking ,Row_Number() over (order by ProductID,UnitPrice,Quantity,Discount) as RowNumber from [Order Details] ) as a where a.Ranking=a.RowNumber
select a.ProductID ,a.UnitPrice ,a.Quantity ,a.Discount from (select ProductID ,UnitPrice ,Quantity ,Discount ,Row_Number() over (partition by ProductID,UnitPrice,Quantity,Discount order by ProductID) as RowNumber from [Order Details] ) as a where a.RowNumber=1
select ProductID ,UnitPrice ,Quantity ,Discount from [Order Details] group by ProductID ,UnitPrice ,Quantity ,Discount
|
|
|
|