Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Performance Comparison of Aggregate Function VS Alternate methods in SQL server 2008 R2 Expand / Collapse
Author
Message
Posted Wednesday, May 16, 2012 1:56 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Sunday, June 23, 2013 10:39 PM
Points: 231, Visits: 506
I will show the performance of the analytical function vs alternate methods.From my observation and testing
the ranking functions performs better than alternate methods but aggregate function performs very badly against alternate methods. Thus before using new features just test them against your best methods.

I also hope that Microsoft people will look into ways to improve the performance of aggregate analytical functions in future release.

/*
Microsoft introduced the analytical functions and people got excited to use these features.One things is that using these methods querying becomes simpler,smaller and more elegant.
What about perforance? Are these analytical functions performs better than alternate methods used to before analytical functions were intrduced?

I have tested few scenarios and come to conclsuion that the ranking function performs quite good in most of cases as compared to alternate methods.
However, aggregate functions performs very badly as compared to alternate methods. However, max and min function can be written using ranking function and thus could be made to performa better than alternate methods.
In aggregate functions table spool operator is used which is causing large number of IO's. I hope Microsoft people will fix this perf issue in future version of SQl server

Thus before using these new analytical functions make sure that you benchmark the performance of new methods against your alternate methods.

In next few post I will show the analytical functions and their performance.

I will use adventureworks database.

*/
--Let us check how row number is performing..
--Scanario : Find the top 2 orders (by total due) details for each customer. Sounds easy.But Implementation is quite difficult without analytical functions.Let us see..
set nocount on
go
Use AdventureWorks
go
set statistics io,time on
go

--Alternate Method using CTE. Looks very complicated even with CTE
with
Top1Order
as
(
select CAST(substring(tdsid,14,12) as float) as MaxSalesOrderId
from
(
select CustomerID,MAX(replicate('0',12-len(totaldue)) + totaldue + ':' + replicate('0',12-len(salesorderid)) + salesorderid) tdsid
from
(
select CustomerID,CAST(Totaldue as varchar) as totaldue,CAST(SalesorderId as varchar) salesorderid
from Sales.SalesOrderHeader
) dta
group by CustomerID
) dtb
)
,Top2Order
as
(
select MaxSalesOrderId,1 as OrderRank from Top1Order
union all
select CAST(substring(tdsid,14,12) as float) as MaxSalesOrderId,2 as OrderRank
from
(
select CustomerID,MAX(replicate('0',12-len(totaldue)) + totaldue + ':' + replicate('0',12-len(salesorderid)) + salesorderid) tdsid
from
(
select CustomerID,CAST(Totaldue as varchar) as totaldue,CAST(SalesorderId as varchar) salesorderid
from Sales.SalesOrderHeader soh left merge join Top1Order t1o on soh.SalesOrderID = t1o.MaxSalesOrderId
where t1o.MaxSalesOrderId is null
) dta
group by CustomerID
) dtb
)
select * from Top2order t2o inner join Sales.SalesOrderHeader soh
on t2o.MaxsalesOrderId = soh.SalesOrderID
/*
Logical IO 3100
Elapsed Time 1512 ms
*/

--Without CTE and using the temp table
drop table #Top1Order
drop table #top2Order
go
select CAST(substring(tdsid,14,12) as float) as MaxSalesOrderId,1 as OrderRank
into #Top1Order
from
(
select CustomerID,MAX(replicate('0',12-len(totaldue)) + totaldue + ':' + replicate('0',12-len(salesorderid)) + salesorderid) tdsid
from
(
select CustomerID,CAST(Totaldue as varchar) as totaldue,CAST(SalesorderId as varchar) salesorderid
from Sales.SalesOrderHeader
) dta
group by CustomerID
) dtb



select CAST(substring(tdsid,14,12) as float) as MaxSalesOrderId,2 as OrderRank
into #Top2Order
from
(
select CustomerID,MAX(replicate('0',12-len(totaldue)) + totaldue + ':' + replicate('0',12-len(salesorderid)) + salesorderid) tdsid
from
(
select CustomerID,CAST(Totaldue as varchar) as totaldue,CAST(SalesorderId as varchar) salesorderid
from Sales.SalesOrderHeader soh left merge join #Top1Order t1o on soh.SalesOrderID = t1o.MaxSalesOrderId
where t1o.MaxSalesOrderId is null
) dta
group by CustomerID
) dtb

select * from (select * from #top1order union all select * from #Top2Order) t2o inner join Sales.SalesOrderHeader soh
on t2o.MaxsalesOrderId = soh.SalesOrderID
/*
Logical IO 2310
Elapsed Time 984 ms
*/

--You can have a cursor method as well
--You can have the alternate method without CTE and without temp table.The sql query will not be understdanble and too complex


--Analytical Methods
with OrderRank
as
(
select *,ROW_NUMBER() over(partition by customerid order by totaldue desc) as OrderRanking from Sales.SalesOrderHeader
)
select CustomerID custid,OrderRanking,* from OrderRank where OrderRanking <= 2
order by CustomerID
--Just a simple select statement now look at the performance.
/*
Logical IO 706 --it does just a table scan..
Elapsed Time 968 ms
*/

--Thus anlytical function is very simple,elegant and need less resources.Also, it put less pressure on your brain as it is easy to code
--no need to think how to achieve this.

---Next I will use a aggregate function performance..
--Scenario: Find all the orders for the customers who has at least one order of value (total due) more than 100 K.

--alternate method
with CustomersWithOneOrderMoreThan100K
as
(
select customerid,MAX(TotalDue) MaxOrderValue
from Sales.SalesOrderHeader
group by CustomerID
having MAX(TotalDue) > 100000
)
select * from Sales.SalesOrderHeader soh inner join CustomersWithOneOrderMoreThan100K cwoo
on cwoo.CustomerID = soh.CustomerID


/*
Logical IO 1412 -- Just two clustered index scans
Elapsed Time 270 ms
*/

--analytical method
with mycte
as
(
select *,MAX(TotalDue) over(partition by CustomerID) as MaxOrdervalue
from Sales.SalesOrderHeader
)
select * from mycte where MaxOrdervalue > 100000
--looks elegant and simpel.But what about performance?
/*
Logical IO 139813 -- Uses the table spool which is causing too much IO's
Elapsed Time 511 ms
*/

-- I could use row_number function here as well to achieve the max kind of effect but it will be similar to the alternate method but better in terms of performance
with mycte
as
(
select *,row_number() over(partition by CustomerID order by TotalDue desc) as MaxOrdervalue
from Sales.SalesOrderHeader
)
select * from mycte where MaxOrdervalue = 1 and TotalDue > 100000
/*
Logical IO 706 -- Uses one table scan
Elapsed Time 244 ms
*/

--Thus rather than writing the max and min analytical function use the row_number function to achieve the same result but with dratsic improvement in performance.
--However, sum can not be written by using ranking analytcial function thus it will have serious performance issue.
--Thus before using the Anytical Aggregate function like sum and avg please benchmark performance against your existing method or use alternate methods
/*
In conclusion: New features are good. But please test thier performance against alternate methods. I hope the microsoft
people will not just implement the new features but will provide the best plan an dperformance than
existing alternate methods.
*/




GulliMeel

Finding top n Worst Performing queries
Improve the performance of Merge Join(special case)
How to Post Performance Problem -Gail Shaw
Post #1301337
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse