﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Article Discussions / Article Discussions by Author / Discuss content posted by Bernabe Diaz  / Performance Comparison of Aggregate Function VS Alternate methods in SQL server 2008 R2 / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Sat, 18 May 2013 08:00:58 GMT</lastBuildDate><ttl>20</ttl><item><title>Performance Comparison of Aggregate Function VS Alternate methods in SQL server 2008 R2</title><link>http://www.sqlservercentral.com/Forums/Topic1301337-1509-1.aspx</link><description>I will show the performance of the analytical function vs alternate methods.From my observation and testingthe ranking functions performs better than alternate methods but aggregate function performs very badly against alternate methods. Thus before using new features just test them against your best methods.I also hope that Microsoft people will look into ways to improve the performance of aggregate analytical functions in future release.[code="sql"]/*Microsoft introduced the analytical functions and people got excited to use these features.One things is that using these methods querying becomes simpler,smaller and more elegant.What about perforance? Are these analytical functions performs better than alternate methods used to before analytical functions were intrduced?I have tested few scenarios and come to conclsuion that the ranking function performs quite good in most of cases as compared to alternate methods.However, aggregate functions performs very badly as compared to alternate methods. However, max and min function can be written using ranking function and thus could be made to performa better than alternate methods.In aggregate functions table spool operator is used which is causing large number of IO's. I hope Microsoft people will fix this perf issue in future version of SQl serverThus before using these new analytical functions make sure that you benchmark the performance of new methods against your alternate methods. In next few post I will show the analytical functions and their performance.I will use adventureworks database.*/--Let us check how row number is performing..--Scanario : Find the top 2 orders (by total due) details for each customer. Sounds easy.But Implementation is quite difficult without analytical functions.Let us see..set nocount ongoUse AdventureWorksgoset statistics io,time ongo--Alternate Method using CTE. Looks very complicated even with CTEwithTop1Orderas(select CAST(substring(tdsid,14,12) as float) as MaxSalesOrderIdfrom (select CustomerID,MAX(replicate('0',12-len(totaldue)) + totaldue + ':' + replicate('0',12-len(salesorderid)) + salesorderid) tdsidfrom(select CustomerID,CAST(Totaldue as varchar) as totaldue,CAST(SalesorderId as varchar) salesorderidfrom Sales.SalesOrderHeader) dtagroup by CustomerID) dtb),Top2Orderas(select MaxSalesOrderId,1 as OrderRank from Top1Orderunion allselect CAST(substring(tdsid,14,12) as float) as MaxSalesOrderId,2 as OrderRankfrom (select CustomerID,MAX(replicate('0',12-len(totaldue)) + totaldue + ':' + replicate('0',12-len(salesorderid)) + salesorderid) tdsidfrom(select CustomerID,CAST(Totaldue as varchar) as totaldue,CAST(SalesorderId as varchar) salesorderidfrom Sales.SalesOrderHeader soh left merge join Top1Order t1o on soh.SalesOrderID = t1o.MaxSalesOrderIdwhere t1o.MaxSalesOrderId is null) dtagroup by CustomerID) dtb)select * from Top2order t2o inner join Sales.SalesOrderHeader soh on t2o.MaxsalesOrderId = soh.SalesOrderID/*Logical IO 3100Elapsed Time 1512 ms*/--Without CTE and using the temp tabledrop table #Top1Orderdrop table #top2Ordergoselect CAST(substring(tdsid,14,12) as float) as MaxSalesOrderId,1 as OrderRankinto #Top1Orderfrom (select CustomerID,MAX(replicate('0',12-len(totaldue)) + totaldue + ':' + replicate('0',12-len(salesorderid)) + salesorderid) tdsidfrom(select CustomerID,CAST(Totaldue as varchar) as totaldue,CAST(SalesorderId as varchar) salesorderidfrom Sales.SalesOrderHeader) dtagroup by CustomerID) dtbselect CAST(substring(tdsid,14,12) as float) as MaxSalesOrderId,2 as OrderRankinto #Top2Orderfrom (select CustomerID,MAX(replicate('0',12-len(totaldue)) + totaldue + ':' + replicate('0',12-len(salesorderid)) + salesorderid) tdsidfrom(select CustomerID,CAST(Totaldue as varchar) as totaldue,CAST(SalesorderId as varchar) salesorderidfrom Sales.SalesOrderHeader soh left merge join #Top1Order t1o on soh.SalesOrderID = t1o.MaxSalesOrderIdwhere t1o.MaxSalesOrderId is null) dtagroup by CustomerID) dtbselect * from (select * from #top1order union all select * from #Top2Order) t2o inner join Sales.SalesOrderHeader soh on t2o.MaxsalesOrderId = soh.SalesOrderID/*Logical IO 2310Elapsed Time 984 ms*/--You can have a cursor method as well--You can have the alternate method without CTE and without temp table.The sql query will not be understdanble and too complex--Analytical Methods with OrderRankas(select *,ROW_NUMBER() over(partition by customerid order by totaldue desc) as OrderRanking from Sales.SalesOrderHeader )select CustomerID custid,OrderRanking,* from OrderRank where OrderRanking &amp;lt;= 2order by CustomerID--Just a simple select statement now look at the performance./*Logical IO 706 --it does just a table scan..Elapsed Time 968 ms*/--Thus anlytical function is very simple,elegant and need less resources.Also, it put less pressure on your brain as it is easy to code--no need to think how to achieve this.---Next I will use a aggregate function performance..--Scenario: Find all the orders for the customers who has at least one order of value (total due) more than 100 K.--alternate methodwith CustomersWithOneOrderMoreThan100Kas(select customerid,MAX(TotalDue) MaxOrderValuefrom Sales.SalesOrderHeadergroup by CustomerIDhaving MAX(TotalDue) &amp;gt; 100000)select * from Sales.SalesOrderHeader soh inner join CustomersWithOneOrderMoreThan100K cwooon cwoo.CustomerID = soh.CustomerID/*Logical IO 1412 -- Just two clustered index scansElapsed Time 270 ms*/--analytical methodwith mycteas(select *,MAX(TotalDue) over(partition by CustomerID) as MaxOrdervaluefrom Sales.SalesOrderHeader)select * from mycte where MaxOrdervalue &amp;gt; 100000--looks elegant and simpel.But what about performance?/*Logical IO 139813 -- Uses the table spool which is causing too much IO'sElapsed Time 511 ms*/-- I could use row_number function here as well to achieve the max kind of effect but it will be similar to the alternate method but better in terms of performancewith mycteas(select *,row_number() over(partition by CustomerID order by TotalDue desc) as MaxOrdervaluefrom Sales.SalesOrderHeader)select * from mycte where MaxOrdervalue = 1 and TotalDue &amp;gt; 100000/*Logical IO 706 -- Uses one table scanElapsed Time 244 ms*/--Thus rather than writing the max and min analytical function use the row_number function to achieve the same result but with dratsic improvement in performance.--However, sum can not be written by using ranking analytcial function thus it will have serious performance issue.--Thus before using the Anytical Aggregate function like sum and avg please benchmark performance against your existing method or use alternate methods/*In conclusion: New features are good. But please test thier performance against alternate methods. I hope the microsoftpeople will not just implement the new features but will provide the best  plan an dperformance than existing alternate methods.*/[/code]</description><pubDate>Wed, 16 May 2012 13:56:35 GMT</pubDate><dc:creator>Gullimeel</dc:creator></item></channel></rss>