Performance Comparison of Aggregate Function VS Alternate methods in SQL server 2008 R2

  • I will show the performance of the analytical function vs alternate methods.From my observation and testing

    the ranking functions performs better than alternate methods but aggregate function performs very badly against alternate methods. Thus before using new features just test them against your best methods.

    I also hope that Microsoft people will look into ways to improve the performance of aggregate analytical functions in future release.

    /*

    Microsoft introduced the analytical functions and people got excited to use these features.One things is that using these methods querying becomes simpler,smaller and more elegant.

    What about perforance? Are these analytical functions performs better than alternate methods used to before analytical functions were intrduced?

    I have tested few scenarios and come to conclsuion that the ranking function performs quite good in most of cases as compared to alternate methods.

    However, aggregate functions performs very badly as compared to alternate methods. However, max and min function can be written using ranking function and thus could be made to performa better than alternate methods.

    In aggregate functions table spool operator is used which is causing large number of IO's. I hope Microsoft people will fix this perf issue in future version of SQl server

    Thus before using these new analytical functions make sure that you benchmark the performance of new methods against your alternate methods.

    In next few post I will show the analytical functions and their performance.

    I will use adventureworks database.

    */

    --Let us check how row number is performing..

    --Scanario : Find the top 2 orders (by total due) details for each customer. Sounds easy.But Implementation is quite difficult without analytical functions.Let us see..

    set nocount on

    go

    Use AdventureWorks

    go

    set statistics io,time on

    go

    --Alternate Method using CTE. Looks very complicated even with CTE

    with

    Top1Order

    as

    (

    select CAST(substring(tdsid,14,12) as float) as MaxSalesOrderId

    from

    (

    select CustomerID,MAX(replicate('0',12-len(totaldue)) + totaldue + ':' + replicate('0',12-len(salesorderid)) + salesorderid) tdsid

    from

    (

    select CustomerID,CAST(Totaldue as varchar) as totaldue,CAST(SalesorderId as varchar) salesorderid

    from Sales.SalesOrderHeader

    ) dta

    group by CustomerID

    ) dtb

    )

    ,Top2Order

    as

    (

    select MaxSalesOrderId,1 as OrderRank from Top1Order

    union all

    select CAST(substring(tdsid,14,12) as float) as MaxSalesOrderId,2 as OrderRank

    from

    (

    select CustomerID,MAX(replicate('0',12-len(totaldue)) + totaldue + ':' + replicate('0',12-len(salesorderid)) + salesorderid) tdsid

    from

    (

    select CustomerID,CAST(Totaldue as varchar) as totaldue,CAST(SalesorderId as varchar) salesorderid

    from Sales.SalesOrderHeader soh left merge join Top1Order t1o on soh.SalesOrderID = t1o.MaxSalesOrderId

    where t1o.MaxSalesOrderId is null

    ) dta

    group by CustomerID

    ) dtb

    )

    select * from Top2order t2o inner join Sales.SalesOrderHeader soh

    on t2o.MaxsalesOrderId = soh.SalesOrderID

    /*

    Logical IO 3100

    Elapsed Time 1512 ms

    */

    --Without CTE and using the temp table

    drop table #Top1Order

    drop table #top2Order

    go

    select CAST(substring(tdsid,14,12) as float) as MaxSalesOrderId,1 as OrderRank

    into #Top1Order

    from

    (

    select CustomerID,MAX(replicate('0',12-len(totaldue)) + totaldue + ':' + replicate('0',12-len(salesorderid)) + salesorderid) tdsid

    from

    (

    select CustomerID,CAST(Totaldue as varchar) as totaldue,CAST(SalesorderId as varchar) salesorderid

    from Sales.SalesOrderHeader

    ) dta

    group by CustomerID

    ) dtb

    select CAST(substring(tdsid,14,12) as float) as MaxSalesOrderId,2 as OrderRank

    into #Top2Order

    from

    (

    select CustomerID,MAX(replicate('0',12-len(totaldue)) + totaldue + ':' + replicate('0',12-len(salesorderid)) + salesorderid) tdsid

    from

    (

    select CustomerID,CAST(Totaldue as varchar) as totaldue,CAST(SalesorderId as varchar) salesorderid

    from Sales.SalesOrderHeader soh left merge join #Top1Order t1o on soh.SalesOrderID = t1o.MaxSalesOrderId

    where t1o.MaxSalesOrderId is null

    ) dta

    group by CustomerID

    ) dtb

    select * from (select * from #top1order union all select * from #Top2Order) t2o inner join Sales.SalesOrderHeader soh

    on t2o.MaxsalesOrderId = soh.SalesOrderID

    /*

    Logical IO 2310

    Elapsed Time 984 ms

    */

    --You can have a cursor method as well

    --You can have the alternate method without CTE and without temp table.The sql query will not be understdanble and too complex

    --Analytical Methods

    with OrderRank

    as

    (

    select *,ROW_NUMBER() over(partition by customerid order by totaldue desc) as OrderRanking from Sales.SalesOrderHeader

    )

    select CustomerID custid,OrderRanking,* from OrderRank where OrderRanking <= 2

    order by CustomerID

    --Just a simple select statement now look at the performance.

    /*

    Logical IO 706 --it does just a table scan..

    Elapsed Time 968 ms

    */

    --Thus anlytical function is very simple,elegant and need less resources.Also, it put less pressure on your brain as it is easy to code

    --no need to think how to achieve this.

    ---Next I will use a aggregate function performance..

    --Scenario: Find all the orders for the customers who has at least one order of value (total due) more than 100 K.

    --alternate method

    with CustomersWithOneOrderMoreThan100K

    as

    (

    select customerid,MAX(TotalDue) MaxOrderValue

    from Sales.SalesOrderHeader

    group by CustomerID

    having MAX(TotalDue) > 100000

    )

    select * from Sales.SalesOrderHeader soh inner join CustomersWithOneOrderMoreThan100K cwoo

    on cwoo.CustomerID = soh.CustomerID

    /*

    Logical IO 1412 -- Just two clustered index scans

    Elapsed Time 270 ms

    */

    --analytical method

    with mycte

    as

    (

    select *,MAX(TotalDue) over(partition by CustomerID) as MaxOrdervalue

    from Sales.SalesOrderHeader

    )

    select * from mycte where MaxOrdervalue > 100000

    --looks elegant and simpel.But what about performance?

    /*

    Logical IO 139813 -- Uses the table spool which is causing too much IO's

    Elapsed Time 511 ms

    */

    -- I could use row_number function here as well to achieve the max kind of effect but it will be similar to the alternate method but better in terms of performance

    with mycte

    as

    (

    select *,row_number() over(partition by CustomerID order by TotalDue desc) as MaxOrdervalue

    from Sales.SalesOrderHeader

    )

    select * from mycte where MaxOrdervalue = 1 and TotalDue > 100000

    /*

    Logical IO 706 -- Uses one table scan

    Elapsed Time 244 ms

    */

    --Thus rather than writing the max and min analytical function use the row_number function to achieve the same result but with dratsic improvement in performance.

    --However, sum can not be written by using ranking analytcial function thus it will have serious performance issue.

    --Thus before using the Anytical Aggregate function like sum and avg please benchmark performance against your existing method or use alternate methods

    /*

    In conclusion: New features are good. But please test thier performance against alternate methods. I hope the microsoft

    people will not just implement the new features but will provide the best plan an dperformance than

    existing alternate methods.

    */

    GulliMeel

    Finding top n Worst Performing queries[/url]
    Improve the performance of Merge Join(special case)
    How to Post Performance Problem -Gail Shaw[/url]

Viewing 0 posts

You must be logged in to reply to this topic. Login to reply