Alternative to Over() sum and max applications?

  • So I have a script where we use Sum(Dollar amounts) over (customer account)

    as well as many other 'Over()' categories, e.g. sales person, order type, etc.
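
    Roughly this kind of expression, just to illustrate the shape (the table and column names here are placeholders, not our real schema):

    SELECT
        CustomerAccount, SalesPerson, OrderType, DollarAmount,
        SUM(DollarAmount) OVER (PARTITION BY CustomerAccount) AS AccountTotal,
        SUM(DollarAmount) OVER (PARTITION BY SalesPerson)     AS SalesPersonTotal,
        MAX(DollarAmount) OVER (PARTITION BY OrderType)       AS OrderTypeMax
    FROM dbo.Invoices;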

    Is there an alternative to this method? It seems to be causing unnecessary performance hits (not to mention overly complicated query requirements!)

    I'll admit I didn't build this, and it takes 2 common table expressions to reach these calculations; if I could get around the Over() calculations, I could do it in one statement.

    Is there another way to do this?

    Link to my blog http://notyelf.com/

  • It would help to see the actual code, the DDL for the tables involved, some sample data for the tables involved, and finally the expected results based on the sample data.

    For assistance with this, please read the first article I have referenced below in my signature block regarding asking for assistance. Following the instructions in that article will get you better answers plus tested code.

  • Ok Lynn you got it 😀

    I know the article... I was being lazy (ok, I was extremely busy!) and hoping for an easy answer... I will compose something here for you to consume 😀

    Link to my blog http://notyelf.com/

  • There certainly are ways to answer aggregate queries without OVER, but they can be significantly LESS performant. As Lynn said, we need table definitions, sample data, queries, and query plans to be able to help.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Ok I have attached a text file that builds the data (19 rows) and does 3 select queries.

    The first is the main query and would be a small chunk of the normal data I am using. The last 2 show exactly what I am summing, which is used as an Over() function within the main query. Visually, I believe this makes much more sense 🙂

    There are actually about 400k records in the main table (more columns too; this is simplified) that aggregate down to about 6k when we run the aggregate query.

    What we do is use a common table expression to aggregate the main table down to the 6k rows, then another select query that uses the columns from the CTE and adds the Over() functions. My thought is there should be a simpler way to do this: something that will achieve the same thing in the main query...
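
    In rough sketch form, the current approach looks something like this (simplified to just two partitions; #Mytable and the column names follow the attached script, but this is not the actual query):

    ;WITH Agg AS (
        -- aggregate the 400k-row table down to roughly 6k rows
        SELECT Account, Order_Type,
               SUM(YTDsum_Invoices) AS YTDsum_Invoices
        FROM #Mytable
        GROUP BY Account, Order_Type
    )
    SELECT Account, Order_Type, YTDsum_Invoices,
           SUM(YTDsum_Invoices) OVER (PARTITION BY Account)    AS YTD_ByAccount,
           SUM(YTDsum_Invoices) OVER (PARTITION BY Order_Type) AS YTD_ByOrder_Type
    FROM Agg;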

    Link to my blog http://notyelf.com/

  • Not sure I follow exactly, but this query is much more efficient from a read perspective; it doesn't seem to give the same output as what you have, though. I just cannot follow your need here, sorry!

    select
        t1.Account
        ,t1.Order_Type
        ,t1.YTDsum_Invoices
        ,YTD_ByOrder_Type
        ,YTD_ByAccount
    from (
        SELECT
            Account
            ,Order_Type
            ,Sum(YTDsum_Invoices) as YTDsum_Invoices
        FROM #Mytable
        GROUP BY Account, Order_Type) as t1
    inner join (
        -- Sum by account
        SELECT Account, Sum(YTDsum_Invoices) as YTD_ByAccount
        FROM #Mytable
        GROUP BY Account) as t2 on t1.Account = t2.Account
    inner join (
        -- Sums by order type
        SELECT Order_Type, Sum(YTDsum_Invoices) as YTD_ByOrder_Type
        FROM #Mytable
        GROUP BY Order_Type) as t3 on t1.Order_Type = t3.Order_Type

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • That's actually exactly my intended need!

    That solution crossed my mind a few times; the problem is I would have to rewrite the query I have, which is over 200 lines long... plus I would need to make about 15 inner joins like that to accommodate all the Over() functions.

    I am wondering if that would kill the performance? Not that it is great anyways 😀

    Link to my blog http://notyelf.com/

  • shannonjk (11/13/2009)


    That's actually exactly my intended need!

    That solution crossed my mind a few times; the problem is I would have to rewrite the query I have, which is over 200 lines long... plus I would need to make about 15 inner joins like that to accommodate all the Over() functions.

    I am wondering if that would kill the performance? Not that it is great anyways 😀

    Lots of inner joins have the POTENTIAL to kill perf, but they don't always do that. Often it comes from having out-of-date statistics and/or skewed data distributions. A very small variance in estimated/actual rows can lead to DRAMATIC perf differences when those variances have to wind their way out through lots of joins. What would DEFINITELY cause a problem is hitting large tables iteratively. Note that my solution already has to scan the table multiple times; if the table is large, doing that 15 times would definitely be suboptimal, although it might still be the 'best' solution possible.

    And no more whining about having to refactor 200-line long queries you big baby!! 😀

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks for the info!

    Haha nice! Yeah, I know I can do it... it's doing it when I have a lot on my plate already, plus deadlines for getting these queries done... being faced with refactoring a query becomes a little stressful 😉

    Oh well, it will be fun anyways 😉

    Link to my blog http://notyelf.com/

  • Consider it job security! That view (especially these days) makes all kinds of things seem more acceptable!

    Hope the refactor really smokes the original version!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • So I finally managed to get this refactored; other critical projects took priority!

    I managed to squeeze the script down to 1 CTE and its subsequent SELECT statement. The high-level version of this now runs in roughly 60 seconds instead of 11 minutes, and the detail-level version runs in 2 seconds instead of ~8. The original subtree cost was over 50 million, whereas now it is between 4 and 19k depending on the parameters of the stored proc!

    Was definitely worth it!

    Link to my blog http://notyelf.com/

  • shannonjk (12/4/2009)


    So I finally managed to get this refactored; other critical projects took priority!

    I managed to squeeze the script down to 1 CTE and its subsequent SELECT statement. The high-level version of this now runs in roughly 60 seconds instead of 11 minutes, and the detail-level version runs in 2 seconds instead of ~8. The original subtree cost was over 50 million, whereas now it is between 4 and 19k depending on the parameters of the stored proc!

    Was definitely worth it!

    Notable improvement there!

    50 MILLION cost? That smokes the largest I have ever seen, but unfortunately for you in this case biggest is NOT best!! 😀

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Yes, my jaw hit the desk when I saw the 50 million cost originally!

    After researching the chaotic sea that was the execution plan, I concluded that the distinct counts done on the outer-layer select statement, as well as the window functions done on that same select, were bringing RBAR to a whole new level. There were TWO CTEs, each with functions and sums (and distinct subquery counts), and the outer select statement was performing window functions on the CTE window functions! Basically each row in the outermost select statement was performing a calculation against the previous 2 CTEs, which in turn were performing calculations.

    Basically each row was calculated exponentially, so returning a result set of 1,000 rows was more like performing calculations against millions of rows.
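
    In very loose sketch form, the shape was something like this (illustration only, not the real code; the column names come from the earlier sample script, and the real query had many more partitions and counts):

    ;WITH cte1 AS (
        SELECT o.Account, o.Order_Type, o.YTDsum_Invoices,
               SUM(o.YTDsum_Invoices) OVER (PARTITION BY o.Account) AS AcctSum,
               (SELECT COUNT(DISTINCT i.Order_Type)
                  FROM #Mytable i
                 WHERE i.Account = o.Account) AS AcctOrderTypeCount -- correlated distinct count per row
        FROM #Mytable o
    ),
    cte2 AS (
        SELECT c.*,
               SUM(c.AcctSum) OVER (PARTITION BY c.Order_Type) AS TypeSum -- window over a window
        FROM cte1 c
    )
    SELECT c.*,
           SUM(c.TypeSum) OVER (PARTITION BY c.Account) AS SumOfTypeSums -- and another on top
    FROM cte2 c;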

    Now that I have that done, I can review the statistics and indexes and probably get it to go a little faster 🙂

    Link to my blog http://notyelf.com/
