Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

  • Christopher Ford (2/19/2008)


    Kinda like finding chicken's teeth and my chicken just flew away to a different state.

    Eh.... What? :blink:

    ---

    Frank & Kevin both bring up some very valid points though with the state of the industry.

    People just don't know what to do anymore when it comes to databases or developing applications.

    It's become a world of wizards and GUIs. It's getting very tough to find people who can apply a basic index and understand what happened.

    Heh... sorry... guess I have a language all my own...

    What I meant was that SQL Server jobs are pretty scarce in Michigan. The company I'm currently working for has decided to pick up and move from Michigan to North Carolina, and I've decided not to move with them.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • >> My problem, lately, is that people know so little about SQL Server in the Detroit area, they think any ol' GUI programmer or network admin can maintain and program SQL Server... they don't really have a clue about performance code... doesn't make for too many good jobs for folks like me. Kinda like finding chicken's teeth and my chicken just flew away to a different state.

    Well, the amazing thing is that most anyone CAN install and develop against and 'maintain' SQL Server! But it only lasts for so long before bad data structures, bad code, bad maintenance and too much data get the best of any hardware. THAT is the point at which I get asked to come in - and jeez louise is business rocking! Thank you Microsoft for making such an accessible product!! 😀

    Good one on the Administratium too!!! :w00t:

    >>When I do training, I've found that most people do not remember a thing when you cater to being politically correct or explaining things.

    >>They do remember bad examples and the challenge of overcoming them. And they also remember that they don't want to be caught in the same mess when they go live with their code.

    These are my mantras when I enter a new perf tuning engagement. I communicate findings very directly and use the client's data/code in before/after scenarios whenever possible. Much easier to grasp new concepts (and to see unequivocally that their stuff is suboptimal) that way!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Actually, Kevin, what you said is spot on... but they haven't figured out the rest of what you said, yet... when they do, Michigan will be a hot spot for folks like you, me, and some others on this forum because everyone in the State is going to need some pretty good fixes when their hardware can no longer keep up with the sorry code. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Christopher Ford (2/19/2008)


    Kinda like finding chicken's teeth and my chicken just flew away to a different state.

    Eh.... What? :blink:

    ---

    Frank & Kevin both bring up some very valid points though with the state of the industry.

    People just don't know what to do anymore when it comes to databases or developing applications.

    It's become a world of wizards and GUIs. It's getting very tough to find people who can apply a basic index and understand what happened.

    Then again - until they come up with some substantially smarter wizards - I'm not particularly worried. The wizards have some serious limitations which will have to be addressed by folks who DO know how this stuff works. Try to get a wizard to tell you "this will work great for <50K records, but will perform about as well as trying to suck wet cement through a straw after that".

    I do think we're in a bit of a phase here with pressure in that direction. Some of what we do is still very much art over science, since there tend to be so many factors playing into that ephemeral concept we call "performance". And - no amount of hardware will erase all of those disadvantages. From where I'm sitting, the size and complexity of data being stored is growing at a faster rate than the hardware is, meaning - the design and the code become MORE important.

    That's not to say that there won't be a serious shakeup in our world. I'm just not seeing the DBA role going away any time soon. Changed perhaps.

    Just wait for all of those lovely LINQ projects to hit their 2-year maturity period with the messy dynamic SQL constructs and the dynamic SQL context (which takes a 15-20% performance premium off the top). Once they hit that critical mass and the abstract data layer starts taking a dump because "the same algorithm applies to 1 record as it does to 1 million records", then someone will need to step in and fix it all.

    And - don't get me wrong - I'm not bashing LINQ per se. I'm bashing those who want to advance LINQ as a way to bypass knowing anything about the data layer and how it works. Having abstraction is wonderful in a developer's world, but trading away performance should not be one of the accepted by-products (or rather - deleterious side-effects).
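    (For anyone who hasn't watched LINQ to SQL at work: what actually reaches the server is a parameterized batch through sp_executesql, along the lines of the sketch below. The statement text, table, and parameter names here are illustrative only, not captured output.)

    exec sp_executesql
        N'SELECT [t0].[AccountID], [t0].[Date], [t0].[Amount]
          FROM [JBMTest] AS [t0]
          WHERE [t0].[AccountID] = @p0'
        , N'@p0 int'
        , @p0 = 5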

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Just to add one more bit on this:

    Over the w/e, I was curious how the engine would handle these queries as a view or as a TVF; the view was dismal (as expected), but the TVF (in-line, built with the single-query CTE approach) is blazingly fast: 15 secs or so on my machine to do grouped running totals on the million-row table (via CROSS APPLY).

    Here's the code (same base table as before):

    create function slowcteTVF(@acct as int)
    returns table
    as return
    with sequential(AccountID, seq, Date, Amount) as
    (
        select
            AccountID
            , row_number() over(partition by AccountID order by Date) as seq
            , Date
            , Amount
        from JBMTest
        where AccountID = @acct
    )
    , summed(AccountID, seq, Date, Amount, Running) as
    (
        select
            sq.AccountID
            , sq.seq
            , sq.Date
            , sq.Amount
            , sq.Amount as Running
        from sequential as sq
        where sq.seq = 1
            and AccountID = @acct
        UNION ALL
        select
            sq.AccountID
            , sq.seq
            , sq.Date
            , sq.Amount
            , sq.Amount + pri.Running as Running
        from sequential as sq
        join summed as pri
            on sq.AccountID = pri.AccountID
            and sq.seq = pri.seq + 1
        where sq.seq > 1
            and sq.AccountID = @acct
    )
    select * from summed
    go

    -- full report for all accounts:
    select *
    from (select distinct(AccountID) from JBMTest) as accts
    cross apply slowcteTVF(accts.AccountID) as rttl

    -- single account query:
    select * from slowcteTVF(5)

    For some reason, the server is able to apply more parallelism to this version than to the bare CTE query with the table var....
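    (If anyone wants to reproduce the comparison, a quick way is to run the same report with parallelism forced off and diff the actual execution plans -- just a sketch, reusing the objects above:)

    -- same full report, but single-threaded for comparison:
    select *
    from (select distinct(AccountID) from JBMTest) as accts
    cross apply slowcteTVF(accts.AccountID) as rttl
    option (maxdop 1) -- drop this hint to let the optimizer go parallel again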

    -frank


    The End.

  • >>When I do training, I've found that most people do not remember a thing when you cater to being politically correct or explaining things.

    >>They do remember bad examples and the challenge of overcoming them. And they also remember that they don't want to be caught in the same mess when they go live with their code.

    These are my mantras when I enter a new perf tuning engagement. I communicate findings very directly and use the client's data/code in before/after scenarios whenever possible. Much easier to grasp new concepts (and to see unequivocally that their stuff is suboptimal) that way!

    Yes, but then what do you do when they look at you and tell you, "Why aren't you doing a table scan? Didn't you know that's faster than a Clustered Index Scan?" 😉

    Christopher Ford

  • 1) Aren't ALL jobs scarce in Michigan, not just SQL Server ones?? 😉

    2) The Research Triangle area of NC is really good for tech work right now. As is Atlanta, GA, I think.

    3) What's a Clustered Index?? :w00t:

    4) In both of my Code Camp presentations recently on SQL 2008 I made the comment "Everyone PLEASE start using LINQ to SQL like CRAZY!! MAKE ME RICH!!" 😎

    5) It would be quite an interesting finding if the TVF did indeed 'encourage' more parallelism than without. Oh, great function name too, btw! hehehe

    6) Anyone notice that we are up to almost TWO HUNDRED posts on this thread!?!?! Wonder what the record is - at least for a technical discussion and not something like sex or religion . . .

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • As a wrap-up of the CTE approach from my perspective:

    Building on the last set of code I posted (the TVF version), compare a view on the TVF with the TVF directly:

    ----- view on cross apply tvf: still runs the loop, but always does full work (time invariant) as expected
    create view rttl(AccountID, seq, Date, Amount, Running)
    as
    select rttlfn.*
    from (select distinct(AccountID) from JBMTest) as accts
    cross apply slowcteTVF(accts.AccountID) as rttlfn
    go

    select * from rttl
    where AccountID in (1, 2, 5)

    -- Show all line items with running total and the current total for the 5 accounts with largest current total
    -- First with TVF directly (ordered by [Current Total] desc so top(5) really picks the largest totals):
    select rttl.*, accts.[Current Total]
    from (select top(5) AccountID, sum(Amount) as [Current Total]
          from JBMTest
          group by AccountID
          order by [Current Total] desc) as accts
    cross apply slowcteTVF(accts.AccountID) as rttl

    -- Via the view: (again, time invariant -- full work done)
    select rttl.*, accts.[Current Total]
    from (select top(5) AccountID, sum(Amount) as [Current Total]
          from JBMTest
          group by AccountID
          order by [Current Total] desc) as accts
    inner join rttl
        on accts.AccountID = rttl.AccountID

    I think this is a great example of how CTEs and TVFs can work wonderfully well together to greatly improve the performance and optimization of complex queries, and of how they facilitate modularization of SQL code. Not only is the performance absolutely wonderful, but the queries themselves are straightforward and very clear about what they mean/do: easy to write and maintain, with excellent opportunity for re-use.

    -frank


    The End.

  • 5) It would be quite an interesting finding if the TVF did indeed 'encourage' more parallelism than without. Oh, great function name too, btw! hehehe

    Yes, indeed it does. The TVF wrapping the CTE seems to not only fix the ills of the all-in-one CTE (where the row ordering is done in a first query in the CTE WITH block), but also results in parallel operators being inserted into the execution plan. Thus, the TVF wrap eliminates the need for a table var (well, it's sort of still there, but it's disguised as the result set) and so runs at the same speed as the table var approach with parallelism off. Re-enabling parallelism makes the TVF approach much, much faster (well, on my idle dual quad core at least).

    (Note that the table var approach with the CTE did NOT have parallel ops in the actual execution plan; I am moderately curious as to what made it ineligible for parallelism, given the posting here about table vars and parallelism....)
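    (One plausible culprit, for anyone who wants to check it against their plans: any statement that inserts into a table variable is compiled with a serial plan, so the population step itself can never parallelize. A small sketch to compare the two targets -- the column types here are assumed from the thread's test table:)

    declare @t table (AccountID int, Date datetime, Amount money)

    -- the plan for this statement is always serial (table variable target):
    insert into @t (AccountID, Date, Amount)
    select AccountID, Date, Amount
    from JBMTest

    -- the reading side of the same query is eligible for parallelism (temp table target):
    select AccountID, Date, Amount
    into #t
    from JBMTest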

    Regarding the name: without the TVF wrap, the all-in-one CTE did not detect the loop opportunity, and instead did a full triangular join -- I never let it run to completion, but it was over 15 mins when I killed it, so I called that the "slowcte" approach. I had planned to try the fast CTE approach (with table var) in a multi-statement TVF, but when the slow approach wasn't slow anymore, I didn't bother...!

    -frank


    The End.

  • Sir Slicendice (2/19/2008)


    5) It would be quite an interesting finding if the TVF did indeed 'encourage' more parallelism than without. Oh, great function name too, btw! hehehe

    Yes, indeed it does. The TVF wrapping the CTE seems to not only fix the ills of the all-in-one CTE (where the row ordering is done in a first query in the CTE WITH block), but also results in parallel operators being inserted into the execution plan. Thus, the TVF wrap eliminates the need for a table var (well, it's sort of still there, but it's disguised as the result set) and so runs at the same speed as the table var approach with parallelism off. Re-enabling parallelism makes the TVF approach much, much faster (well, on my idle dual quad core at least).

    (Note that the table var approach with the CTE did NOT have parallel ops in the actual execution plan; I am moderately curious as to what made it ineligible for parallelism, given the posting here about table vars and parallelism....)

    Regarding the name: without the TVF wrap, the all-in-one CTE did not detect the loop opportunity, and instead did a full triangular join -- I never let it run to completion, but it was over 15 mins when I killed it, so I called that the "slowcte" approach. I had planned to try the fast CTE approach (with table var) in a multi-statement TVF, but when the slow approach wasn't slow anymore, I didn't bother...!

    -frank

    That's pretty cool, Frank! =)

    I had to bump my test table to 2 million rows for parallelism to kick in, but sure enough, it kicked in.
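    (That fits with how the optimizer gates parallelism: the serial plan's estimated cost has to clear the server-wide 'cost threshold for parallelism' setting -- 5 by default -- before a parallel plan is even considered. You can check the setting like this:)

    exec sp_configure 'show advanced options', 1
    reconfigure

    exec sp_configure 'cost threshold for parallelism' -- run_value is 5 unless someone changed it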

    Christopher Ford

    This, I guess, may be in a way both on and off topic...

    But what if you wanted to age them?

    And what if you wanted to age them based on the max(Date) for each account?

    Oh....WOW THIS IS GREAT!

    :: Edit :: Well, I spoke too soon...I'm not really *aging* them this way. hehe but it's still way cool.

    select rttl.*, DATEDIFF(month, rttl.Date, accts.Date) as Age, accts.[Current Total]
    from (select top(5) AccountID, max(Date) as Date, sum(Amount) as [Current Total]
          from JBMTest
          group by AccountID
          order by AccountID) as accts
    inner join rttl
        on accts.AccountID = rttl.AccountID

    It also lets you do custom aggregations without fussing!! Yahoo!!!

    :w00t: 😎

    Christopher Ford

  • BTW, on a production test.

    I had to run Jeff's running total update because I had 80 million rows to group by 1,928,672 different accounts over a 7 year period.

    1.5 minute run time...

    Now...THAT...is priceless.

    ::Edit -- The clustered index was already created in the order I needed the update to run, so I didn't factor that into the run time. ::
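    (For anyone finding this thread later: the "running total update" referred to here is the variable-accumulation UPDATE pattern Jeff has described earlier in the thread. A rough sketch of the shape only -- the Running column is an assumption, the technique depends on the update walking the clustered index order (AccountID, Date), which is undocumented behavior, and the TABLOCKX/MAXDOP 1 safeguards are part of the recipe:)

    declare @Running money
    declare @PrevAcct int
    set @Running = 0

    -- "quirky update": the assignments run row by row in clustered index order
    update t
    set @Running = Running = case when t.AccountID = @PrevAcct
                                  then @Running + t.Amount
                                  else t.Amount
                             end
        , @PrevAcct = t.AccountID
    from JBMTest t with (tablockx)
    option (maxdop 1)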

    Christopher Ford

  • Christopher Ford (2/19/2008)


    BTW, on a production test.

    I had to run Jeff's running total update because I had 80 million rows to group by 1,928,672 different accounts over a 7 year period.

    1.5 minute run time...

    Now...THAT...is priceless.

    ::Edit -- The clustered index was already created in the order I needed the update to run, so I didn't factor that into the run time. ::

    That's lightning speed indeed!

    Just keep in mind there are issues with partitioned tables!

    Johan

    Learn to play, play to learn!

    Don't drive faster than your guardian angel can fly ...
    but keeping both feet on the ground won't get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀


  • Well, you sure can age these: all line items for the 10 accounts that have the longest period without a transaction and that have a positive current balance:

    select rttl.*, accts.age, accts.[Current Total]
    from (select top(10) AccountID
              , datediff(month, max(Date), '2009-06-01') as age
              , sum(Amount) as [Current Total]
          from JBMTest
          group by AccountID
          having sum(Amount) > 0.0     -- positive current balance
          order by age desc) as accts  -- longest gap since last transaction first
    cross apply slowcteTVF(accts.AccountID) as rttl
    where age > 1

    (The dates in my test data work best around 2009-06-01; luck of the random population process.....)

    Anyway, it's easy to add criteria and sub-queries, etc, and full performance is maintained. (This query is sub-1 sec on my machine.)

    Why store the running totals when it is fast to compute them on demand? What you don't store can't hurt you....

    -frank


    The End.

    That's lightning speed indeed!

    Just keep in mind there are issues with partitioned tables!

    Ah, which is how we got on this track of guaranteed, ordered, lightning-speed updates with CTE's, CLR's, UDF's, TVF's, and LOL's.

    Christopher Ford
