Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

  • This just back from Itzik. He states that the running totals problem can be solved by this construct:

    Yes, that would be the missing ORDER BY and ROWS subclauses of the OVER

    clause with aggregate functions. For example,

    SELECT empid, dt, qty,

    SUM(qty) OVER(PARTITION BY empid

    ORDER BY dt

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND CURRENT ROW) AS run_total_qty

    FROM MyTable;

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Does Itzik's syntax work on 2005? Because I can't get to stop whining about the word ROWS (and since it can't seem to get past it -don't have a clue if it doesn't like the rest).

    And - of course - no mention of that anywhere in the OVER() clause documentation.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Hello TheSQLGuru,

    you just scared the hell out of me!

    I first thought I'd ask you to translate this into plain english:D

    Fortunately, if one takes a thorough look at the statement, it becomes crystal clear pretty quick.

    What a neat and clean solution!

    But unfortunately I guess this is not valid code in SQL2K5, and maybe not even yet in 2K8?

    For anyone interested in the details of this type of statement, I just found this on google:

    http://www.orafaq.com/node/55

    Thanks for researching this!

    Best Regards,

    Chris Bรผttner

  • Itzik and some other MVPs have been pushing hard for improvements such as the one I listed to Transact SQL. This is one of several they are pursuing. Some of his stuff has been or will definitely be implemented in 2008. This one has not been agreed to yet. Here's hoping!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (2/1/2008)


    This just back from Itzik. He states that the running totals problem can be solved by this construct:

    Yes, that would be the missing ORDER BY and ROWS subclauses of the OVER

    clause with aggregate functions. For example,

    SELECT empid, dt, qty,

    SUM(qty) OVER(PARTITION BY empid

    ORDER BY dt

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND CURRENT ROW) AS run_total_qty

    FROM MyTable;

    Ya sure that's for SQL Server??? Because even books Online says that you can't use ORDER BY on the aggregate functions with OVER(). It certainly doesn't work on 2k5 DE sp2+...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheSQLGuru (2/1/2008)


    Itzik and some other MVPs have been pushing hard for improvements such as the one I listed to Transact SQL. This is one of several they are pursuing. Some of his stuff has been or will definitely be implemented in 2008. This one has not been agreed to yet. Here's hoping!

    Oh good! would be a welcome change.

    I got a scare there, because for a second it sounded like he was recommending we switch to Oracle to do running totals....:)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Didn't see Kevin's latest post...

    That'll be nice to have when 2k8 comes out. I thought you were saying that it's a viable 2k5 solution. Sorry for the confusion.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/31/2008)


    Philippe Cand (1/31/2008)


    Folks, there's absolutely no guarantee that MS will keep a particular method, function, stored proc, or functionality the same. I'll give you one of the best examples of something that I absolutely loved that they took away in 2k5... Query Analyzer and it's wonderful {f4} key! Both were supported by MS to the max... but it's gone, now.

    Yep, another example is Notification Services. It is gone in SQL2008.

    BI Guy

  • Philippe Cand (2/1/2008)


    Jeff Moden (1/31/2008)


    Philippe Cand (1/31/2008)


    Folks, there's absolutely no guarantee that MS will keep a particular method, function, stored proc, or functionality the same. I'll give you one of the best examples of something that I absolutely loved that they took away in 2k5... Query Analyzer and it's wonderful {f4} key! Both were supported by MS to the max... but it's gone, now.

    Yep, another example is Notification Services. It is gone in SQL2008.

    And none too soon. IMNSHO, that mess had no business being in the SQL Server system in the first place!

    Gone also is BACKUP LOG WITH TRUNCATE_ONLY/NO_LOG, which I actually use fairly regularly.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (2/1/2008)


    Gone also is BACKUP LOG WITH TRUNCATE_ONLY/NO_LOG, which I actually use fairly regularly.

    Too bad, I found this handy to keep a virtual server test database slim.

    I do not regret NS, found it not RTM level, not worth the hassle. So Only a few brave people used it reason why it goes away. No traction.

    I am afraid that MS will keep asking questions about it in their 2005 cert exams?

    About Itzik syntax, I played a little with it on SQL2005 with no luck.

    BI Guy

  • For those interested, here is a document written by Itzik Ben-Gan and Sujata Mehta with an incredible amount of information on the OVER enhancements I have been speaking of: http://www.sql.co.il/books/insidetsql2005/OVER_Clause_and_Ordered_Calculations.doc

    Very enlightening and eye-opening stuff for me!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (2/1/2008)


    Which we could figure out a way to do the read-along scan test on a huge table in enterprise edition too. Perhaps one of you guys with a big box could test that too?

    I tried this afternoon. 32 proc box, 64 GB memory, 2005 SP2 x64 Enterprise edition.

    I had 4 connections open. 3 were running the following code in a permanent loop.

    checkpoint

    dbcc dropcleanbuffers -- force data out of memory, so that the next access has to be from disk

    select * from JVMTest WITH (nolock)

    The other connection, also in a permanent loop, was running the grouped version of the running totals update off a 10 million row table and comparing the results to a benchmark that I'd calculated and stuck in a temp table.

    In 2 hours, I didn't get it to produce incorrect results once. I did succeed in killing my workstation though.

    I also couldn't get the update to run parallel, despite dropping the parallelism threshold to 0. ๐Ÿ™

    If there are any other tests you'd like me to run on that server, drop me a mail. I should have access to it for a few more days.

    Aside, regarding backup log with truncate. None too soon in removing it. If you don't want log backups, use simple recovery.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the testing Gila! I wonder if there is an indication hidden somewhere in the full XML plan or exec query stats or some other DMV that indicates that the I/O thread joined with another in process. In any case, seems like your evolution would have resulted in at least some runs hitting another concurrent one and doing the merry-go-round thing.

    As for simple mode, it isn't that simple. ๐Ÿ˜€ Altering the db to set simple recovery mode doesn't necessarily force the removal of committed trans right then, which truncate_only does. Also it is less things to type, you never forget to set it back to full mode, etc. Still like the option of having it for transient needs.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (2/1/2008)


    Thanks for the testing Gila! I wonder if there is an indication hidden somewhere in the full XML plan or exec query stats or some other DMV that indicates that the I/O thread joined with another in process. In any case, seems like your evolution would have resulted in at least some runs hitting another concurrent one and doing the merry-go-round thing.

    That's what I thought too. I'd be quite surprised if there was any indication in the exec plan. The merry-go-round scan is a physical IO operation, whereas the Exec plan's at the query processing level. Maybe something in one of the IO DMVs.

    As for simple mode, it isn't that simple. ๐Ÿ˜€ Altering the db to set simple recovery mode doesn't necessarily force the removal of committed trans right then, which truncate_only does.

    No, but issuing a checkpoint while in simple recovery mode does

    I've just seen too many people stuff up log chains with truncate_only. They use it without realising what it does, then, when the DB won't restore fully, blame someone else. (BTDT)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Jeff. I was challenged to explore using your method(s) as a less resource-intensive approach to the subquery method of getting the TOP n candidates from a set.

    It was an interesting exercise and I would appreciate your comments if possible. The time to create the temporary table, create indexes and the time/resources for the inserts then updates added a select of the specific candidates (highest two of each group) did not come out to be as vastly-different as I thought it might from the time/resources from the TOP 2 subquery approach if I created a non-clustered index first then did the TOP 2 query.

    CREATE NONCLUSTERED INDEX IX_JBMTest_AccountID_Amount

    ON dbo.JBMTest (AccountID,Amount desc)

    go

    select j1.accountid, j1.amount

    from jbmtest j1 WITH (INDEX(IX_JBMTest_AccountID_Amount),TABLOCKX)

    where j1.amount in

    (select top 2 j2.amount from jbmtest j2

    WITH (INDEX(IX_JBMTest_AccountID_Amount),TABLOCKX)

    where j1.accountid = j2.accountid

    order by j2.accountid, j2.amount desc)

    order by j1.accountid, j1.amount desc

    drop index jbmtest.ix_jbmtest_accountid_amount

    In my unscientific approach, running the TOP 2 query/subquery without the nonclustered index was about 1200 units of cost. With creating the non-clustered index as above the total cost (index creation and select) as reduced to about 15 units.

    This turned out to be less than the time/resources needed for creating the temp table, indexes and updates even prior to doing the select for results. It seemed the time to do the Updates alone was around 30 units of cost.

    My thought with using a non-clustered index was that even if there were already a clustered index out on the current table, you could add a non-clustered one, run the TOP n select then DROP the non-clustered index. In that way you wouldn't have much effect on any other running queries.

    It looks like the key here would be the index tailored to how you need to extract the top N rows by group as the sort was the major component in the TOP n query. The Update statements and temporary table would be redundant. Am I missing something?

    Running totals and counts would be a much different situation.

    Toni

Viewing 15 posts - 61 through 75 (of 250 total)

You must be logged in to reply to this topic. Login to reply