Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

  • Jeff,

    Thanks for the great examples on generating running totals. Question: you mentioned that you have to include the "anchor" (or dummy) in the UPDATE statement in order for it to work correctly, but you didn't explain why. I can see its use in the group total and group rank, but could you explain why it is needed in the UPDATE statement for the overall running total (aside from "just because it doesn't work without it")?
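
    For reference, the pattern I'm asking about looks roughly like this (my own minimal sketch; the table, column, and variable names are invented, so apologies if they differ from your exact code):

    DECLARE @RunningTotal MONEY,
            @Anchor INT              -- the "anchor"/dummy variable in question
    SET @RunningTotal = 0

    UPDATE dbo.TransactionDetail
    SET @RunningTotal = RunningTotal = @RunningTotal + Amount,
        @Anchor = TransactionDetailID   -- assigned, but never otherwise used
    FROM dbo.TransactionDetail WITH (TABLOCKX)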

  • Jeff, thanks for this article; I learned something new.

    I was relying on Reporting Services reports, cubes, and MDX to do running totals for me. This article shows me a way to do it in classic reports.

    I wonder if the same type of technique could be used as a general way to get rid of all cursors and WHILE loops. I stay away from them, but sometimes have no alternative.

    I have a specific problem now. I must create a table to report on contract compliance. I can have multiple contracts for the same customer, with or without the same products in them. Start and finish dates can differ, and dates can overlap. The revenue table does not have any contract info.

    To merge contract info with revenue data, I use WHILE loops: for each contract, I put the contract number, customer, product, and start and finish dates into variables, then in an inner WHILE loop I add the combined data to the result table row by row, merging the contract number with the matching data found in the revenue table.
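
    For illustration only, I imagine the set-based equivalent of that outer loop would be a single join along these lines (all table and column names here are invented):

    SELECT c.ContractNum,
           c.CustomerID,
           c.ProductID,
           c.StartDate,
           c.FinishDate,
           r.RevenueDate,
           r.Amount
    INTO   dbo.ContractRevenue    -- the combined result table
    FROM   dbo.Contract AS c
    JOIN   dbo.Revenue  AS r
      ON   r.CustomerID = c.CustomerID
     AND   r.ProductID  = c.ProductID
     AND   r.RevenueDate BETWEEN c.StartDate AND c.FinishDate
    -- overlapping contracts duplicate revenue rows, which (as noted
    -- below) is understood and acceptable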

    When done, one can look at performance on a contract-by-contract basis. The sum of all contracts for a given customer, product, and period could double-count revenue. This is understood and acceptable.

    I have 11 different types of contracts; the level of grouping changes, and the nature of the revenue changes as well: it can be direct or indirect, and I have to look at both revenue and orders. It takes 5 hours to run on a 64-bit machine, once a week on Saturday.

    These tables have millions of rows. It is working, but if it could run in a fraction of the time, it would be better.

    BI Guy

  • Very nice, and thanks for the link.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Philippe -

    While I would recommend you start a separate thread on the specifics of your process, I would imagine that this technique might be usable in your scenario. Hard to tell for sure, but it's certainly one way to step away from some loops and cursors (although to be fair - the MS gear-heads that blog on this call this technique the "inner loop" process, since you're flying through each row in sequence, albeit VERY quickly).

    Jeff is probably going to give me the evil eye on that one....:)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Unfortunately, SQL 2k5's OVER clause is missing that little bit of magic present in the SQL standard and other SQL products. The full functionality is also not present in 2008, but perhaps the next version (2010/2011?) might have it.

    See http://www.sqlmag.com/Articles/ArticleID/95007/95007.html?Ad=1, where you can vote to have the extra features added; the article also links to a well-written MS Word document that serves as a great introduction by itself to the OVER clause and what's possible both now and, hopefully, in the future. Direct link: http://www.insidetsql.com/OVER_Clause_and_Ordered_Calculations.doc
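
    For the curious, the missing piece is mainly the ORDER BY (and framing) option on aggregate window functions. In the SQL standard, a running total is just this (illustrative names; it will not run on 2k/2k5/2k8):

    SELECT AccountID,
           TranDate,
           Amount,
           SUM(Amount) OVER (PARTITION BY AccountID
                             ORDER BY TranDate
                             ROWS UNBOUNDED PRECEDING) AS RunningTotal
    FROM dbo.TransactionDetail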

    😀

    Courtesy goes to Itzik Ben-Gan and Sujata Mehta for the article/paper - I'm just linking to it.

  • Pure Awesomeness!

    Ran over 1.4 million records on a 6-level 'hierarchy' in about 10 s - compared to the 'set' solution that takes 1.5 hrs...


    Oh yes, and validation against the previous values indicated no differences. Mindblowing.

    Thanks for possibly the single most impressive bit of code I have seen on here!

    Rich

  • A few things to test:

    1) read uncommitted isolation level, and also NOLOCKs (if not already tested)

    2) snapshotted table, with modifications

    3) partitioned tables

    4) table spread across multiple file groups

    5) Enterprise Edition where you start the run while another scan is ongoing. The "join-along" capability of EE will jump on the existing read thread, then loop around to the beginning, which should break the code due to out-of-order data. I am pretty sure this one is a killer.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (1/31/2008)

    5) Enterprise Edition where you start the run while another scan is ongoing. The "join-along" capability of EE will jump on the existing read thread, then loop around to the beginning, which should break the code due to out-of-order data. I am pretty sure this one is a killer.

    The 'merry-go-round' scan. From what I understand of that, it only applies to the initial read from physical disk, not subsequent reads from memory. Is that correct? Do you know if this feature is enabled in the Developer edition, or just Enterprise? (Don't have BoL at hand.) Also not likely to be an issue, due to the tablock.

    Edit: Comments based on mis-reading removed.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (1/31/2008)


    TheSQLGuru (1/31/2008)


    Few things to test:

    1) read uncommitted isolation level, and also NOLOCKs (if not already tested)

    Good point.

    For anyone curious - Read uncommitted isolation means that SQL can do an allocation order scan of the cluster, rather than a page order scan. Shouldn't make a difference if the pages are in the correct order in the file. If the index is fragmented however...

    Itzik had an entire presentation last year at PASS on the consequences of this. Very scary.
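
    (If I follow, that means a dirty read like this one - against a made-up table - is allowed to come back in allocation order rather than clustered-key order:)

    SELECT OrderID, Amount
    FROM dbo.SomeBigTable WITH (NOLOCK)  -- read uncommitted: allocation-order scan permitted
    -- no ORDER BY, so on a fragmented index the rows can come back
    -- in file order instead of clustered-key order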

    Two follow-on questions:

    - Jeff is advocating locking the table exclusively (WITH (...., TABLOCKX)), so how does NOLOCK play into this? Do you mean someone ELSE reading the table with NOLOCK?

    - How would the "traditional" method do with NOLOCK? Wouldn't you run into more danger of the allocation-scan problem with a process taking, say, several minutes to several hours, versus a 10 ms process?
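
    In other words, something like this from two connections (invented names again):

    -- Connection 1: the running-total update, holding an exclusive table lock
    DECLARE @RunningTotal MONEY
    SET @RunningTotal = 0
    UPDATE dbo.TransactionDetail WITH (TABLOCKX)
    SET @RunningTotal = RunningTotal = @RunningTotal + Amount

    -- Connection 2: a dirty read that ignores the exclusive lock and
    -- may even use an allocation-order scan
    SELECT TransactionDetailID, RunningTotal
    FROM dbo.TransactionDetail WITH (NOLOCK)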

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • My mistake. I missed the TABLOCKX hint. Replied without checking back to the article.

    Edit: And even if the TABLOCKX hint wasn't there, the statement in question is an update. Updates have to take locks. Allocation scans only happen in read uncommitted (because they can result in data anomalies); hence, no chance of getting an allocation-order scan in the first place.

    Allocation vs page order scan only matters if you're depending on the rows being returned in the order of the clustered index, without an ORDER BY clause.

    I will read carefully before posting. I will read carefully before posting. I will read carefully before posting. I will read carefully before posting. ..........

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I too missed the stated use of TABLOCKX, although that will reduce the usefulness of the query since I know I wouldn't allow that in MY production system unless it was during an off-hours batch run (assuming such existed). 🙂

    From 2005 BOL: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/fb83d250-4252-4322-b57c-3c72de8407c8.htm

    Advanced Scanning

    In SQL Server Enterprise Edition, the advanced scan feature allows multiple tasks to share full table scans. If the execution plan of a Transact-SQL statement requires a scan of the data pages in a table and the Database Engine detects that the table is already being scanned for another execution plan, the Database Engine joins the second scan to the first, at the current location of the second scan. The Database Engine reads each page one time and passes the rows from each page to both execution plans. This continues until the end of the table is reached.

    At that point, the first execution plan has the complete results of a scan, but the second execution plan must still retrieve the data pages that were read before it joined the in-progress scan. The scan for the second execution plan then wraps back to the first data page of the table and scans forward to where it joined the first scan. Any number of scans can be combined like this. The Database Engine will keep looping through the data pages until it has completed all the scans. This mechanism is also called "merry-go-round scanning" and demonstrates why the order of the results returned from a SELECT statement cannot be guaranteed without an ORDER BY clause.

    Seems that unless you EXPLICITLY force the order with an ORDER BY, you can get out-of-order processing in this case. Makes me also believe it is possible in other situations (parallelism, allocation order scans, etc.).

    Also from BOL: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/instsql9/html/e5186f02-dd91-47d0-8fa4-de3f41c76903.htm

    SQL Server 2005 Developer Edition (32-bit and 64-bit)

    SQL Server 2005 Developer Edition lets developers build any type of application on top of SQL Server. It includes all of the functionality of SQL Server 2005 Enterprise Edition, but is licensed for use as a development and test system, not as a production server.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • It's a full table update, so the entire table has to be locked exclusively anyway until the update has completed, whether by a (forced) table lock or multiple row/page locks. If it's done on a temp table, it won't have any impact on concurrency.

    I was reading up on MSDN on the advanced scanning. Since we've got an exclusive lock (from the update or from the TABLOCKX), there shouldn't be any processes running with us in parallel. If it's a temp table, then there certainly won't be.

    It's definitely something to keep in mind, though.
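
    Just to sketch the temp-table variant (reusing the invented names from earlier in the thread):

    -- copy the data into a local temp table no other session can see
    SELECT TransactionDetailID,
           Amount,
           CAST(0 AS MONEY) AS RunningTotal
    INTO   #Work
    FROM   dbo.TransactionDetail

    -- the running-total update depends on clustered-index order
    CREATE UNIQUE CLUSTERED INDEX IX_Work ON #Work (TransactionDetailID)

    DECLARE @RunningTotal MONEY
    SET @RunningTotal = 0

    UPDATE #Work
    SET @RunningTotal = RunningTotal = @RunningTotal + Amount
    FROM #Work WITH (TABLOCKX)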

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Couldn't a scan running under NOLOCK be joined by a TABLOCKX scan?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • OK, I just have to say that while this is all very clever, it also seems beyond absurd to be relying on undocumented functionality to speed up critical calculations, *especially* in the context of a database.

    Steve's blogging and so many threads here are all about the sanctity of the data, but this sort of approach would seem to run counter to all that.

    No amount of testing and adding hints can ever make up for using functionality that is explicitly unsupported. What happened to defensive programming? Are we really willing to throw that all away for a speed improvement? Getting occasionally wrong results back really quickly isn't really a win....

    -frank


    The End.

  • Yep, it looks like there are uncertainties in this approach.

    It is very database-engine-centric, while mind-opening. I'll keep this article in mind as valuable.

    I just thought it could be nice to use this to get rid of loops. I think such a technique would reveal its full benefit as an alternative to cursors when a set-based approach is not possible.

    I am not too concerned about what I would call a "hard-coded" running total, which I perceive as more of academic interest than business added value for day-to-day use.

    People like to look at different levels of running-total aggregation according to their needs. This simple "end user" mindset makes the whole proposal kind of a particular case.

    That's why I leave the running-total stuff out of the tables. Cubes do this as you want. Some may want to use SCOPE instead of IIF to speed things up even more, but basically you put one colon ":" and you are done once and for all. Example:

    IIF(IsLeaf([CALENDAR].[Process Date].CurrentMember),
        Sum([CALENDAR].[Process Date].FirstSibling : [CALENDAR].[Process Date].CurrentMember,
            [Measures].[Bill]),
        NULL)

    In Reporting Services, you create a column with RunningValue and you are done too, i.e. RunningValue(Fields!Cost.Value, Sum, Nothing).

    In an Excel pivot you can also use a running total. A little less flexible, but still valuable and very fast.

    Just my 2 cents.

    BI Guy
