Solving the Running Total and Ordinal Rank Problems (Rewritten)

  • peter-757102 (9/3/2010)


    I think that implementing running totals high performance style brings you to techniques that cannot just be copy/pasted by anyone.

    I think that's a fair assessment. It's been stated a number of times on this thread that this technique is probably not for everyone, partly for that reason. That said, Jeff did an awesome job of making the idea accessible. By the time we are done here, it might be even easier...

    On another note, I still want to find some time to build an incrementally updating 'running total' by means of triggers. I done such a thing for multiple running totals (over several entities) in the past using more accessible SQL. It works well for the (relatively) modest volume of data it had to work on (and still is). I like to see how combining this new information would speed such logic up.

    Way, way, back Alex Kuznetsov mentioned his blog entry that addresses that:

    http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx

    And I might have another view then others here about the need for running totals in the database. First of all, you nearly always need it for speed purposes, but sometimes also for data integrity checks. Either way you need to guarantee the running total is always correct, which means guarding and updating it by means of triggers.

    Again, see Alex's blog entry. Indexed views are another alternative, if you can meet the conditions for them. The high-perf running totals methods described here (SQLCLR, set-based iteration, Quirky) are not targeted at that need.

    Paul

  • Paul White NZ,

    That link you posted doesn't work anymore, but I found the article (working link below).

    http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/21/denormalizing-to-enforce-business-rules-part-1.aspx

  • peter-757102 (9/3/2010)


    Paul White NZ,

    That link you posted doesn't work anymore, but I found the article (working link below).

    http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/21/denormalizing-to-enforce-business-rules-part-1.aspx

    There are two blog entries. My link was broken due to an extra square bracket (copy & paste error on my part) which I have now corrected.

  • Paul White NZ (9/3/2010)


    You approach the problem in a slightly different way - copying the source table to a heap first and relying on the ranking function to provide the required update ordering.

    The Ordered CTE Update is not relying on the windowed function to work. It relies on the heap and the ORDER BY in the CTE.

    If you change the ORDER BY in the CTE, you will get same 1/0 error.

    See this example, where the windowed function is the opposite of the query's ORDER BY

    ;WITH SafeTable

    AS (

    SELECT TOP(2147483647)

    1000001 - ROW_NUMBER() OVER (ORDER BY AccountID DESC, Date DESC, TransactionDetailID DESC) AS Sequence,

    AccountID,

    AccountRunningCount,

    AccountRunningTotal,

    Amount,

    Date,

    NCID,

    TransactionDetailID

    FROM #Temp

    ORDER BY AccountID,

    Date,

    TransactionDetailID

    )

    UPDATE SafeTable

    SET @AccountRunningTotal = AccountRunningTotal =

    CASE

    WHEN AccountID = @PrevAccountID

    THEN @AccountRunningTotal+Amount

    ELSE Amount

    END,

    @AccountRunningCount = AccountRunningCount =

    CASE

    WHEN AccountID = @PrevAccountID

    THEN @AccountRunningCount + 1

    ELSE 1

    END,

    @Sequence =

    CASE

    WHEN Sequence = @Sequence + 1 THEN Sequence -- updating in sequence

    ELSE 1/0 -- quirky update is broken!

    END,

    @PrevAccountID = AccountID;


    N 56Ā°04'39.16"
    E 12Ā°55'05.25"

  • SwePeso (9/3/2010)


    The Ordered CTE Update is not relying on the windowed function to work. It relies on the heap and the ORDER BY in the CTE.

    You know what I meant, and it comes to the same thing in this example.

    If you want to be precise about it, like the true Quirky Update, it relies on the Compute Scalar in the plan calculating values in the order requested by the intermediate materialization ORDER BY. Naturally, the ORDER BY in the ranking function has to match that.

    For sure, I do know that your method doesn't rely on the ranking function order by per se: your version of the Quirky Update method doesn't normally have a ranking function!

    Don't forget the TOP (9223372036854775807) šŸ™‚

    Paul

  • Paul White NZ (9/3/2010)


    Jeff Moden (9/3/2010)


    If a SEEK occurs, you could get an out of order update and that's why I didn't recommend INDEX(1). INDEX(0) forces the clustered index scan that's needed to guarantee the order of the update.

    Seeks are always ordered šŸ˜‰ It's the scan that might not be. Honest!

    It's the type of day I've been having... one of the few people who argues on my side and I have to disagree with him a bit. :pinch: I believe there may be code on this very thread that shows that's probably not true. I'll post it along with some other stuff I'm going to post once I've finished studying what everyone has said on this thread recently.

    Shifting gears... the code you provided in the "other" post is an absolutely awesome method of double checking the update order. Would you mind if I added the method to the article?

    Ha, so polite! Do I really have to answer that?

    It depends... did you feel the Earth move down under recently? šŸ˜› Thanks, Paul. šŸ™‚

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/3/2010)


    ... one of the few people who argues on my side and I have to disagree with him a bit. :pinch: I believe there may be code on this very thread that shows that's probably not true. I'll post it along with some other stuff I'm going to post once I've finished studying what everyone has said on this thread recently.

    Don't sweat it - I'm happy to be corrected, debates are cool, it's arguing that's not šŸ™‚

    Awaiting the demo or link with considerable interest. Maybe I should state clearly what I mean by seeks always being ordered: an index seek always has the Ordered:True attribute. A seek navigates down the b+ tree and performs a partial scan (forward or backward) from that point by following the doubly-linked list at the leaf level. That's what causes them to be ordered: the linked list connects adjacent pages in logical key order.

    I've been back over every post in this thread more than once over the last 24 hours, and I haven't noticed any code that shows INDEX(1) not working - but then again I wasn't looking for it šŸ™‚

    Paul

  • Less than two days, and I already have a lot of catching up to do. This topic sure if lively! šŸ˜‰

    Jeff Moden (9/3/2010)


    I usually value your opinion but you have to stop hammering at me personally.

    Let me respond to this first. Jeff, please accept my sincerest apologies for coming across as hammering at you personally. I know I can disagree very violently, but I always try to attack only the message, never the messenger. The quote above indicates that I failed.

    I have a very deep respect for you, and for your enormous contributions to the SQLServerCentral community. The fact that I utterly disagree with your opinion on this particular subject does not change that respect one bit.

    But (yes, there is the but, and it's a big one) I will NOT stop hammering at the quirky update method you advocate in this article.

    The technique works,

    No, it does not. My examples on page 11 in this topic still produce incorrect results. (Maybe you addressed those in one of the post I've yet to read).

    I've added code to check that it works

    True. But the code to check runs slower than the fully documented algorithm I posted and you then optimized further. So why would you then still choose quirky update + check?

    (and Paul White has recently done a much better job)

    Yes, I've seen it and it's awesome. It still uses undocumented features for the quirky update, but the code to check that it works correctly uses documented features only without taking so much time as your code. So the calculation using Paul's code might still fail, but at least you'll get an error message instead of erroneous results in the database.

    I could imagine some very specific circumstances where I myself might choose to implement this code rather than mine (for speed) or the cursor (if simplicity of future maintenance is more important than speed). But the circumstances would have to be very specific, and I'd add large blocks of comments to point out the potential problems for future maintenance! šŸ™‚


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Paul White NZ (9/3/2010)


    Jeff Moden (9/3/2010)


    If a SEEK occurs, you could get an out of order update and that's why I didn't recommend INDEX(1). INDEX(0) forces the clustered index scan that's needed to guarantee the order of the update.

    Seeks are always ordered šŸ˜‰ It's the scan that might not be. Honest!

    Either I am having a major senior moment right now, or you are.

    Seeks are always for single rows. Ordered or unordered does not apply to single rows, and hence to seeks.

    Scans can be both ordered or unordered. When it's ordered, the engine HAS to follow the pointer chain to process the index in "logical" order. When unordered, the engine has the choice of either following the pointer chain, or using the IAM to access the index pages in whatever order they happen to be allocated in the data file. And if I recall correctly, the IAM-driven scan will only actually take place if the optimizer requests and unordered scan AND the query uses either a table lock or no locks (dirty read).

    EDIT:

    After reading this:

    Paul White NZ (9/3/2010)


    Maybe I should state clearly what I mean by seeks always being ordered: an index seek always has the Ordered:True attribute. A seek navigates down the b+ tree and performs a partial scan (forward or backward) from that point by following the doubly-linked list at the leaf level. That's what causes them to be ordered: the linked list connects adjacent pages in logical key order.

    You are absolutely right. I thought that this method of finding a starting point and then scanning a range of rows was called a scan, but I just double checked in SSMS and the execution plan clearly labels this operator as a seek.

    The senior moment was mine! šŸ˜€


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • SwePeso (9/3/2010)


    A correct implementation of "Ordered CTE Update" would look something like this (no clustered index).

    (...)

    SELECTTOP(2147483647)

    (...)

    ORDER BYAccountID,

    Date,

    TransactionDetailID

    Maybe it's not really important since the quirk update itself relies on undocumented behaviour, but the SELECT TOP ... ORDER BY trick is undocumented as well.

    In SQL 2000, you could use SELECT TOP 100 PERCENT ... ORDER BY in a view or subquery, and expect the results to be sorted without explicit order by on the outermost query. This was not documented - the documentation implied that the ORDER BY in this case only governed the interpretation of the TOP clause.

    The undocumented behaviour changed in SQL 2005, when the optimizer got smart enough that the TOP 100 PERCENT will always pass all rows, making it a no-operation - soo the optimizer simply thres both the TOP and the accompanying ORDER BY in the bin. That broke a lot of code that depended on this undocumented trick.

    And then, people found that they could restore the behaviour by using TOP (some large number) instead of TOP 100 PERCENT. The optimizer does not yet have the logic to compare the number after the TOP with the actual number of rows in the table. But who know - that logic might be added in the next service pack, and then this trick, too, will start to fail.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (9/4/2010)


    My examples on page 11 in this topic still produce incorrect results.

    If we add the safety check, things work out nicely:

    DECLARE @Sequence INTEGER = 0,

    @Counter INTEGER = 0;

    -- Quirky update with safety check

    UPDATE SafeTable

    SET @Counter = AccountRunningCount = @Counter + 1,

    @Sequence = CASE WHEN Sequence = @Sequence + 1 THEN Sequence ELSE 1/0 END

    FROM (

    SELECT Sequence = ROW_NUMBER() OVER (ORDER BY TD.AccountID, TD.[Date], TD.TransactionDetailID),

    TD.AccountRunningCount

    FROM dbo.TransactionDetail TD

    WHERE TD.TransactionDetailID BETWEEN 120000 AND 120010

    ) SafeTable;

    -- Results

    SELECT *

    FROM dbo.TransactionDetail

    WHERE TransactionDetailID BETWEEN 120000 AND 120010

    ORDER BY

    AccountID,

    Date,

    TransactionDetailID;

    Query Plan:

    Correct results and an optimal query plan! šŸ™‚

    Paul

  • Hugo Kornelis (9/4/2010)


    I thought that this method of finding a starting point and then scanning a range of rows was called a scan, but I just double checked in SSMS and the execution plan clearly labels this operator as a seek.

    The terminology is annoying imprecise.

    Is it a "seek plus range scan", a "partial scan", a "range seek" or simply a "seek"? I remember Gail Shaw and I had a debate about this once, and agreed to disagree šŸ˜‰

    For my money, the simple term "seek" is not enough. I'm normally pretty careful about making the distinction, so apologies to you and Jeff for the ambiguity.

  • Hugo Kornelis (9/4/2010)


    Less than two days, and I already have a lot of catching up to do. This topic sure if lively! šŸ˜‰

    Jeff Moden (9/3/2010)


    I usually value your opinion but you have to stop hammering at me personally.

    Let me respond to this first. Jeff, please accept my sincerest apologies for coming across as hammering at you personally. I know I can disagree very violently, but I always try to attack only the message, never the messenger. The quote above indicates that I failed.

    I have a very deep respect for you, and for your enormous contributions to the SQLServerCentral community. The fact that I utterly disagree with your opinion on this particular subject does not change that respect one bit.

    But (yes, there is the but, and it's a big one) I will NOT stop hammering at the quirky update method you advocate in this article.

    The technique works,

    No, it does not. My examples on page 11 in this topic still produce incorrect results. (Maybe you addressed those in one of the post I've yet to read).

    I've added code to check that it works

    True. But the code to check runs slower than the fully documented algorithm I posted and you then optimized further. So why would you then still choose quirky update + check?

    (and Paul White has recently done a much better job)

    Yes, I've seen it and it's awesome. It still uses undocumented features for the quirky update, but the code to check that it works correctly uses documented features only without taking so much time as your code. So the calculation using Paul's code might still fail, but at least you'll get an error message instead of erroneous results in the database.

    I could imagine some very specific circumstances where I myself might choose to implement this code rather than mine (for speed) or the cursor (if simplicity of future maintenance is more important than speed). But the circumstances would have to be very specific, and I'd add large blocks of comments to point out the potential problems for future maintenance! šŸ™‚

    I'll get back to the "meat" of this thread soon but I wanted to address this particular post separately.

    Hugo, it takes one awesome professional to post something like that above. Even if we vehemently disagree (and you'll be surprised that we disagree less than you think) on something, the tone of my future conversations with you will be markedly different. You're just trying to do the same thing I am... help people. Thank you for your courtesy and good will.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As a bit of a tease and a bit of an explanation as to why my response to all of this is taking so long is {drum roll please and pardon the run on sentence}, Wayne S. finally caused a true "after the fact" break in the method but Paul White was originally correct in what I should not have changed because it prevents the break and Hugo was partially wrong but was also essentially correct for the same reason as Wayne Sā€™s break. Whew! šŸ˜› Again, both breaks (and similar) are preventable by what Paul said I shouldn't have removed for speed.

    I'm checking performance on a couple of things that Hugo and Paul came up with and making sure of others. It'll take me some time but I'll be back.

    Thank everyone who actually spent some constructive time on this especially (but not limited to) Hugo and Paul.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/4/2010)


    Hugo, it takes one awesome professional to post something like that above. Even if we vehemently disagree (and you'll be surprised that we disagree less than you think) on something...

    I could not agree more.

Viewing 15 posts - 181 through 195 (of 307 total)

You must be logged in to reply to this topic. Login to reply