Solving the Running Total and Ordinal Rank Problems (Rewritten)

  • Seggerman-675349 (3/14/2014)


    Jeff - on behalf of my client 😛

    see http://www.sqlservercentral.com/Forums/Topic1550386-392-1.aspx

    I took a stored procedure, modified (but not enough) which had used cursors in Oracle that ran for 88 minutes 2 nights ago

    last night it ran in 4 minutes

    you are so right RBAR is truly evil

    My apologies. I totally missed this post. Thank you so much for the feedback concerning the "other" world.

    Do be advised that Oracle has had the LEAD/LAG functionality, which just came around in 2012 for SQL Server, for quite some time. I probably won't ever test it in Oracle but you should consider using that functionality there.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • WayneS (5/28/2014)


    autoexcrement (5/28/2014)


    Has anyone done any significant testing of the Quirky Update method in SQL Server 2008, 2012, or 2014? Does it work in those versions too?

    It's easy enough to test out a few million rows myself, but I trust the gurus here a lot more than I trust myself to make any authoritative determination on this.

    http://blog.waynesheffield.com/wayne/archive/2011/08/running-totals-in-denali-ctp3/[/url]

    Not just because of the obvious reasons, but I love that post. It's a great comparison of the "new" functionality available in 2012 and clearly demonstrates why one method in 2012 should be used over the other for such things. I've said before but I'll say it again, well done Wayne!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • corey lawson (9/26/2014)


    I came across the Quirky Update initially just last week in one of Phil Factor's "Speed Phreak" challenges, where in the lengthy commentary regarding posted candidate solutions, someone did compare it against the upgraded partition functions in SQL Server 2012, and it's still faster. I wasn't looking for it in particular, but saw "quirky update", and had to find out more!

    It's hard to call this undocumented with all its "the evil" connotations people give this, when the syntax for this use of SET is right there in BOL (but not really explained well/in depth). This is a vestige from the mother code from Sybase, where it's probably been a standard, known speed-up for quite some time...

    (yes, the syntax for it still in BOL for the UPDATE statement for SQL Server 2014, but no examples)

    Awesome feedback, Corey. Thank you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff

    Great article. Came across it when searching for ways to make updates proceed in a sequential order, similar to the running total problem. I have an observation that is contrary to what you stated in the article "...CLUSTERED INDEX MUST BE PRESENT IN THE CORRECT ORDER: ...". I would love it to proceed in the order of the clustered index for my use case but doesn't look like we can. On my test box on MS SQL2012 I find it is running in exactly opposite order to the clustered index & I suspect in random order. Here is a test script and the result of the test script.

    set nocount on

    go

    exec sp_helpindex Party

    go

    drop trigger upd_trig_Party

    go

    create trigger upd_trig_Party on Party

    for update

    as

    select 'within update trigger', Party_ID from inserted

    go

    update Party

    set IsDeleted = 1

    where Party_ID in (1,2)

    --output of above query

    index_name index_description index_keys

    -------------------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    pk_Party_Party_ID clustered, unique, primary key located on PRIMARY Party_ID

    Party_ID

    --------------------- -----------

    within update trigger 2

    within update trigger 1

    If the updates in general proceed in the order of the clustered index, my understanding is that the output should have been 1 followed by 2. Possibly I misunderstood your statement or took it out of context. Appreciate if you can clarify. I have verified that the exectuion plan says "Clustered Index Update" on the left most side

    Regards

    Lakshmi

  • lbnayak (10/6/2016)


    On my test box on MS SQL2012 I find it is running in exactly opposite order to the clustered index & I suspect in random order. Here is a test script and the result of the test script.

    First; There are no guarantees as to the order in which rows are internally processed in an update that affects multiple rows. That is why I rercommend avoiding the quirky update trick, If you are running SQL Server 2012 or higher, then using SUM(xxx) OVER (ORDER BY xxx ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) is the supported and guaranteed method to get running totals. If you are on an older version and cannot upgrade, then I would still try to avoid this method. If you really feel that you need to use this method, then at least add the failsafe trick that is suggested earlier in this discussion so that you'll get an error rather than incorrect results if the processing order changes unexpectedly.

    Second: Your repro proves nothing. The trigger fires once per statement. The exact order is that first the UPDATE executes (prrocessing all rows, in whatever order SQL Server decides), then the trigger executes, and finally the transaction is committed. The order of rows you see reflects the order in which SQL Server decides to scan the rows from the inserted pseudo-table. This order is unrelated to the order in which rows were updated.

    (EDIT: Oh, and can you please edit your post and remove that one-million-dashes line? It messes up the forum's dynamic fomatting)


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • If you're using SQL Sever 2012 or above, you shouldn't be using the "Quirky Update" method... period. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hugo Kornelis (10/6/2016)


    ... then at least add the failsafe trick that is suggested earlier in this discussion so that you'll get an error rather than incorrect results if the processing order changes unexpectedly.

    +1000 to that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Thank you for your response. I do understand that I shouldn't be using this method if I am using MS SQL2012. The issue here is that my customer environment ranges from MS SQL2008R2 thru MS SQL2014. MS SQL2012 is the preferred environment but the same query has to run on all these versions. I should have made that clear.

    From the example that I provided, does it seem like MS SQL2012 does not follow the clustered index for doing updates ?

    Regards

  • Hi Jeff

    Regarding using Hugo Cornelis mechansin "... then at least add the failsafe trick that is suggested earlier in this discussion so that you'll get an error rather than incorrect results if the processing order changes unexpectedly."

    This mechanism is going be deployed on an OLTP system:crying:. The concept is to chain records together in a linked list fashion with hashvalue from 'a previous record' stored in the 'next record' as a means to detect deletes:crazy:. the first ever record and the last record would be marked with special hashes. Since the hash values are going to be generated by triggers , the notion of a 'previous record' / 'next record' is forcing me to update the records in an ordered fashion. Throwing a 'divide by zero' or anyother kind of exception is going to look very odd. And morever in the case of MS SQL2012, looks like it is not a workable solution. Heck!! You can argue that this is a unworkable idea in the first place. It serializes all inserts and is more trouble than is worth but this is the option that has been chosen and I am guy who is tasked with implementing it.

    Regards

  • To be honest and IMHO, a simple auditing system would serve you and the task of proof much better than doing a linked list. You could even setup a trigger to prevent updates and deletes. If that doesn't quite fit the bill, a simple ROW_NUMBER() OVER with the proper offset would do the trick. You could even do a simple joined offset to capture what the previous (say) IDENTITY value was.

    And, you don't need the Quirky Update to do any of it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • lbnayak (10/6/2016)


    The concept is to chain records together in a linked list fashion with hashvalue from 'a previous record' stored in the 'next record' as a means to detect deletes:crazy:. the first ever record and the last record would be marked with special hashes. Since the hash values are going to be generated by triggers , the notion of a 'previous record' / 'next record' is forcing me to update the records in an ordered fashion.

    I fully agree with Jeff here. Since you apparently already need to use triggers, why not simply log the deletes directly from those triggers?

    Alternatively, if your goal is to find gaps in a range of key values, then google "Itzik Ben-Gan gaps and island solutions". He has a very efficient query pattern forr this that works on SQL Serverr 2005 and up.

    Throwing a 'divide by zero' or anyother kind of exception is going to look very odd.

    In my book even that would beat deriving and storing potentially incorrect data. But there's no reason to throw any error. You could wrap the fast but unreliable method in a TRY block and then code the slower altenative in the CATCH.

    But hopefully this is moot and you can use one of the alternatives suggested by Jeff or me.

    You can argue that this is a unworkable idea in the first place. It serializes all inserts and is more trouble than is worth but this is the option that has been chosen and I am guy who is tasked with implementing it.

    When I am tasked to implement a bad idea, my first step is always to challenge it. I found that if I present my case right and if I can also present a better alternative, I am often able to convince the higher-ups in the food chain.

    In the case of the more stubborn bosses, I either walk away (if I can afford it); or I document my objections, do my job as good as possible, and perpare to do the "told you so" dance when smelly stuff starts hitting the fan.:cool:


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (10/7/2016)


    You could wrap the fast but unreliable method ...

    Heh... maybe for you but it's never been unreliable for me. You must be doing something wrong. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Hugo/Jeff

    Thank you for sharing your thoughts on this. I really appreciate you guys going above & beyond trying to understand the problem and providing a resolution. I am not in favor of our proposed 'linked list' solution and your proposal to use the island-and-gaps solution seems to be a viable option to detect deletes in the system. I will take it to the group, of course mentioning the source of the advice, and I think I will be able to convince them

    Regards

  • Jeff Moden (10/7/2016)


    Hugo Kornelis (10/7/2016)


    You could wrap the fast but unreliable method ...

    Heh... maybe for you but it's never been unreliable for me. You must be doing something wrong. 😉

    Are we really going to have that discussion again?

    Let's just say that you and I have different definitions of "reliable".


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (10/7/2016)


    Jeff Moden (10/7/2016)


    Hugo Kornelis (10/7/2016)


    You could wrap the fast but unreliable method ...

    Heh... maybe for you but it's never been unreliable for me. You must be doing something wrong. 😉

    Are we really going to have that discussion again?

    Let's just say that you and I have different definitions of "reliable".

    Please, don't. Thank you.

Viewing 15 posts - 286 through 300 (of 307 total)

You must be logged in to reply to this topic. Login to reply