Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

  • Comments posted to this topic are about the item

    [font="Arial Black"]

    Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

    [/font][/url]

    The new article was published on 10 Nov 2009 and may be found at the following URL...

    http://www.sqlservercentral.com/articles/T-SQL/68467/

    The new article is basically a rewrite of the old article with some fixes and workarounds for some of the things we've encountered in the last year. A second article is in the works for some of the many uses for the "Quirky Update"... there was just too much in the first article to conveniently fit.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Watch out with those clustered indices...the order is NOT guaranteed to be the index order.

    There are a few cases - out of disk space is one I experienced, parallel query execution et al - that cause the leaf data to be out of order.

    See also http://blog.sqlauthority.com/2007/09/24/sql-server-order-of-result-set-of-select-statement-on-clustered-indexed-table-when-order-by-is-not-used/


    Regards
    Andy Davies

  • Yep... thanks for the feedback. I've seen that, too. But only on SELECTs... and only on non-clustered indexes. I've never seen it fail on UPDATEs on CLUSTERED index hints. That's why I did all the proofing in the article with the Merry-Go-Round index.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, I'd definitely agree that the 'Quirky update' approach is the right one. I've always used an 'order by' but never tried clustered index/hints to force the order. It seems pretty robust too. I've used it in Sybase and, I reckon, all versions of SQL Server in the past.

    The strange thing about the 'Quirky Update' technique is that, one has experimented with it, one discovers all sorts of problems that it solves. Directory trees, Book indices (Index, Use of, Updates), financial reporting. Whoever put the feature into Sybase was very far-sighted.

    Best wishes,
    Phil Factor
    Simple Talk

  • Hello Jeff,

    very interesting article.

    Somehow it throws all my beliefs into the trash bin:

    1. The update method with variables is not explicitly supported by Microsoft

    2. The clustered index does not guarantee ordered results

    3. Order By in derived tables does not guarantee ordered results.

    Your "evidence" does not really convince me. It could be just a lucky day (or a lot of them) ;).

    If anyone would have some "official" documentation on these 3 topics, that would be great!

    Btw: Did anyone try the "Order By" Solution?

    It returns wrong results for me, but maybe did not copy the code correctly (somehow the linefeeds

    disappear with c/p).

    Best Regards,

    Chris Büttner

  • I think Andy has a point there. On multiprocessor machine SQL Server might use more threads to scan the index and then merge the result sets in one data stream before executing the update, and this could mess up things. I'd add a maxdop 1 option just to be sure. Even with one thread, I'm not sure that SQL Server would start scanning from the beginning of the index if some data pages are already in memory (perhaps because someone has just finished a full scan of the clustered index). We should use an order by to be sure, but it's not allowed on updates. Here's a trick I use in situations like this: I put a 'greater than' condition on the clustered index column(s) (it works on single column keys, it should work on multicolumn key if we use the first column of the index), where the comparison value is less than any value of the index column (say min(IndexColumnName)-1). This forces the engine to traverse the index looking for the first record matching the condition to start the scan from there. You could also avoid the index hint, unless that column also belongs to other indexes.

    Interesting, these are the execution plans without and with the ordering condition AccountID>0:

    UPDATE dbo.JBMTest

    SET @PrevRunBal = RunBal = @PrevRunBal + Amount,

    @PrevAcctID = AccountID

    FROM dbo.JBMTest WITH (INDEX(IX_JBMTest_AccountID_Date),TABLOCKX)

    OPTION(MAXDOP 1)

    StmtText -------------------------------------------------------------------------------------------------------------------------------------------------------------

    |--Clustered Index Update(OBJECT: ([Northwind].[dbo].[JBMTest].[IX_JBMTest_AccountID_Date]), SET: ([@PrevAcctID]=[Expr1005], [JBMTest].[RunBal]=[Expr1004]))

    |--Compute Scalar(DEFINE: ([Expr1004]=[@PrevRunBal]=[@PrevRunBal]+[JBMTest].[Amount], [Expr1005]=[@PrevAcctID]=[JBMTest].[AccountID]))

    |--Top(ROWCOUNT est 0)

    |--Clustered Index Scan(OBJECT: ([Northwind].[dbo].[JBMTest].[IX_JBMTest_AccountID_Date]))

    UPDATE dbo.JBMTest

    SET @PrevRunBal = RunBal = @PrevRunBal + Amount,

    @PrevAcctID = AccountID

    FROM dbo.JBMTest WITH (INDEX(IX_JBMTest_AccountID_Date),TABLOCKX)

    WHERE AccountID>0

    OPTION(MAXDOP 1)

    StmtText --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    |--Clustered Index Update(OBJECT: ([Northwind].[dbo].[JBMTest].[IX_JBMTest_AccountID_Date]), SET: ([@PrevAcctID]=[Expr1005], [JBMTest].[RunBal]=[Expr1004]), DEFINE: ([Expr1004]=[@PrevRunBal]=[@PrevRunBal]+[JBMTest].[Amount], [Expr1005]=[@PrevAcctID]=[JBMTest].[AccountID]), WHERE: ([JBMTest].[AccountID] > 0))

    Quite different, but with the same results in terms of I/O statistics:

    Table 'JBMTest'. Scan count 1, logical reads 10186, physical reads 0, read-ahead reads 50.

    Ok, time to go back to work.

    bye


    Salvor

  • Very good article, nice testcode. :w00t::cool:

    There may be some testing needed regarding parallelism,

    maybe for now just add a OPTION(MAXDOP 1).

    The only remarks I would add to the conclusion is :

    - Keep in mind that by design there is no order in a setbased approach

    - For now it works and it is the fasted propriatary solution for the problem.

    Remeber the views with order by clause that suddenly nolonger

    externalized the order in SQL2005, but worked so well in SQL2000.

    Thank you for sharing the knowledge.:smooooth:

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • I have to agree with the naysayers unfortunately. It's great at the moment and really does seem to work (I haven't seen it fail yet either). I was always suspicious and slightly annoyed when people used the select top 100 percent with an order by in a view - dirty hack in my opinion. This could go the same way. Having said that, perhaps we could get someone from the MS Query Engine team to elaborate on

    * The update syntax of @variable = col = newValue

    * What happens if you have multiple of these in the same update. eg

    @variable = col = col + @variable

    col2 = col + @variable

    Does col2 effectively get col + col + @variable or just col + @variable?

    * Forced ordering.

    I'm going to run some quick tests on my sql 2k5 installation to see if there are any quirks (I'd be very surprised if I found something that was overlooked by the likes of Jeff though 😛 )

  • Christian,

    The use of variables in Update statements has always been supported by Microsoft. The full syntax for the original Transact SQL Update statement is...

    [font="Courier New"]

    update [[database.]owner.]{table_name | view_name}

    set [[[database.]owner.]{table_name. | view_name.}]

    column_name1 =

    {expression1 | null | (select_statement)} |

    variable_name1 =

    {expression1 | null | (select_statement)}

    [, column_name2 = {expression2 | null |

    (select_statement)}]... |

    variable_name2 = {expression1 | null | (select_statement)}

    [from [[database.]owner.] {table_name | view_name}

    [, [[database.]owner.] {table_name |

    view_name}]]...

    [where search_conditions]

    [/font]

    and the classic example that Sybase gives for assigning variables in an update statement is....

    [font="Courier New"]

    DECLARE @price money

    SELECT @price = 0

    UPDATE titles

        SET total_sales = total_sales + 1,

        @price = price

        WHERE title_id = 'BU1032'

    SELECT @price, total_sales

        FROM titles

        WHERE title_id = 'BU1032'

                              total_sales

    ------------------------ -----------

                        19.99        4096

    [/font]

    It all seems to be well documented, so I reckon it is safe to use!

    Best wishes,
    Phil Factor
    Simple Talk

  • Hi Phil,

    you are correct, it obviously is supported.

    I misunderstood a section in a book [1] that covered this type of statement.

    In this book it was only mentioned that the variable thing is "far from standard",

    but obviously not "unsupported".

    Thanks for your hint!

    [1] Inside Microsoft SQL Server 2005: T-SQL Querying

    Best Regards,

    Chris Büttner

  • Very interesting article.

    I want to add my voice to those urging caution with regard to relying on order being forced by a clustered index. According to the SQL standard, the only way to enforce order is with ORDER BY.

    The problem with relying on a clustered index is that it isn't supported: no matter how many times you have tested it, it is not guaranteed to work, and data corruption introduced by a few pages not being in the order in which you expect them is likely to be both subtle and really difficult to fix. I can only imagine trying to diagnose the problem... it makes me queasy just thinking about it.

    I will consider using the ORDER BY method, which seems to me to be innovative and safe code.

  • Phil, Chris,

    Indeed no discussion regarding the use of target variables.

    The point is, will its content be correct during the setbased work

    if the variable is used more then once in the statement.

    Suppose you have more than one price ....

    (cfr result data for select from views containing an order by ( before the 'fix' and trace) )

    For now it seems to work with correct results, but ... you have to

    figure out how to force the process order.

    I'm glad Jeff cleared it out in this article :w00t:

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Did you test using the Clustered Index with the ORDER BY subquery? It seems to me that the ORDER BY Would enforce the order, but since you are using a Clustered Index it wouldn't cause much overhead unless you were going to get bit by a disordered resultset anyway.

    --

    JimFive

  • Since you installed SQL 2K5, have you looked into the "OVER" clause (BOL reference: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/ddcef3a6-0341-43e0-ae73-630484b7b398.htm) and the Ranking Functions (BOL reference: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/e7f917ba-bf4a-4fe0-b342-a91bcf88a71b.htm)?

    I am curious how they perform against your test. I have used the Ranking Functions, but only on result sets of a few thousand rows and it has performed well for me.

    Darrin

  • good one - no doubt about it. I hadn't yet seen the merry-go-round in action, but it does make sense now.

    From what I've seem, the OVER syntax, and the ranking functions give you no advantage whatsoever on anything except for counts in this particular case. It's in my mind one of the failings/shortcoming of that syntax, since with all it does, you STILL have to do triangle joins or this kind of technique to get any kind of running aggregate other than a count (which you get implicitly through the ROW_NUMBER() function). I am pretty sure that's part of the OVER syntax from Ansi that didn't make the 2005 implementation (as I recall, that OTHER database product does running aggregates using its OVER syntax)..

    And - just like you'd expect - using a CTE to try to force the order does nothing for you. It operates the same as a view does (inner ORDER BY is ignored).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 1 through 15 (of 251 total)

You must be logged in to reply to this topic. Login to reply