Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

  • Hi Jeff

    I have a doubt about the index selection when I ran your code

    in the article 'Running total.......'

    I created the table dbo.JBMTest, added the clustered index on the RowNum column, and

    created the nonclustered index on the AccountID and Date columns.

    I populated the table with a million rows using your code.

    When I specify an "ORDER BY RowNum" clause or "WHERE RowNum = 4", the query uses the clustered index, which I understand.

    When I run "SELECT RowNum FROM dbo.JBMTest" and view the actual execution plan, it shows that the nonclustered index has been used. Shouldn't it be using the clustered index, since that is on the RowNum column?

    "Keep Trying"

  • Matt Miller (2/5/2008)


    Jeff Moden (2/5/2008)


    It IS a cool solution... wasn't so good in the performance department, though... it took 29 minutes to do the million-row example on my machine... a correlated subquery might do just as well in this particular case. There're only 200 rows per account. CROSS APPLY gave it all the characteristics of a triangular join.

    Remember - CROSS APPLY = correlated subquery... and if my XQuery is good enough to decipher that - it IS a triangular join being created...

    Heh... Yeah... now that you mention it, I remember us going through this once before on another thread.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
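The triangular join discussed above can be sketched as a correlated subquery against the article's dbo.JBMTest table (the column names come from this thread, but the exact query is an illustration, not the article's code):

```sql
-- Running total via correlated subquery: for row N, the subquery re-reads
-- rows 1..N of that account, so the work grows roughly as N*(N+1)/2 row
-- touches -- the "triangular join" that makes this approach crawl.
SELECT  t.RowNum,
        t.AccountID,
        t.Amount,
        RunBal = (SELECT SUM(t2.Amount)
                    FROM dbo.JBMTest t2
                   WHERE t2.AccountID = t.AccountID
                     AND t2.Date    <= t.Date)
  FROM dbo.JBMTest t
 ORDER BY t.AccountID, t.Date
```

A CROSS APPLY (SELECT SUM(...) ...) form produces the same shape of work, which is Matt's point: rewriting the syntax doesn't change the underlying triangular access pattern.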

  • Chirag (2/6/2008)


    Hi Jeff

    I have a doubt about the index selection when I ran your code

    in the article 'Running total.......'

    I created the table dbo.JBMTest, added the clustered index on the RowNum column, and

    created the nonclustered index on the AccountID and Date columns.

    I populated the table with a million rows using your code.

    When I specify an "ORDER BY RowNum" clause or "WHERE RowNum = 4", the query uses the clustered index, which I understand.

    When I run "SELECT RowNum FROM dbo.JBMTest" and view the actual execution plan, it shows that the nonclustered index has been used. Shouldn't it be using the clustered index, since that is on the RowNum column?

    Now you know why, when creating indexes, you should first consider join columns, then criteria, then items in the SELECT list - in that order, too.

    It turns out that using the nonclustered index is the better choice in this case (slightly less CPU time) even though it doesn't name the desired column... that's because the clustered index key is buried in the nonclustered index's data, and that key does contain the desired column.

    Even when the RowNum column is defined as a clustered primary key, a select on the RowNum column with no criteria still scans the index on the other columns, and it's still a tiny bit faster than forcing the clustered index with an index hint. The optimizer does a pretty good job of picking the right index to use.

    It also shows how important it is to pick the correct columns for the clustered index.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
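To make the index discussion above concrete, here is a minimal sketch (the index names and DDL are assumptions for illustration, not the article's exact code):

```sql
-- Chirag's setup, as described in the thread:
CREATE CLUSTERED INDEX IXC_JBMTest_RowNum
    ON dbo.JBMTest (RowNum)

CREATE NONCLUSTERED INDEX IX_JBMTest_AccountID_Date
    ON dbo.JBMTest (AccountID, Date)

-- Every nonclustered index row carries the clustering key, so
-- IX_JBMTest_AccountID_Date "covers" RowNum even though RowNum isn't
-- named in it.  It is also narrower than the clustered index (which
-- carries the whole row), so a full scan of it touches fewer pages --
-- which is why the optimizer picks it here.
SELECT RowNum FROM dbo.JBMTest        -- plan: nonclustered index scan

-- Forcing the clustered index instead is usually a tiny bit slower:
SELECT RowNum FROM dbo.JBMTest WITH (INDEX(IXC_JBMTest_RowNum))
```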

  • Radovan Jablonovsky (2/6/2008)


    Andy is right. We have set up MSSQL2000 (4 CPUs) with parallel query execution, and your example/code does not work. I tried it 3 times, always with messy results.

    Example of result:

    RowNum  AccountID  Amount    Date                     RunBal      GrpBal    RunCnt  GrpCnt
    ------  ---------  --------  -----------------------  ----------  --------  ------  ------
    451167  1          -38.3500  2000-06-23 09:25:07.967  -1430.6200  -72.5800    3920      15
    905251  1          -92.9700  2001-05-08 19:28:55.757  -1353.9100    4.1300    3909       4
    738143  1           -3.3700  2002-03-05 06:33:47.590  -1375.4400  -17.4000    3916      11
      5200  1           61.7700  2002-10-12 05:48:14.943  -1260.9400   97.1000    3908       3
     62482  1           42.2400  2003-02-16 08:19:53.653  -1392.2700  -34.2300    3919      14
    146296  1           67.6300  2003-05-07 08:18:23.690  -1286.2800   71.7600    3910       5

    With regards,

    Radovan Jablonovsky

    I'm also not quite sure how you managed to make that happen, but that's why you will see folks force it to NOT use parallel execution (OPTION (MAXDOP 1)). It's efficient enough that it will blow the doors off of the triangular join scenario even when only using one proc.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
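The hint Matt mentions attaches to the running-total UPDATE roughly like this. This is a minimal sketch of the "trick" (quirky) update pattern under discussion, not the article's full code - it omits the per-account GrpBal/RunCnt logic and the other safeguards the article uses:

```sql
-- Single-pass running total: the UPDATE walks the table once in clustered
-- index order, carrying the balance in a variable.  OPTION (MAXDOP 1)
-- keeps the plan serial so parallelism can't disturb the row order the
-- technique depends on -- apparently what Radovan's 2-CPU setup hit.
DECLARE @RunBal MONEY
SET @RunBal = 0

UPDATE dbo.JBMTest WITH (TABLOCKX)
   SET @RunBal = RunBal = @RunBal + Amount
  FROM dbo.JBMTest
OPTION (MAXDOP 1)
```

The `SET @var = column = expression` form both accumulates into the variable and writes it to the row in the same statement, which is what makes the single pass possible.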

  • Reply for Jeff,

    The server was set up to use 2 CPUs for parallelism, and the query execution plan showed that parallel execution was used in sorting the index. See attached pictures.

    Sincerely,

    Radovan Jablonovsky

  • Radovan Jablonovsky (2/7/2008)


    Reply for Jeff,

    The server was set up to use 2 CPUs for parallelism, and the query execution plan showed that parallel execution was used in sorting the index. See attached pictures.

    Sincerely,

    Radovan Jablonovsky

    Amazing... I've got the exact same settings on an 8-CPU box, except for the number of processors that can be used in a parallel process (4 instead of 2), and I can't get it to spawn parallelism.

    Matt is correct, however... the "trick" update method runs so fast anyway that you really don't need much else for speed... I'd use the MAXDOP 1 option, just like he said.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Philippe Cand (2/6/2008)


    Hello,

    This article is reaching unprecedented proportions.

    How could we get the MS SQL engine development guys participating in the discussion, if they aren't already doing so or watching from the sidelines?

    Very valuable and arguable content in here. :P

    Outstanding... thanks for the comment, Philippe.

    Agreed... wouldn't mind it a bit if the MS SQL engine guys took a look.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/7/2008)


    Radovan Jablonovsky (2/7/2008)


    Reply for Jeff,

    The server was set up to use 2 CPUs for parallelism, and the query execution plan showed that parallel execution was used in sorting the index. See attached pictures.

    Sincerely,

    Radovan Jablonovsky

    Amazing... I've got the exact same settings on an 8-CPU box, except for the number of processors that can be used in a parallel process (4 instead of 2), and I can't get it to spawn parallelism.

    Matt is correct, however... the "trick" update method runs so fast anyway that you really don't need much else for speed... I'd use the MAXDOP 1 option, just like he said.

    Maybe I'm way off... but how many files are in the filegroup where the table and/or index resides?

    Johan

    Learn to play, play to learn !

    Don't drive faster than your guardian angel can fly ...
    but keeping both feet on the ground won't get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I? Sometimes this is me, but most of the time this is me

  • Matt Miller (2/5/2008)


    Jeff Moden (2/5/2008)


    It IS a cool solution... wasn't so good in the performance department, though... it took 29 minutes to do the million-row example on my machine... a correlated subquery might do just as well in this particular case. There're only 200 rows per account. CROSS APPLY gave it all the characteristics of a triangular join.

    Remember - CROSS APPLY = correlated subquery... and if my XQuery is good enough to decipher that - it IS a triangular join being created...

    For what it's worth - this one hasn't been sitting well with me. I spent most of last night hammering on various XPath and XQuery techniques, only to find that none of the useful ones were implemented in the XML support in 2005... And - from what I can tell - Mark's solution seems to be the only one possible in the current XML version. I DID get it to run faster (20 minutes instead of the 30 or so) once I dumped the CTE and pushed the stuff into a new table, which got indexed. Also - it doesn't look so bad when you figure out it's taking 4.5 minutes just to build the XML data to begin with.

    But - let's recap the current crippled state XML has been left in (my assessment only, but hey - I'm posting here), at least as far as I've managed to find:

    - it WILL preserve the order of the data, but there's no decent way to leverage that.

    - there IS a position() predicate that allows you to know where you are - but you can't pass it a dynamic parameter.

    - most or all of the positional XPath predicates allowing you to navigate between nodes (following, preceding, preceding-sibling, following-sibling) aren't implemented.

    - you CAN declare xquery variables, but it seems that you can't leverage them as an accumulator (increment/decrement as you navigate through nodes).

    - and the proprietary UPDATE handling doesn't seem to extend into XML data. Meaning - trying to rewrite the running totals solution here, but accessing XML data nodes, falls apart, because it looks to me that (as opposed to what happens with a conventional table) it doesn't "reevaluate" as it moves from one node to the next.

    Hmm...Frustrating. (I mean - what's the point of serializing data if you can't ACCESS IT SERIALLY?).

    Figures - they finally give us something where you CAN count on the ordering of the data, and they clip its wings.....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I absolutely agree, Matt... the same thing goes for SUM() OVER... they (MS) don't allow the ORDER BY necessary to calculate a running total using SUM() OVER, like other databases do.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
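For comparison, the windowed form Jeff is asking for - SUM() OVER with an ORDER BY - looks like this in engines that support it. It is NOT valid in SQL Server 2000/2005, which is the complaint; SQL Server itself eventually added it in 2012:

```sql
-- ANSI windowed running totals: one pass, no triangular join, no
-- quirky update.  RunBal is the overall running balance, GrpBal the
-- per-account running balance -- matching the columns in the article.
SELECT  RowNum,
        AccountID,
        Amount,
        RunBal = SUM(Amount) OVER (ORDER BY AccountID, Date
                                   ROWS UNBOUNDED PRECEDING),
        GrpBal = SUM(Amount) OVER (PARTITION BY AccountID
                                   ORDER BY Date
                                   ROWS UNBOUNDED PRECEDING)
  FROM dbo.JBMTest
```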

  • 1) For the TSQL limitations, remember that there are a bunch of MVPs pushing Microsoft to fully implement the ANSI standards that would allow set-based solutions to this and a number of other problems. Hopefully some or all of these will come out with 2008.

    2) There are also improvements to XML in 2008, although I don't know enough about them to know if they will be helpful with this particular problem. The two I note in the document I got from the 2008 website (What’s New for XML in SQL Server 2008, published August 2007) are the use of LET in XQuery and the use of variables in the INSERT clause, so you can insert XML variables into existing XML structures.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
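The two 2008 additions Kevin cites would look roughly like this - a hedged sketch based on that document's description, not verified against a 2008 build (the sample data and names are invented for illustration):

```sql
DECLARE @x xml
SET @x = '<accts><acct id="1" amt="10.00"/><acct id="1" amt="20.00"/></accts>'

-- 1) The let clause in FLWOR expressions, new in 2008's XQuery support:
SELECT @x.query('for $a in /accts/acct
                 let $v := $a/@amt
                 return <out amt="{$v}"/>')

-- 2) An xml variable in the insert clause of .modify(), also new in 2008
--    (2005 only allowed a constant XML fragment here):
DECLARE @new xml
SET @new = '<acct id="2" amt="30.00"/>'
SET @x.modify('insert sql:variable("@new") as last into (/accts)[1]')
```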

  • Hi,

    If it is possible in your production environment to dynamically change the number of processors, then it is OK. Our environment is more conservative, so I am looking for a solution which is independent of server setup, type of storage, number of files in the filegroup, etc. An RDBMS database/server is more about consistency.

    Sincerely,

    Radovan Jablonovsky

  • TheSQLGuru (2/8/2008)


    1) For the TSQL limitations, remember that there are a bunch of MVPs pushing Microsoft to fully implement the ANSI standards that would allow set-based solutions to this and a number of other problems. Hopefully some or all of these will come out with 2008.

    2) There are also improvements to XML in 2008, although I don't know enough about them to know if they will be helpful with this particular problem. The two I note in the document I got from the 2008 website (What’s New for XML in SQL Server 2008, published August 2007) are the use of LET in XQuery and the use of variables in the INSERT clause, so you can insert XML variables into existing XML structures.

    Thanks for the heads-up. It's just one of those annoying little gotchas that happen at times.

    <RANT>

    It just peeves me to no end to see half-hearted implementations of various established languages like XQuery, compounded with less than half-hearted implementations of the documentation as to what does or doesn't work. That's actually one area where BOL is still falling on its face. (I ran into several circular loops in there, specifically having to do with creating a primary XML index, where you literally have two documents pointing at each other for the answer, and NEITHER HAS THE ANSWER.)

    </RANT>

    On second thought - I must be in need of a nap. I'm acting cranky today...:) I guess I was up way past my bedtime.

    By the way - I'm sorry I won't be able to catch your presentation(s) next weekend. But something else overruled that so I won't even be in the right state that weekend. I will have to see if I can wrangle some notes from the other attendees.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • It just peeves me to no end to see half-hearted implementations of various established languages like XQuery, compounded with less than half-hearted implementations of the documentation as to what does or doesn't work. That's actually one area where BOL is still falling on its face. (I ran into several circular loops in there, specifically having to do with creating a primary XML index, where you literally have two documents pointing at each other for the answer, and NEITHER HAS THE ANSWER.)

    By the way - I'm sorry I won't be able to catch your presentation(s) next weekend. But something else overruled that so I won't even be in the right state that weekend. I will have to see if I can wrangle some notes from the other attendees.

    Well, I am one of those who actually wish they hadn't wasted the time and effort bothering to put anything XML into the relational engine. CLR too, for that matter. Oh, and Notification Services (thankfully, that is being dropped in 2008). The list goes on. I would much rather have full implementation of the ANSI SQL specification, debugging baked in, MUCH fewer bugs, better-tested patch releases, etc., etc.

    Are you speaking of my presentations? I have quite a few events coming up, including the Raleigh and Alabama Code Camps. I suppose you must be talking about Raleigh, which is next Saturday. If you can make it down to Huntsville, AL the following Saturday, you can see the same two presentations. If not, drop me an email and I will send you the slide decks and code files once I get them completed.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (2/8/2008)


    Are you speaking of my presentations? I have quite a few events coming up, including the Raleigh and Alabama Code Camps. I suppose you must be talking about Raleigh, which is next Saturday. If you can make it down to Huntsville, AL the following Saturday, you can see the same two presentations. If not, drop me an email and I will send you the slide decks and code files once I get them completed.

    Oh - good to know about the Huntsville thing. I will see if I can swing that - haven't had a chance to work out my schedule that far out. If not - I will definitely take you up on your offer.

    And - You've got my vote on some of the features. I would have been quite happy with half or less of the "new" features if they had just fully implemented them. A bunch of half measures just leaves you...well - here, cobbling half-solutions together.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 91 through 105 (of 250 total)
