Solving the Running Total and Ordinal Rank Problems (Rewritten)

Question

Post reply

Solving the Running Total and Ordinal Rank Problems (Rewritten)

Jeff Moden

SSC Guru

Points: 1003859
More actions
November 10, 2009 at 12:00 am

#210950

Comments posted to this topic are about the item Solving the Running Total and Ordinal Rank Problems (Rewritten)
--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
Change is inevitable... Change for the better is not.
Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 307 total)

You must be logged in to reply to this topic. Login to reply

Allister Reid SSCrazy Points: 2666 More actions · Answer 1

Hi Jeff,

many thanks for taking the time to research, write and post this eagerly anticipated article!

Just tried running some of the code and have noticed missing whitespace from figures 2, 3, 6, 6, 18, 19 (maybe others, haven't got all the way through yet) resulting in syntax errors.

You seem to be losing your ongoing battle with these code windows!

Again thanks a million for another great article.

Allister

Jeff Moden SSC Guru Points: 1003859 More actions · Answer 2

Thanks Allister. The reformatting is taking longer than any of us anticipated and I also have a late breaking bit of information that I'm adding.

For the code windows, you'll need to copy from one line above the code window to one line below the code window, paste to Word (or other word processor), copy from that and paste to QA or SSMS. As soon as we get the formatting and the additional info squared away, we'll add a file with clearly marked code for all the examples to make it just a little easier for folks to test with.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

peter-757102 SSCertifiable Points: 6877 More actions · Answer 3

One word Jeff....CLASS

Thank you for the very clear article and all the testing done to make sure that what you claim is 100% correct. But I wouldn't be me if I had not at least some comments, so here they come:

1) In Figure 28 I personally found the switch from the 3 part assignment problem, solved by using 2 part assignment, straight to a "this is the correct way" which shows 3 parts again...confusing. In fact I think I still don't get the reason why one works and the other doesn't (and I would go 2 parts all the way because of this).

2) At the end when talking about a trigger method to keep a running total correct at all times, you state that it might not work well with batch updates. It is hard to argue with a 'may not work well' opinion, but I just want to state that I can think of a way that could in fact work well.

What about making some sort of temporary Delta/Offset table first for those records that changed and then use this as input in a batch update that corrects all depending records with the appropriate offset without using a pseudo cursor.

Its just an idea, I never tested it and cannot guarantee it runs into unforeseen problems either.

ben.mcintyre SSCommitted Points: 1878 More actions · Answer 4

Superlative. As usual, you make a highly complex and conditional task look simple and straightforward by precisely covering key points. I am now not afraid to try using this method 🙂

Any time frame on your part 2 of pivot tables article Jeff ? I was thinking of drafting up a complementary article but wanted to leave you to have a complete say first ...

TravisDBA SSCoach Points: 15780 More actions · Answer 5

Good article Jeff, and I did read Pauls blog on DBCC PAGE AND DBCC IND and Paul says the following about those undocumented commands:

"Before jumping into how things work, I'd like to go over two commands I'll be using a lot - DBCC PAGE and DBCC IND. These are both undocumented and unsupported commands, but are very safe to use as they're used extensively inside and outside Microsoft when troubleshooting. Nevertheless, use at your own risk. They're quite well known in the SQL community and I and others have publicized them before."

I have followed the genius of Paul S Randal for quite sometime and have rarely ever seen him to be off-base about anything in SQL Server. So, if the author of DBCC CHECKDB tells me that they are "very safe", then I pretty much take that to the bank. When it comes to SQL Server internals there is no one I trust more than Paul, maybe Kim Tripp. It's a toss up. Now, on the otherhand, he does say to use them at your own risk, but in reality you could say that about all undocumented commands in SQL Server. Microsoft tends to cover themselves there and I don't blame them. However, that does not necessarily mean that they are not useful or safe. 🙂 They are probably not commands for the "database cowboys" as I like to say sometimes, but in the hands of someone who relatively knows what they are doing, they are indispensible...Travis.

"Technology is a weird thing. It brings you great gifts with one hand, and it stabs you in the back with the other. ...:-D"

DM Unseen SSC Veteran Points: 260 More actions · Answer 6

I'm missig the CTE/Recursive approaches to solving these issues. While usually not single pass the beat triangular joins

For Running total see: http://shahamishm.blogspot.com/2009/04/how-to-do-running-sum-using-cte-sql.html

For finding previous row ordered by key_nr and trans_date(clustered key):

WITH trans_cte(key_nr, trans_date,field1,field2,row_order)

AS(

SELECT

key_nr

,trans_date

,field1

,field2

,ROW_NUMBER()OVER(PARTITIONBY key_nr ORDERBY trans_date ASC) row_order

FROM [dbo].[trans_stage_table]

)

SELECT BASE.key_nr

,BASE.trans_date

,BASE.field1

,BASE.field2

FROM trans_cte BASE

LEFTJOIN trans_cte LAG ON BASE.key_nr = LAG.key_nr

AND BASE.row_order = LAG.row_order+1

and BASE.field1 = LAG.field1

and BASE.field2 = LAG.field2

where LAG.key_nr isnull

Kind Regards,

DM Unseen AKA M. Evers

ta.bu.shi.da.yu Hall of Fame Points: 3985 More actions · Answer 7

That's an awesome article Jeff!

Now if only I could work out how to run a query that traverses a directed graph with cycles - but where there is a node that connects to another node that had already been traversed then it does not continue. ..

But rather than muck up this thread, I'll post a question about it a bit later. 🙂

Random Technical Stuff[/url]

Jeff Moden SSC Guru Points: 1003859 More actions · Answer 8

peter-757102 (11/10/2009)
One word Jeff....CLASS
Thank you for the very clear article and all the testing done to make sure that what you claim is 100% correct. But I wouldn't be me if I had not at least some comments, so here they come:
1) In Figure 28 I personally found the switch from the 3 part assignment problem, solved by using 2 part assignment, straight to a "this is the correct way" which shows 3 parts again...confusing. In fact I think I still don't get the reason why one works and the other doesn't (and I would go 2 parts all the way because of this).
2) At the end when talking about a trigger method to keep a running total correct at all times, you state that it might not work well with batch updates. It is hard to argue with a 'may not work well' opinion, but I just want to state that I can think of a way that could in fact work well.
What about making some sort of temporary Delta/Offset table first for those records that changed and then use this as input in a batch update that corrects all depending records with the appropriate offset without using a pseudo cursor.
Its just an idea, I never tested it and cannot guarantee it runs into unforeseen problems either.

Thanks, Peter. I appreciate the awesome compliment and your observations.

Referring to your item 1, I believe that part of the reason why the code in the "OMG! I Broke It!" section is a bit confusing is because (going out on a limb here), 99.999% of the people wouldn't add 2 to 1 to get to 3. When I first saw a similar bit of code from the two guys I mentioned in the article, they only used the number 1... talk about confusing. Like a lot of folks might, I initially concentrated on the "Who would do this?" rather than the "Oh... I don't know why it does that, but it does." 😛

What the intent of that code is is to show that the 3 part update can create a problem (not always) if the expression (right most part of the 3 part update) relies on the same variable that has already been updated by another expression. The other intent is to show that any problems with the 3 part update can be easily overcome by splitting the 3 part update into two 2 part updates. I see in another post above where at least one person has decided to keep life simple and use only 2 part updates all the time.

For your item number 2, you are correct that you could easily copy the changed rows to a Temp table and do a batch update on those... however, just offseting the rows (I assume you're talking about using an incremental column somewhere in that) won't keep you from needing a "Quirky Update", a real cursor, or a Triangular Join to update the new rows with a running balance. The point is well taken, though. The article is basically incomplete unless the method of using a trigger to update on input is included. I need to resubmit the article to spell Paul Randal's name correctly and to put the correct code in Figure 21... I'll add a section for the trigger update.

Again, thank you for the very thoughtful observations. They're much appreciated.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Hugo Kornelis SSC Guru Points: 64780 More actions · Answer 9

Hi Jeff,

Thanks for doing a rewrite of this article.

Just a few comments before I start really reading the article:

1) Being one of the people who tried to grill you over a small fire for your recommendation to use what you call quirky update, I feel olbliged to point out that, no matter how many tests you run succesfully and how many rules you add to rule out problem cases, using undocumented features will always remain a huge risk. Of course, you are right that you can't prove reliability of SELECT either. But if I ever encounter some weird case where SELECT does not behave as expected and as documented, I can file a bug with Microsoft and have it fixed. If I encounter a case where the quirky update fails to work, I can try to file a bug but I should expect the MS guys to do the told-you-so-dance.

2) The code in figure 2 will blow up on any server with a case sensitive collation. You should change "TempDB" to "tempdb".

That's it for now. I'll now start properly reading the article.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Jeff Moden SSC Guru Points: 1003859 More actions · Answer 10

ben.mcintyre (11/10/2009)
Superlative. As usual, you make a highly complex and conditional task look simple and straightforward by precisely covering key points. I am now not afraid to try using this method 🙂
Any time frame on your part 2 of pivot tables article Jeff ? I was thinking of drafting up a complementary article but wanted to leave you to have a complete say first ...

Thanks for the compliment, Ben... especially on such a controversial method.

Shifting gears and thank you for your interest, Part 2 of the Pivot tables article was actually published a while back and can be found at the following URL...

http://www.sqlservercentral.com/articles/Crosstab/65048/

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003859 More actions · Answer 11

dm_unseen-571460 (11/10/2009)
I'm missig the CTE/Recursive approaches to solving these issues. While usually not single pass the beat triangular joins
For Running total see: http://shahamishm.blogspot.com/2009/04/how-to-do-running-sum-using-cte-sql.html
For finding previous row ordered by key_nr and trans_date(clustered key):
WITH trans_cte(key_nr, trans_date,field1,field2,row_order)
AS(
SELECT
key_nr
,trans_date
,field1
,field2
,ROW_NUMBER()OVER(PARTITIONBY key_nr ORDERBY trans_date ASC) row_order
FROM [dbo].[trans_stage_table]
)
SELECT BASE.key_nr
,BASE.trans_date
,BASE.field1
,BASE.field2
FROM trans_cte BASE
LEFTJOIN trans_cte LAG ON BASE.key_nr = LAG.key_nr
AND BASE.row_order = LAG.row_order+1
and BASE.field1 = LAG.field1
and BASE.field2 = LAG.field2
where LAG.key_nr isnull

Thank you for the link to your article where a running total is resolved by a recurrsive CTE. That's one of the methods I left out due to the shear length (22 pages in Word) of this article. Please take the following constructively... it's not meant as a slam, just an observation. What I believe you'll find is that it's performance is similar to or worse than a cursor. But, before I stick my foot too far into my mouth, I need to test it against the million row test table I created for this article. I'll be back (hopefully tonight) to let you know how that went. Of course, I'll post the code I tested with so that you and others can also run the same test.

Thank you for your interest in solving the running total problem.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003859 More actions · Answer 12

ta.bu.shi.da.yu (11/10/2009)
That's an awesome article Jeff!
Now if only I could work out how to run a query that traverses a directed graph with cycles - but where there is a node that connects to another node that had already been traversed then it does not continue. ..
But rather than muck up this thread, I'll post a question about it a bit later. 🙂

Thanks for the great feedback, ta.bu.shi.da.yu. Always appreciated.

On the subject of "a query that traverses a directed graph with cycles", I'd very much appreciate you posting a question on the forum about that especially since I'm not sure what you mean there. Of course, I've only had one cup of coffee so far this morning. 😛 Looking forward to the question and the explanation as to what you mean on the subject.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003859 More actions · Answer 13

Hugo Kornelis (11/10/2009)
Hi Jeff,
Thanks for doing a rewrite of this article.
Just a few comments before I start really reading the article:
1) Being one of the people who tried to grill you over a small fire for your recommendation to use what you call quirky update, I feel olbliged to point out that, no matter how many tests you run succesfully and how many rules you add to rule out problem cases, using undocumented features will always remain a huge risk. Of course, you are right that you can't prove reliability of SELECT either. But if I ever encounter some weird case where SELECT does not behave as expected and as documented, I can file a bug with Microsoft and have it fixed. If I encounter a case where the quirky update fails to work, I can try to file a bug but I should expect the MS guys to do the told-you-so-dance.
2) The code in figure 2 will blow up on any server with a case sensitive collation. You should change "TempDB" to "tempdb".
That's it for now. I'll now start properly reading the article.

Heh... thanks Hugo. Despite our differences on several of those subjects, you've always been a gentleman in your criticism and observations. Never any harm done there and I very much appreciate it.

On the subject of using undocumented features, I agree... you can't call Microsoft on them. If they ever change the fundamental code behind the UPDATE statement so the "Quirky Update" doesn't work anymore, then we'll all have to resort to a Cursor, While Loop, or a CLR. Hopefully MS will have a properly working SUM() windowing function by then, as well. Heh... the good part is... they're done modifying 2k and 2k5 so I don't believe we have anything to worry about there. 🙂

On your item 2, agreed... I don't use any case sensitivity except by column so I tend to overlook the folks that have an entire server set to be case sensitive. Hmmmm... perhaps that would make a decent "Friday Poll"... "Question 1. Do you have any case sensitive servers? Question 2. Either way, please discuss why or why not." I think it would be a great discussion.

Again, even though we tend to disagree on a couple of subjects, I always appreciate the very thoughtful manner in which you discuss those things. Thanks, Hugo.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Hugo Kornelis SSC Guru Points: 64780 More actions · Answer 14

Thanks Jeff.

I also found a TEMPDB (fully upper case) and a Master when I ran the code to create the million row table. If you have a chance, you might want to change that.

I always run on case sensitive servers so that I know for sure all code I write will work on all servers. (Okay, in theory I could mess up by using object names that only differ in case, but I am not THAT thick).

I don't think cursor, WHILE loop, CLR and RBAR are the only documented and guaranteed alternatives. If you have the SQL Server MVP Deep Dives book, you'll find it in my article on set-based iteration (chapter 4). If you don't have it, shame on you! All royalties go to a great charity so you ought to have at least three copies! 😛 Anyway, in this case you'll have to wait until I find the time to adapt my method to your test data, test its performance, and write a longer reply.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/