Calculate the Running Total for the last five Transactions

  • Lynn Pettis (11/26/2008)


    Good article, but let's take another approach and see what we see.

    Heh... you beat me to it, again. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/27/2008)


    Lynn Pettis (11/26/2008)


    Good article, but let's take another approach and see what we see.

    Heh... you beat me to it, again. 😉

    Heh... I may be old, but I'm also trainable.

  • I like both approaches to the problem. They both have a refreshing spark of originality.

    I'm not entirely sure I completely agree with all of Hugo's objections, though I share his caution. 'Quirky Update' techniques can go wrong if you are not entirely aware of certain 'gotchas', but they were documented, used, and supported even before Microsoft bought the product. For the 'quirky update' approach that Lynn uses to work safely, you have to remember certain rules (e.g. updates are done in the order of the clustered index, and all '@var=col=expression' variable assignments are done before any simple column updates; I hope I've remembered that the right way around!)
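    To make those rules concrete, the pattern under discussion looks roughly like this (a minimal sketch with invented table and column names, not code from the article):

    DECLARE @RunningTotal money
    SET @RunningTotal = 0

    UPDATE dbo.Transactions
    SET @RunningTotal = RunningTotal = @RunningTotal + Amount
    -- the '@var = col = expression' form assigns the variable and the column in one pass;
    -- the technique relies on the rows being touched in clustered index order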

    Best wishes,
    Phil Factor

  • Lynn Pettis (11/27/2008)


    Since the purpose of the identity field was order, if there was a gap it wouldn't have caused an issue.

    Hi Lynn,

    But you are using the IDENTITY values for more than just imposing an order. You use "WHERE B.ID BETWEEN A.ID - 4 AND A.ID" to find the current and four preceding rows - but if there are gaps, then fewer than four preceding rows will match this clause.
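    For readers following along, the article's moving-window query has roughly this shape (a sketch with invented names):

    SELECT A.ID, SUM(B.Amount) AS Last5Total
    FROM dbo.Transactions A
    JOIN dbo.Transactions B
      ON B.ID BETWEEN A.ID - 4 AND A.ID
    GROUP BY A.ID
    -- if ID 3 has been deleted, the range for row 5 matches only four rows,
    -- and the "last five transactions" total silently covers four transactions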


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Jeff Moden (11/27/2008)


    Hugo Kornelis (11/27/2008)


    What I was referring to is the lack of ROW_NUMBER() in SQL Server 2000. This means you'll either have to take your chance with IDENTITY, at the risk of gaps, as the author of this article did; or you have to use a correlated subquery to calculate the row number on the fly, which can result in dramatic performance degradation as the number of rows grows. Plus, the queries tend to get long and hard to understand.

    Nope... in SQL Server 2000, just do a SELECT INTO a temp table with the IDENTITY function and the author's code works just fine without any difficulty for length or understanding.

    Hi Jeff,

    You're right. When using SELECT INTO a temp table with the IDENTITY function (*), there will not be any gaps and the range of A.ID-4 up to A.ID will always have 5 rows. But unless I overlooked something, this was not in the article. To me, the article appears to imply that any IDENTITY column can be used for this. And since many tables already have an IDENTITY column, often with gaps in the sequence due to deleted data or rolled back inserts, I thought it'd be better to point out this danger.

    (*) There is another potential problem here. I know that there is only one situation where Microsoft guarantees that identity values are given out in the expected order when using ORDER BY in an INSERT statement, but I can never recall if this guarantee is for SELECT INTO with the IDENTITY() function, or for INSERT ... SELECT on a table with a predefined IDENTITY column. And I can never find this particular bit of documentation when I need it. I think that SELECT INTO with the IDENTITY() function is the supported scenario, but if you are going to use this in production you'd probably better double-check first, for my memory is known to .... aaahh, what was I going to say again? 🙂
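    For reference, the SQL Server 2000 idiom being discussed looks something like this (a sketch with invented names; see the caveat above about whether the ORDER BY is guaranteed to drive the order of IDENTITY assignment):

    SELECT IDENTITY(int, 1, 1) AS ID,
           TransactionDate,
           Amount
    INTO #Ordered
    FROM dbo.Transactions
    ORDER BY TransactionDate
    -- #Ordered.ID is gap-free 1..n, so "B.ID BETWEEN A.ID - 4 AND A.ID"
    -- always covers exactly five rows (except for the first four rows)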


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (11/27/2008)


    Lynn Pettis (11/27/2008)


    Since the purpose of the identity field was order, if there was a gap it wouldn't have caused an issue.

    Hi Lynn,

    But you are using the IDENTITY values for more than just imposing an order. You use "WHERE B.ID BETWEEN A.ID - 4 AND A.ID" to find the current and four preceding rows - but if there are gaps, then fewer than four preceding rows will match this clause.

    Actually, the author was using it in his code (which I included in my post for comparison purposes); I was using it in my code exclusively to provide order to the data.

  • Lynn Pettis (11/27/2008)


    Actually, the author was using it in his code (which I included in my post for comparison purposes); I was using it in my code exclusively to provide order to the data.

    Hi Lynn,

    Oops, my bad. Apologies for confusing you with the author. And also for not seeing the scrollbar and the extra code you added.

    However, now that I did see your code, I must say that I like it even less than the code in the article. Your update relies on at least three assumptions:

    1) The assumption that this kind of UPDATE statement will always work row by row, using the variables after assignment from row #n to update row #n+1. This may or may not be documented (I don't feel like digging through BOL at the moment), but it's definitely against the original idea of set-based updates, where all modifications are done "at once".

    2) The assumption that such an UPDATE will always be processed in order of the clustered index. And I am fairly confident that this is not documented at all. Maybe the current versions of SQL Server will work like that (I admit not being able to break it in my first three tries, but with only three tries you can also see that I didn't try very hard), but who says they'll continue to do so in the next version? Or maybe even the next service pack? Or maybe even on different hardware?

    For instance, as far as I know data modifications are currently never parallelized. But what if the next service pack changes that, to make better use of the increasing numbers of cores per socket and sockets per server? I can assure you that if this update runs in parallel, results will be completely different from what you want....

    3) The assumption that people want to store the running totals. In most cases, you DON'T want to store them, since this would necessitate recalculation every time some data changes. Most people will want to calculate running totals on the client. And if it's really necessary to calculate them on the server, you'll want to do it in a view or stored procedure, not persist it. Unless you are in a reporting database that refreshes once or twice a day and is further used to query somewhat stale data for reporting or analysis.

    Unless there is some source that I don't know of where Microsoft documents the order of processing in an UPDATE statement and commits itself to maintaining that behaviour, I would never allow this code to run in any of my production databases - and I'd urge everyone to do the same.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

    Okay, can't wait to see Jeff's response. I was using a technique I picked up from him for computing running totals.

    Regarding storing running totals: maybe it isn't stored in the database, but the calculation is completed in a #temp table that is loaded in a stored procedure with a proper clustered index to ensure the order of the data, updated using an index hint on the clustered index (per Jeff's article on running totals), and then the results returned by a SELECT query on the #temp table.

    Not stored, and the calculations are done much faster than they would be with the cross-join query.

    As another aside, I ran both against a table with 32,800 entries and here are the stats for that run:

    -- Cross Join Query --
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 1 ms.
    Table 'Accounts'. Scan count 32802, logical reads 104229, physical reads 0, read-ahead reads 46, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 1250 ms, elapsed time = 2138 ms.
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 1 ms.
    -- Cross Join Query --

    -- Update Query --
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 1 ms.
    Table 'Accounts'. Scan count 1, logical reads 153, physical reads 0, read-ahead reads 39, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 219 ms, elapsed time = 283 ms.
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 1 ms.
    -- Update Query --

    -- Select After Update Query --
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 1 ms.
    Table 'Accounts'. Scan count 1, logical reads 153, physical reads 1, read-ahead reads 177, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 47 ms, elapsed time = 1459 ms.
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 1 ms.
    -- Select After Update Query --

  • Hugo Kornelis (11/27/2008)


    And I am fairly confident that this is not documented at all. Maybe the current versions of SQL Server will work like that (I admit not being able to break it in my first three tries, but with only three tries you can also see that I didn't try very hard), but who says they'll continue to do so in the next version? Or maybe even the next service pack? Or maybe even on different hardware?

    For instance, as far as I know data modifications are currently never parallelized. But what if the next service pack changes that, to make better use of the increasing numbers of cores per socket and sockets per server? I can assure you that if this update runs in parallel, results will be completely different from what you want....

    ...

    Hey Hugo,

    I think the consensus for those that have taken up the "Jeff-style" quirky update method is to use the OPTION (MAXDOP 1) hint on the UPDATE query to explicitly disable parallelism. At least I do 🙂
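    That is, appended as a query-level hint (a sketch, with invented names):

    UPDATE t
    SET @RunningTotal = RunningTotal = @RunningTotal + Amount
    FROM #Transactions t
    OPTION (MAXDOP 1) -- explicitly rule out a parallel plan for this statement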

    Regards,

    Jacob

  • Also, if I remember the discussions around Jeff's article on running totals, the only place he has run into difficulties with the technique is with partitioned tables. That is one I'd still like to tackle, but just haven't had the time to work with it.

  • Jacob Luebbers (11/27/2008)


    Hey Hugo,

    I think the consensus for those that have taken up the "Jeff-style" quirky update method is to use the OPTION (MAXDOP 1) hint on the UPDATE query to explicitly disable parallelism. At least I do 🙂

    Hi Jacob,

    That mitigates one of the potential problems, but not all. There are more risks. For instance:

    * The optimizer might choose a non-clustered index to drive the query

    * The optimizer might choose to use an unordered scan of the clustered index instead of the current ordered scan

    * The query engine might perform a merry-go-round scan

    These are just three examples. There might be more risks that I don't think of at the moment.

    My point is that every undocumented behaviour that you observe, no matter how consistent it appears, should be considered a coincidental side effect of how the current version of the software interacts with your current hardware configuration and current data distribution. And that is NOT a safe foundation for building production software.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Have you read Jeff's article, Hugo?

    It actually answers some of the points you brought up.

    Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

  • Hey Hugo,

    Hugo Kornelis (11/27/2008)

    ...

    * The optimizer might choose a non-clustered index to drive the query

    ...

    The technique also uses an index hint specifying the clustered index, with TABLOCKX to ensure that no other modifications could occur to the table during our UPDATE.
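    For completeness, the fully-hinted form would look roughly like this (a sketch with invented names; the index name stands in for whatever the clustered index is actually called):

    UPDATE t
    SET @RunningTotal = RunningTotal = @RunningTotal + Amount
    FROM #Transactions t WITH (INDEX(IX_Trans_ID), TABLOCKX)
    -- INDEX(...) pins the scan to the clustered index; TABLOCKX takes an exclusive table lock
    OPTION (MAXDOP 1)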

    Hugo Kornelis (11/27/2008)

    ...

    * The query engine might perform a merry-go-round scan

    ...

    Jeff's article covers this (following on from comments from Gail on the merry-go-round index behaviour): http://www.sqlservercentral.com/articles/Advanced+Querying/61716/.

    Hugo Kornelis (11/27/2008)

    ...

    * The optimizer might choose to use an unordered scan of the clustered index instead of the current ordered scan

    ...

    The initial insert into the table is ordered explicitly. There was some lively discussion on whether or not the results of a subsequent unordered scan would reliably come back in that order without the guarantee of an ORDER BY... so far no-one has shown a case where they don't (to my knowledge).

    Hugo Kornelis (11/27/2008)

    ...

    My point is that every undocumented behaviour that you observe, no matter how consistent it appears, should be considered a coincidental side effect of how the current version of the software interacts with your current hardware configuration and current data distribution. And that is NOT a safe foundation for building production software.

    ...

    Agree with you here - however I'm willing to take that risk in certain cases. The massive performance gain in certain cases using this technique is compelling, and if the process using it can bear this risk (and the maintainers of that process are mindful of potential changes in this behaviour with updates to the engine) I say go for it.

    Regards,

    Jacob

  • Jacob Luebbers (11/27/2008)


    Hey Hugo,

    Hugo Kornelis (11/27/2008)

    ...

    * The optimizer might choose a non-clustered index to drive the query

    ...

    The technique also uses an index hint specifying the clustered index, with TABLOCKX to ensure that no other modifications could occur to the table during our UPDATE.

    Hi Jacob,

    Point taken. Though the flipside is that TABLOCK increases the chance of getting an unordered scan instead of an ordered one. More on that below.

    Jacob Luebbers (11/27/2008)

    Hugo Kornelis (11/27/2008)

    ...

    * The query engine might perform a merry-go-round scan

    ...

    Jeff's article covers this (following on from comments from Gail on the merry-go-round index behaviour): http://www.sqlservercentral.com/articles/Advanced+Querying/61716/.

    I see that Jeff defends his use of ORDER BY in a subquery by referring to observed behaviour, not to any documentation. Exactly the danger I am warning about.

    In fact, this behaviour of TOP 100 PERCENT ... ORDER BY did change with the release of SQL Server 2005, much to the chagrin of many people who were relying on this undocumented "trick". (In all fairness, I must admit that even some of Microsoft's tools rely on this behaviour. And apparently a very influential customer coerced Microsoft into bringing this behaviour back in SQL 2005 and SQL 2008, although it does now require a trace flag to be set.) If this doesn't prove how dangerous it is to rely on these tricks, then what does?
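    For anyone who hasn't met it, the trick in question is the old view idiom, sketched here with invented names:

    CREATE VIEW dbo.OrderedTransactions
    AS
    SELECT TOP 100 PERCENT ID, Amount
    FROM dbo.Transactions
    ORDER BY ID
    GO
    SELECT ID, Amount FROM dbo.OrderedTransactions
    -- on SQL 2000 this often *appeared* to return rows in ID order;
    -- SQL 2005 and later may ignore the TOP 100 PERCENT ... ORDER BY entirely,
    -- so the outer SELECT needs its own ORDER BY to guarantee order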

    Jacob Luebbers (11/27/2008)

    Hugo Kornelis (11/27/2008)

    ...

    * The optimizer might choose to use an unordered scan of the clustered index instead of the current ordered scan

    ...

    The initial insert into the table is ordered explicitly. There was some lively discussion on whether or not the results of a subsequent unordered scan would reliably come back in that order without the guarantee of an ORDER BY... so far no-one has shown a case where they don't (to my knowledge).

    I'll show you two. The first relies on how SQL Server allocates pages to tables. The first 8 pages come from mixed extents; after that all allocation is in uniform extents (i.e. a whole block of 8 pages is allocated to the table at once). I first create a filler table that allocates a mixed extent and some uniform extents. After dropping the table, the extents are free again, but they apparently keep their "mixed/uniform" mark. So then I allocate 6 tables that only use one page (all from the mixed extent), and then the actual table I will test with. The first 2 pages come from the existing mixed extent, and then a new mixed extent is allocated, after the previously allocated (and now empty) uniform extents. The rest of the table uses uniform extents, first the empty ones allocated previously (between the first and second mixed extent), then new ones (after the second mixed extent).

    use tempdb;
    go
    create database testit;
    go
    use testit;
    go
    create table filler (a char(8000));
    go
    insert into filler default values;
    go 100
    drop table filler;
    go
    create table fill1 (a int);
    create table fill2 (a int);
    create table fill3 (a int);
    create table fill4 (a int);
    create table fill5 (a int);
    create table fill6 (a int);
    insert into fill1 default values;
    insert into fill2 default values;
    insert into fill3 default values;
    insert into fill4 default values;
    insert into fill5 default values;
    insert into fill6 default values;
    go
    create table testtable
    (id int identity primary key, val int, filler char(4000));
    declare @i int;
    select @i = 1;
    while @i < 5000
    begin;
    insert into testtable (val, filler) select @i, str(@i);
    set @i = @i + 1;
    end;
    go
    select * from testtable with (nolock) option (maxdop 1);
    go
    use tempdb;
    go
    drop database testit;
    go

    The second example mimics concurrent activity. While you are filling your table, someone else frees some space - space that will now be used for your table.

    use tempdb;
    go
    create database testit;
    go
    use testit;
    go
    create table filler (a char(8000));
    go
    insert into filler default values;
    go 100
    create table testtable
    (id int identity primary key, val int, filler char(4000));
    declare @i int;
    select @i = 1;
    while @i < 5000
    begin;
    insert into testtable (val, filler) select @i, str(@i);
    set @i = @i + 1;
    -- SIMULATE CONCURRENT ACTIVITY
    if @i = 2000 truncate table filler;
    end;
    go
    select * from testtable with (nolock) option (maxdop 1);
    go
    use tempdb;
    go
    drop database testit;
    go

    (Sorry for the all-lowercase by the way, but I just lost my post because I took too long and I can't be bothered to do the nice formatting again)

    Jacob Luebbers (11/27/2008)

    Hugo Kornelis (11/27/2008)

    ...

    My point is that every undocumented behaviour that you observe, no matter how consistent it appears, should be considered a coincidental side effect of how the current version of the software interacts with your current hardware configuration and current data distribution. And that is NOT a safe foundation for building production software.

    ...

    Agree with you here - however I'm willing to take that risk in certain cases. The massive performance gain in certain cases using this technique is compelling, and if the process using it can bear this risk (and the maintainers of that process are mindful of potential changes in this behaviour with updates to the engine) I say go for it.

    Regards,

    Jacob

    This is where we will have to agree to disagree. I've witnessed this too often. When you warn about the danger, management is always "prepared to take the risk", and they will "of course monitor, and plan follow-up action". But when the shit hits the fan, it's suddenly your fault...

    Maybe you have been lucky not to have experienced this yet 😉 But I won't take any chances 😀


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hey Hugo,

    Hugo Kornelis (11/27/2008)


    Point taken. Though the flipside is that TABLOCK increases the chance to get an unordered scan instead of an ordered one. More on that below.

    Maybe I'm being thick, but I don't see from your comments how TABLOCKX will increase the chance of an unordered scan... am I missing something?

    Hugo Kornelis (11/27/2008)


    I see that Jeff defends his use of ORDER BY in a subquery by refering to observed behaviour, not to any documentation. Exactly the danger I am warning about.

    Jeff's final solution (bottom of the article) doesn't use an ordered subquery - just the clustered index hint to guarantee order.

    Hugo Kornelis (11/27/2008)


    I'll show you two...

    I'll take your word for it (I don't have the time right this sec to run through your demo code). Thanks for the examples though - I'll try to get some coherent thoughts on them over the weekend.

    Cheers!

    Regards,

    Jacob
