RE: Solving the "Running Total" & "Ordinal Rank" Problems (Rewritten)

SSC Guru

Points: 64780

November 10, 2009 at 4:52 pm

Well, I've allowed myself to be distracted into a discussion on what can be deducted from particular operators in the execution plan and whether or not the quirky update can currently be broken. But I really wanted to present you an alternative way, that might reduce the need to build production code on undocumented and unreliable constructions.

First, a plug. I have described the technique I will show in much greater depth in one of my two articles in the great new book SQL Server MVP Deep Dives. A book written by no less than 53 MVPs, all with a passion for SQL and a passion for helping children in need. All the author royalties for this book will go to "War Child" - a great organisation that is dedicated to helping children in war zones (war victims, child soldiers, etc) back to a normal childs life. More information about the book can be found at http://www.sqlservermvpdeepdives.com/[/url].

So even though I'll post the code here, I urge you all to buy the book. If not for helping charity, then for the wealth of other information you'll find in its 59 chapters.

So far for the plug. On to the code 😉

SELECT CURRENT_TIMESTAMP;

DECLARE @Results TABLE

(AccountID int NOT NULL,

Date datetime NOT NULL,

TransactionDetailID int NOT NULL,

Amount money NOT NULL,

RunningTotal money NULL,

Rnk int NOT NULL,

PRIMARY KEY (Rnk, AccountID));

INSERT INTO @Results(AccountID, Date, TransactionDetailID, Amount,

RunningTotal, Rnk)

SELECT AccountID, Date, TransactionDetailID, Amount, Amount,

RANK() OVER (PARTITION BY AccountID

ORDER BY Date,

TransactionDetailID)

FROM dbo.TransactionDetail;

DECLARE @Rank int,

@RowCount int;

SET @Rank = 1;

SET @RowCount = 1;

WHILE @RowCount > 0

BEGIN;

SET @Rank = @Rank + 1;

UPDATE nxt

SET RunningTotal = prv.RunningTotal + nxt.Amount

FROM @Results AS nxt

INNER JOIN @Results AS prv

ON prv.AccountID = nxt.AccountID

AND prv.Rnk = @Rank- 1

WHERE nxt.Rnk = @Rank;

SET @RowCount = @@ROWCOUNT;

END;

SELECT CURRENT_TIMESTAMP;

UPDATE td

SET AccountRunningTotal = r.RunningTotal,

AccountRunningCount = r.Rnk

FROM dbo.TransactionDetail AS td

INNER JOIN @Results AS r

ON r.TransactionDetailID = td.TransactionDetailID;

SELECT CURRENT_TIMESTAMP;

When executed on my computer, the entire code takes 76.8 seconds. The first 28.3 seconds are for calculating the running totals and running counts; the remaining 48.5 seconds for copying the results back from the working table (@Results) into the original table. I added this step only because I couldn't use EXEC dbo.Verify to check the correctness otherwise.

It is my experience that running totals are often only needed for reports, or calculated and stored in a snapshot copy of the data. Storing them in the actual table, as your code does, is far less common (and in fact bad practice - the running totals will not be updated every time the base data changes, so you are mixing current and stale data in a single table which can be very confusing). That's why I wrote the code to calculate the running totals in a table variable (which of course could also have been a temp table or a permanent table in a reporting database).

Your quirky update code (figure 19 in the article) takes 8.3 seconds on the same machine and the same data. So indeed, my code is quite a bit slower than your code (2.5 times slower, to be exact). And it only takes half as long as the Quirky Update + Verification method you mention in the article's conclusion.

If I had to choose between an 8-minute (or even 4-minute, see my previous post) fully documented cursorbased solution or a 8-second undocumented method, then -depending on the situation, especially the importance of cutting edge performance in the specific circumstances- I might be tempted to take the risk and use the undocumented method.

But with a 30-second fully documented alternative at hand, shaving off those final seconds running time no longer seem worth the price of relying on undocumented features.

Remember - my main gripe with using this method is not that I was able to find some weird cases where it breaks, nor that you have to live by a very limiting set of rules before being able to use it.

My main gripe is that it can change any day. A new version, or a new servicepack, or even a security update, a hotfix, or the position of the moons might cause a sudden change in behaviour. And if there is lots of code in production that uses this trick, then I hope I am not the one on pager duty at the time that happens!

I can understand that people use this method to drop from 8 minutes to less than 10 seconds. But I do hope that, now that I have presented a 30-second alternative that IS completely reliable, people will place all code using this method where it belongs - in the bin.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/