Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

Question

Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

Jeff Moden

SSC Guru

Points: 1003896
More actions
January 30, 2008 at 10:21 pm

#183663

Comments posted to this topic are about the item
[font="Arial Black"]
Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5
[/font][/url]
The new article was published on 10 Nov 2009 and may be found at the following URL...
http://www.sqlservercentral.com/articles/T-SQL/68467/
The new article is basically a rewrite of the old article with some fixes and workarounds for some of the things we've encountered in the last year. A second article is in the works for some of the many uses for the "Quirky Update"... there was just too much in the first article to conveniently fit.
--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
Change is inevitable... Change for the better is not.
Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 250 total)

You must be logged in to reply to this topic. Login to reply

Capt Calamity SSC-Addicted Points: 449 More actions · Answer 1

Watch out with those clustered indices...the order is NOT guaranteed to be the index order.

There are a few cases - out of disk space is one I experienced, parallel query execution et al - that cause the leaf data to be out of order.

See also http://blog.sqlauthority.com/2007/09/24/sql-server-order-of-result-set-of-select-statement-on-clustered-indexed-table-when-order-by-is-not-used/

Regards
Andy Davies

Jeff Moden SSC Guru Points: 1003896 More actions · Answer 2

Yep... thanks for the feedback. I've seen that, too. But only on SELECTs... and only on non-clustered indexes. I've never seen it fail on UPDATEs on CLUSTERED index hints. That's why I did all the proofing in the article with the Merry-Go-Round index.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Phil Factor SSC-Insane Points: 20244 More actions · Answer 3

Yes, I'd definitely agree that the 'Quirky update' approach is the right one. I've always used an 'order by' but never tried clustered index/hints to force the order. It seems pretty robust too. I've used it in Sybase and, I reckon, all versions of SQL Server in the past.

The strange thing about the 'Quirky Update' technique is that, one has experimented with it, one discovers all sorts of problems that it solves. Directory trees, Book indices (Index, Use of, Updates), financial reporting. Whoever put the feature into Sybase was very far-sighted.

Best wishes,
Phil Factor

Christian Buettner-167247 SSChampion Points: 13729 More actions · Answer 4

Hello Jeff,

very interesting article.

Somehow it throws all my beliefs into the trash bin:

1. The update method with variables is not explicitly supported by Microsoft

2. The clustered index does not guarantee ordered results

3. Order By in derived tables does not guarantee ordered results.

Your "evidence" does not really convince me. It could be just a lucky day (or a lot of them) ;).

If anyone would have some "official" documentation on these 3 topics, that would be great!

Btw: Did anyone try the "Order By" Solution?

It returns wrong results for me, but maybe did not copy the code correctly (somehow the linefeeds

disappear with c/p).

Best Regards,

Chris Büttner

Salvor Ten Centuries Points: 1219 More actions · Answer 5

I think Andy has a point there. On multiprocessor machine SQL Server might use more threads to scan the index and then merge the result sets in one data stream before executing the update, and this could mess up things. I'd add a maxdop 1 option just to be sure. Even with one thread, I'm not sure that SQL Server would start scanning from the beginning of the index if some data pages are already in memory (perhaps because someone has just finished a full scan of the clustered index). We should use an order by to be sure, but it's not allowed on updates. Here's a trick I use in situations like this: I put a 'greater than' condition on the clustered index column(s) (it works on single column keys, it should work on multicolumn key if we use the first column of the index), where the comparison value is less than any value of the index column (say min(IndexColumnName)-1). This forces the engine to traverse the index looking for the first record matching the condition to start the scan from there. You could also avoid the index hint, unless that column also belongs to other indexes.

Interesting, these are the execution plans without and with the ordering condition AccountID>0:

UPDATE dbo.JBMTest

SET @PrevRunBal = RunBal = @PrevRunBal + Amount,

@PrevAcctID = AccountID

FROM dbo.JBMTest WITH (INDEX(IX_JBMTest_AccountID_Date),TABLOCKX)

OPTION(MAXDOP 1)

StmtText -------------------------------------------------------------------------------------------------------------------------------------------------------------

|--Clustered Index Update(OBJECT: ([Northwind].[dbo].[JBMTest].[IX_JBMTest_AccountID_Date]), SET: ([@PrevAcctID]=[Expr1005], [JBMTest].[RunBal]=[Expr1004]))

|--Compute Scalar(DEFINE: ([Expr1004]=[@PrevRunBal]=[@PrevRunBal]+[JBMTest].[Amount], [Expr1005]=[@PrevAcctID]=[JBMTest].[AccountID]))

|--Top(ROWCOUNT est 0)

|--Clustered Index Scan(OBJECT: ([Northwind].[dbo].[JBMTest].[IX_JBMTest_AccountID_Date]))

UPDATE dbo.JBMTest

SET @PrevRunBal = RunBal = @PrevRunBal + Amount,

@PrevAcctID = AccountID

FROM dbo.JBMTest WITH (INDEX(IX_JBMTest_AccountID_Date),TABLOCKX)

WHERE AccountID>0

OPTION(MAXDOP 1)

StmtText --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

|--Clustered Index Update(OBJECT: ([Northwind].[dbo].[JBMTest].[IX_JBMTest_AccountID_Date]), SET: ([@PrevAcctID]=[Expr1005], [JBMTest].[RunBal]=[Expr1004]), DEFINE: ([Expr1004]=[@PrevRunBal]=[@PrevRunBal]+[JBMTest].[Amount], [Expr1005]=[@PrevAcctID]=[JBMTest].[AccountID]), WHERE: ([JBMTest].[AccountID] > 0))

Quite different, but with the same results in terms of I/O statistics:

Table 'JBMTest'. Scan count 1, logical reads 10186, physical reads 0, read-ahead reads 50.

Ok, time to go back to work.

bye

Salvor

Johan Bijnens SSC Guru Points: 135266 More actions · Answer 6

Very good article, nice testcode. :w00t::cool:

There may be some testing needed regarding parallelism,

maybe for now just add a OPTION(MAXDOP 1).

The only remarks I would add to the conclusion is :

- Keep in mind that by design there is no order in a setbased approach

- For now it works and it is the fasted propriatary solution for the problem.

Remeber the views with order by clause that suddenly nolonger

externalized the order in SQL2005, but worked so well in SQL2000.

Thank you for sharing the knowledge.:smooooth:

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

Ian Yates SSCoach Points: 19738 More actions · Answer 7

I have to agree with the naysayers unfortunately. It's great at the moment and really does seem to work (I haven't seen it fail yet either). I was always suspicious and slightly annoyed when people used the select top 100 percent with an order by in a view - dirty hack in my opinion. This could go the same way. Having said that, perhaps we could get someone from the MS Query Engine team to elaborate on

* The update syntax of @variable = col = newValue

* What happens if you have multiple of these in the same update. eg

@variable = col = col + @variable

col2 = col + @variable

Does col2 effectively get col + col + @variable or just col + @variable?

* Forced ordering.

I'm going to run some quick tests on my sql 2k5 installation to see if there are any quirks (I'd be very surprised if I found something that was overlooked by the likes of Jeff though 😛 )

Phil Factor SSC-Insane Points: 20244 More actions · Answer 8

Christian,

The use of variables in Update statements has always been supported by Microsoft. The full syntax for the original Transact SQL Update statement is...

[font="Courier New"]

update [[database.]owner.]{table_name | view_name}

set [[[database.]owner.]{table_name. | view_name.}]

column_name1 =

{expression1 | null | (select_statement)} |

variable_name1 =

{expression1 | null | (select_statement)}

[, column_name2 = {expression2 | null |

(select_statement)}]... |

variable_name2 = {expression1 | null | (select_statement)}

[from [[database.]owner.] {table_name | view_name}

[, [[database.]owner.] {table_name |

view_name}]]...

[where search_conditions]

[/font]

and the classic example that Sybase gives for assigning variables in an update statement is....

[font="Courier New"]

DECLARE @price money

SELECT @price = 0

UPDATE titles

SET total_sales = total_sales + 1,

@price = price

WHERE title_id = 'BU1032'

SELECT @price, total_sales

FROM titles

WHERE title_id = 'BU1032'

total_sales

------------------------ -----------

19.99 4096

[/font]

It all seems to be well documented, so I reckon it is safe to use!

Best wishes,
Phil Factor

Christian Buettner-167247 SSChampion Points: 13729 More actions · Answer 9

Hi Phil,

you are correct, it obviously is supported.

I misunderstood a section in a book [1] that covered this type of statement.

In this book it was only mentioned that the variable thing is "far from standard",

but obviously not "unsupported".

Thanks for your hint!

[1] Inside Microsoft SQL Server 2005: T-SQL Querying

Best Regards,

Chris Büttner

Dean Cochrane SSC Eights! Points: 992 More actions · Answer 10

Very interesting article.

I want to add my voice to those urging caution with regard to relying on order being forced by a clustered index. According to the SQL standard, the only way to enforce order is with ORDER BY.

The problem with relying on a clustered index is that it isn't supported: no matter how many times you have tested it, it is not guaranteed to work, and data corruption introduced by a few pages not being in the order in which you expect them is likely to be both subtle and really difficult to fix. I can only imagine trying to diagnose the problem... it makes me queasy just thinking about it.

I will consider using the ORDER BY method, which seems to me to be innovative and safe code.

Johan Bijnens SSC Guru Points: 135266 More actions · Answer 11

Phil, Chris,

Indeed no discussion regarding the use of target variables.

The point is, will its content be correct during the setbased work

if the variable is used more then once in the statement.

Suppose you have more than one price ....

(cfr result data for select from views containing an order by ( before the 'fix' and trace) )

For now it seems to work with correct results, but ... you have to

figure out how to force the process order.

I'm glad Jeff cleared it out in this article :w00t:

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

James Goodwin Hall of Fame Points: 3691 More actions · Answer 12

Did you test using the Clustered Index with the ORDER BY subquery? It seems to me that the ORDER BY Would enforce the order, but since you are using a Clustered Index it wouldn't cause much overhead unless you were going to get bit by a disordered resultset anyway.

--

JimFive

Darrin Blunt SSC Enthusiast Points: 144 More actions · Answer 13

Since you installed SQL 2K5, have you looked into the "OVER" clause (BOL reference: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/ddcef3a6-0341-43e0-ae73-630484b7b398.htm) and the Ranking Functions (BOL reference: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/e7f917ba-bf4a-4fe0-b342-a91bcf88a71b.htm)?

I am curious how they perform against your test. I have used the Ranking Functions, but only on result sets of a few thousand rows and it has performed well for me.

Darrin

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 14

good one - no doubt about it. I hadn't yet seen the merry-go-round in action, but it does make sense now.

From what I've seem, the OVER syntax, and the ranking functions give you no advantage whatsoever on anything except for counts in this particular case. It's in my mind one of the failings/shortcoming of that syntax, since with all it does, you STILL have to do triangle joins or this kind of technique to get any kind of running aggregate other than a count (which you get implicitly through the ROW_NUMBER() function). I am pretty sure that's part of the OVER syntax from Ansi that didn't make the 2005 implementation (as I recall, that OTHER database product does running aggregates using its OVER syntax)..

And - just like you'd expect - using a CTE to try to force the order does nothing for you. It operates the same as a view does (inner ORDER BY is ignored).

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?