Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1819202122»»»

Solving the "Running Total" & "Ordinal Rank" Problems (Rewritten) Expand / Collapse
Author
Message
Posted Friday, September 03, 2010 9:58 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:25 AM
Points: 11,168, Visits: 10,927
Jeff Moden (9/3/2010)
... one of the few people who argues on my side and I have to disagree with him a bit. I believe there may be code on this very thread that shows that's probably not true. I'll post it along with some other stuff I'm going to post once I've finished studying what everyone has said on this thread recently.

Don't sweat it - I'm happy to be corrected, debates are cool, it's arguing that's not

Awaiting the demo or link with considerable interest. Maybe I should state clearly what I mean by seeks always being ordered: an index seek always has the Ordered:True attribute. A seek navigates down the b+ tree and performs a partial scan (forward or backward) from that point by following the doubly-linked list at the leaf level. That's what causes them to be ordered: the linked list connects adjacent pages in logical key order.

I've been back over every post in this thread more than once over the last 24 hours, and I haven't noticed any code that shows INDEX(1) not working - but then again I wasn't looking for it

Paul




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #980581
Posted Saturday, September 04, 2010 4:39 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 1:54 PM
Points: 5,787, Visits: 8,001
Less than two days, and I already have a lot of catching up to do. This topic sure if lively! ;)

Jeff Moden (9/3/2010)
I usually value your opinion but you have to stop hammering at me personally.

Let me respond to this first. Jeff, please accept my sincerest apologies for coming across as hammering at you personally. I know I can disagree very violently, but I always try to attack only the message, never the messenger. The quote above indicates that I failed.

I have a very deep respect for you, and for your enormous contributions to the SQLServerCentral community. The fact that I utterly disagree with your opinion on this particular subject does not change that respect one bit.

But (yes, there is the but, and it's a big one) I will NOT stop hammering at the quirky update method you advocate in this article.

The technique works,

No, it does not. My examples on page 11 in this topic still produce incorrect results. (Maybe you addressed those in one of the post I've yet to read).

I've added code to check that it works

True. But the code to check runs slower than the fully documented algorithm I posted and you then optimized further. So why would you then still choose quirky update + check?

(and Paul White has recently done a much better job)

Yes, I've seen it and it's awesome. It still uses undocumented features for the quirky update, but the code to check that it works correctly uses documented features only without taking so much time as your code. So the calculation using Paul's code might still fail, but at least you'll get an error message instead of erroneous results in the database.
I could imagine some very specific circumstances where I myself might choose to implement this code rather than mine (for speed) or the cursor (if simplicity of future maintenance is more important than speed). But the circumstances would have to be very specific, and I'd add large blocks of comments to point out the potential problems for future maintenance!



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #980698
Posted Saturday, September 04, 2010 4:44 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 1:54 PM
Points: 5,787, Visits: 8,001
Paul White NZ (9/3/2010)
Jeff Moden (9/3/2010)
If a SEEK occurs, you could get an out of order update and that's why I didn't recommend INDEX(1). INDEX(0) forces the clustered index scan that's needed to guarantee the order of the update.

Seeks are always ordered It's the scan that might not be. Honest!

Either I am having a major senior moment right now, or you are.
Seeks are always for single rows. Ordered or unordered does not apply to single rows, and hence to seeks.

Scans can be both ordered or unordered. When it's ordered, the engine HAS to follow the pointer chain to process the index in "logical" order. When unordered, the engine has the choice of either following the pointer chain, or using the IAM to access the index pages in whatever order they happen to be allocated in the data file. And if I recall correctly, the IAM-driven scan will only actually take place if the optimizer requests and unordered scan AND the query uses either a table lock or no locks (dirty read).


EDIT:
After reading this:
Paul White NZ (9/3/2010)
Maybe I should state clearly what I mean by seeks always being ordered: an index seek always has the Ordered:True attribute. A seek navigates down the b+ tree and performs a partial scan (forward or backward) from that point by following the doubly-linked list at the leaf level. That's what causes them to be ordered: the linked list connects adjacent pages in logical key order.

You are absolutely right. I thought that this method of finding a starting point and then scanning a range of rows was called a scan, but I just double checked in SSMS and the execution plan clearly labels this operator as a seek.
The senior moment was mine!



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #980699
Posted Saturday, September 04, 2010 4:54 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 1:54 PM
Points: 5,787, Visits: 8,001
SwePeso (9/3/2010)
A correct implementation of "Ordered CTE Update" would look something like this (no clustered index).
(...)
SELECT TOP(2147483647)
(...)
ORDER BY AccountID,
Date,
TransactionDetailID


Maybe it's not really important since the quirk update itself relies on undocumented behaviour, but the SELECT TOP ... ORDER BY trick is undocumented as well.

In SQL 2000, you could use SELECT TOP 100 PERCENT ... ORDER BY in a view or subquery, and expect the results to be sorted without explicit order by on the outermost query. This was not documented - the documentation implied that the ORDER BY in this case only governed the interpretation of the TOP clause.
The undocumented behaviour changed in SQL 2005, when the optimizer got smart enough that the TOP 100 PERCENT will always pass all rows, making it a no-operation - soo the optimizer simply thres both the TOP and the accompanying ORDER BY in the bin. That broke a lot of code that depended on this undocumented trick.
And then, people found that they could restore the behaviour by using TOP (some large number) instead of TOP 100 PERCENT. The optimizer does not yet have the logic to compare the number after the TOP with the actual number of rows in the table. But who know - that logic might be added in the next service pack, and then this trick, too, will start to fail.



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #980701
Posted Saturday, September 04, 2010 9:49 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:25 AM
Points: 11,168, Visits: 10,927
Hugo Kornelis (9/4/2010)
My examples on page 11 in this topic still produce incorrect results.

If we add the safety check, things work out nicely:

DECLARE @Sequence   INTEGER = 0,
@Counter INTEGER = 0;

-- Quirky update with safety check
UPDATE SafeTable
SET @Counter = AccountRunningCount = @Counter + 1,
@Sequence = CASE WHEN Sequence = @Sequence + 1 THEN Sequence ELSE 1/0 END
FROM (
SELECT Sequence = ROW_NUMBER() OVER (ORDER BY TD.AccountID, TD.[Date], TD.TransactionDetailID),
TD.AccountRunningCount
FROM dbo.TransactionDetail TD
WHERE TD.TransactionDetailID BETWEEN 120000 AND 120010
) SafeTable;

-- Results
SELECT *
FROM dbo.TransactionDetail
WHERE TransactionDetailID BETWEEN 120000 AND 120010
ORDER BY
AccountID,
Date,
TransactionDetailID;

Query Plan:


Correct results and an optimal query plan!

Paul




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi


  Post Attachments 
QU Hugo.gif (274 views, 10.79 KB)
Post #980735
Posted Saturday, September 04, 2010 10:37 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:25 AM
Points: 11,168, Visits: 10,927
Hugo Kornelis (9/4/2010)
I thought that this method of finding a starting point and then scanning a range of rows was called a scan, but I just double checked in SSMS and the execution plan clearly labels this operator as a seek.

The terminology is annoying imprecise.

Is it a "seek plus range scan", a "partial scan", a "range seek" or simply a "seek"? I remember Gail Shaw and I had a debate about this once, and agreed to disagree

For my money, the simple term "seek" is not enough. I'm normally pretty careful about making the distinction, so apologies to you and Jeff for the ambiguity.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #980738
Posted Saturday, September 04, 2010 11:28 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:47 AM
Points: 35,950, Visits: 30,231
Hugo Kornelis (9/4/2010)
Less than two days, and I already have a lot of catching up to do. This topic sure if lively! ;)

Jeff Moden (9/3/2010)
I usually value your opinion but you have to stop hammering at me personally.

Let me respond to this first. Jeff, please accept my sincerest apologies for coming across as hammering at you personally. I know I can disagree very violently, but I always try to attack only the message, never the messenger. The quote above indicates that I failed.

I have a very deep respect for you, and for your enormous contributions to the SQLServerCentral community. The fact that I utterly disagree with your opinion on this particular subject does not change that respect one bit.

But (yes, there is the but, and it's a big one) I will NOT stop hammering at the quirky update method you advocate in this article.

The technique works,

No, it does not. My examples on page 11 in this topic still produce incorrect results. (Maybe you addressed those in one of the post I've yet to read).

I've added code to check that it works

True. But the code to check runs slower than the fully documented algorithm I posted and you then optimized further. So why would you then still choose quirky update + check?

(and Paul White has recently done a much better job)

Yes, I've seen it and it's awesome. It still uses undocumented features for the quirky update, but the code to check that it works correctly uses documented features only without taking so much time as your code. So the calculation using Paul's code might still fail, but at least you'll get an error message instead of erroneous results in the database.
I could imagine some very specific circumstances where I myself might choose to implement this code rather than mine (for speed) or the cursor (if simplicity of future maintenance is more important than speed). But the circumstances would have to be very specific, and I'd add large blocks of comments to point out the potential problems for future maintenance!


I'll get back to the "meat" of this thread soon but I wanted to address this particular post separately.

Hugo, it takes one awesome professional to post something like that above. Even if we vehemently disagree (and you'll be surprised that we disagree less than you think) on something, the tone of my future conversations with you will be markedly different. You're just trying to do the same thing I am... help people. Thank you for your courtesy and good will.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #980740
Posted Saturday, September 04, 2010 11:40 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:47 AM
Points: 35,950, Visits: 30,231
As a bit of a tease and a bit of an explanation as to why my response to all of this is taking so long is {drum roll please and pardon the run on sentence}, Wayne S. finally caused a true "after the fact" break in the method but Paul White was originally correct in what I should not have changed because it prevents the break and Hugo was partially wrong but was also essentially correct for the same reason as Wayne S’s break. Whew! Again, both breaks (and similar) are preventable by what Paul said I shouldn't have removed for speed.

I'm checking performance on a couple of things that Hugo and Paul came up with and making sure of others. It'll take me some time but I'll be back.

Thank everyone who actually spent some constructive time on this especially (but not limited to) Hugo and Paul.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #980741
Posted Sunday, September 05, 2010 1:17 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:25 AM
Points: 11,168, Visits: 10,927
Jeff Moden (9/4/2010)
Hugo, it takes one awesome professional to post something like that above. Even if we vehemently disagree (and you'll be surprised that we disagree less than you think) on something...

I could not agree more.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #980757
Posted Sunday, September 05, 2010 4:36 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 6:06 PM
Points: 6,543, Visits: 8,751
Hugo Kornelis (9/3/2010)
Please go back to page 4 of this topic. One of my posts on that page includes a very fast algorithm that completely avoids undocumented features. And further down, Jeff even posted an optimized version of that code.

Hugo Kornelis (9/4/2010)
My examples on page 11 in this topic ...


Hugo, I have a small request. Instead of posting the page #, could you post the url of the post (in case you're not aware of this, click the post # at the bottom left of the post and up pops a message box with the complete url to the post).

The reason I ask is that the # of posts to display in a page is configurable. If yours is set to 10 per page, then your post on page 4 is between post 31-40. If yours is set to 20, then your post is between 61-80. Mine is set to 50 per page - which is between 151-200. For me, page 11 would be for posts 550-600, and we're not that high yet.

Thanks!


Wayne
Microsoft Certified Master: SQL Server 2008
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
Post #980773
« Prev Topic | Next Topic »

Add to briefcase «««1819202122»»»

Permissions Expand / Collapse