Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

Updating through cursors Expand / Collapse
Author
Message
Posted Thursday, July 24, 2008 5:25 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 9:48 AM
Points: 35,950, Visits: 30,232
or because you run into one of the very rare problems where a cursor actually runs faster than a set-based solution


Hi Hugo,

Do you have an example of when that might occur? I ask because so many people write hidden RBAR into solutions such as running total problems and call them "set based" just because they don't contain an explicit Cursor or While loop.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #540009
Posted Thursday, July 24, 2008 5:56 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 4:05 PM
Points: 5,793, Visits: 8,002
Jeff Moden (7/24/2008)
or because you run into one of the very rare problems where a cursor actually runs faster than a set-based solution


Hi Hugo,

Do you have an example of when that might occur? I ask because so many people write hidden RBAR into solutions such as running total problems and call them "set based" just because they don't contain an explicit Cursor or While loop.


Hi Jeff,

I've got a few, actually. :)

1: There is already a tried and tested stored procedure that processes a single row at a time, and the logic in it is quite complex. The estimated cost to rewrite the logic into a set based query is too high to justify it, especially since the total time taken is not problematic (for instance, a ten-minute batch running in a six-hoour maintenance window).

2: Common problems such as running aggregates or string concatenation - typically tasks better suited for offloading to the client, but that's not always feasible. There are some well-known solutions to these problem, but they have their own problems. For starters, most are undocumented and not guaranteed to always work, which should be a major showstopper for any serious application. Besides, these also have a problem with multiple sets (e.g. running totals for the output, but reset when a new branch starts).
The one solution I do know that is documented and guaranteed is the FOR XML trick for string concatenation - and that one has its own issues when you can't guarantee the absence of special XML characters (lesser than, greater than, and ampersand) in the data.

3: Some problems known to be "NP-complete" - basically, this means that the "best" solution can only be found by trying every possible permutation. Doing this iterative means you can eliminate many possiblilities by deciding to cut a branch short if it's already clear that it won't ever become good (like a chess program might do: if a move causes it to lose its queen, there's really not much point in calculating all the possibilities for the moves after that). In a set-based solution, that is not possible as you have to rely on the optimizer "seeing" this as a way to speed up execution - and though that might happen in simple situations, the real situations are often too complex for the optimizer to see this.

Though these are all examples where cursor-based is better than set-based, they are not all examples where cursor-based is best. Some (not all!) of the problems in categories 2 and 3 are better solved using a technique that I call "set-based iteration" and that blends set based and iterative elements. I 'll have a session about this techique at this year's PASS Summit.

If you check my blog, you'll see that I started a series on "bin packing" (an example of an NP-complete problem). Lack of time has caused some severe delays in finishing and publishing parts four (that will show that a set-based approach really only can serve to test the newest and hottest 32-way quad-code processors with 500 zettabytes of RAM but not as a viable real-life solution) and five (where I will present an alternative solution using set-based iteration and show how it blows both the cursorbased and the setbased solutions). This example will probably also be included in my PASS presentation.



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #540030
Posted Thursday, July 24, 2008 6:05 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 9:48 AM
Points: 35,950, Visits: 30,232
Thanks for taking the time for that feedback, Hugo. I appreciate it.

Running totals are no excuse for using cursors... not even for display.
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

So far as the "bin packing" thing goes, I too am remiss... I said I think I know a set based way to do it and I just haven't gotten back to it. My appologies to you and myself because it's definitely an interesting problem and sounds like a lot of fun.



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #540037
Posted Thursday, July 24, 2008 8:31 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 4:05 PM
Points: 5,793, Visits: 8,002
Jeff Moden (7/24/2008)
Thanks for taking the time for that feedback, Hugo. I appreciate it.

Running totals are no excuse for using cursors... not even for display.
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/


Hi Jeff,

Just as I appreciate your (and all other!) comments here!

I have read that article, and some of the comments (but with over 20 pages of them, I hope you forgive me for not reaching the end ;)). I won't be adding to that, but I will explain here why I would not use that method in production code:

The index hint will tell SQL Server to use that index, but it does not tell it how to use it. I see three major risks:
1) SQL Server might chose to do an unordered index scan, in which pages are read in allocation order instead of in the logical order dictated by the next page/prev page pointers.
2) There is also the possibility of a merry-go-round scan (http://msdn.microsoft.com/en-us/library/ms191475.aspx). The TABLOCKX hint will effectively prevent most of this problem - with the exception of table scans that use NOLOCK or the READ UNCOMMITTED transaction isolation level.
3) If the plan for the update includes parallelism, you'll definitely get incorrect results as well. I understand from the comments to your article that I did read that you tried to repro this yourself. This is a bit unfortunate, for (as far as I know) no current version of SQL Server will ever use parallellism for data modifications; this is so far limited to selecting only. However, with the increase of multi-socket and multi-core servers, I think it's only a matter of time until Microsoft does add the possibility of parallel plans for inserts, updates, and deletes. At that point, your code will break.

I have tried multiple times to break your code using one of the first two risks. I failed. I was unable to force an unordered scan for the update; I never figured out if that's because of something in my desktop, or because SQL Server understands that this update needs an ordered scan. I was also unable to repro a merry go round scan on my desktop - both for this update and for two near-simultaneous select statements, so that is probably due to something in my desktop.

So, I have to applaud you for presenting a very efficient and, based on tests so far, apparently very safe solution. It IS indeeed a very clever and creative solution. BUT ... in production code, "apparently very safe" is not enough. One day, Microsoft might tweak their optimizer a bit, and if that affects the way this update is processed, you'd be SOL.

If an important report for upper management shows incorrect figures, I desperately want to be able to point to Microsoft and say it's *their* fault for building a database that doesn't work as documented. Since the behaviour you're using here is not documented, I'd be unable to do that if stuff starts hitting the fan, and it'll be *my* head on the chop-block.

Do you rmember how everyone started creating views with TOP 100 PERCENT and ORDER BY in SQL Server 2000, so that they needed no ORDER BY when selecting from the view? And do you also remember what happened when SQL Server 2005 had an improved optimizer that understood that TOP 100 PERCENT when ordered by something is just everything, so that no ordering was required?



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #540217
Posted Thursday, July 24, 2008 8:57 AM
SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Thursday, March 20, 2014 7:32 AM
Points: 5,181, Visits: 1,368
Good one Hugo.........


Post #540242
Posted Thursday, July 24, 2008 5:30 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 9:48 AM
Points: 35,950, Visits: 30,232
Thanks for the feedback, Hugo,

I explained why a merry-go-round scan is not possible for this in the article. I even demonstrated what a merry-go-round scan is... updates don't do merry-go-round scans on Clustered indexes especially when the index "hint" forces the index to be use. I put "hint" in parenthesis because when it comes to index hints, it's not a hint... it's a mandatory directive. MS even warns against such a practice for "normal" queries because it overrides the optimizer and most people aren't smarter than the optimizer.

I also appreciate your extreme testing and reporting back on you found. That's awesome. Thanks.

If anyone can cause the code to fail on an unpartitioned table, I'd sure be interested in seeing it... I'm not beyond publishing a retraction of my findings if I'm proven wrong or publishing an update to the article for an additional "exception."

Thanks folks.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #540619
Posted Friday, July 10, 2009 1:32 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 6:50 PM
Points: 20,455, Visits: 14,074
Great stuff/thread. Wish i could have made it to that Pass session.



Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server


SQL RNNR

Posting Performance Based Questions - Gail Shaw
Posting Data Etiquette - Jeff Moden
Hidden RBAR - Jeff Moden
VLFs and the Tran Log - Kimberly Tripp
Post #751359
Posted Saturday, October 02, 2010 5:43 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Yesterday @ 4:56 PM
Points: 8,271, Visits: 8,717
The chosen answer is wrong, the right answer is "it depends". The whole point of the "For Update" mark on a cursor declaration is to ensire that ITU locks are taken early, not late; in cases where there is risk of deadlock, this minimises the both the frequency of deadlock and the amount of work that will be thrown away (rolled back) when deadlock occurs.
It's not surprising that it's easy to demonstrate that a cursor definition designed to optimise performace in cases where there is significant c=oncurrent access and risk of deadlock will not be optimal when it's the only query being run against the data concerned. It's also not useful to make that demonstration.


Tom
Post #997255
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse