Cursors Be Gone!

  • Phil Factor (6/18/2010)


    [p]I thought that it was a good idea to re-feature the article, because it provoked an interesting debate, and I enjoyed reading through all the points that were made. It is a fascinating thread.[/p][p]Maybe, with re-featured articles, we ought to pop in an editorial note for the benefit of newer subscribers saying 'be sure to read the discussion that followed'. We old-timers tend to instinctively click through, particularly if the article looks at all controversial.[/p][p]Anyway, many thanks to Gaby Abed, and all the others who take the time and effort to contribute articles. Long may they continue to do so! (and thanks to Steve for re-featuring this article)[/p]

    Phil (or anyone else),

    I did not catch this article the first time around and did not realize it wasn't current till I got the page in the discussion that referenced Hugo’s blog. I'm only a little less then half way thru the pages and as much as I’d like to read them all I do have to do some work during the day and so I have a question or 2 I was hoping you could answer.

    First and foremost is did JAR (Jonathan AC Roberts) and others like him, but mostly JAR ever post anything faintly resembling an apology to Jeff Moden for the harsh bashing and outright rude posts they made even after Jeff’s replies were always more than polite and far more so then those he replied to deserved? I couldn't believe the level of harshness and negativity some of these posters like JAR made during the discussion. It was as if they were trying to be as big a jac…, well you get the point.

    I'm also curious if the

    “Hey there’s nothing wrong with a Cursor or 2 or 3 or everywhere in your code”

    crowd ever got-it and clued in to what Jeff and you and several others were trying hard to convey about Cursors in the SQL language?

    Thanks

    Kindest Regards,

    Just say No to Facebook!
  • To all who are pro-Cursor fans that for whatever reason just refuse to listen to or simply can’t grasp the concept and therefore choose to ignore the advice that many here (like Jeff Moden and Phil Factor) have tried to offer up; all I can say is keep doing just like you are. Please keep using cursors as much as you can. This way you’ll generate more job opportunities for the rest of us who are smart enough to take advice and listen to those who know what they are talking about.

    In addition, kudos to Gaby for at least trying and being a good sport in spite of the level of nasty replies he got form some. It’s fair easier to be an armchair DBA/critic then to actually put something out on the site for review. It is the nasty comments and non-constructive comments that help keep away what might be good articles from users who are hesitant to write something b/c they’ve seen the hack jobs some will do with their ugly posts.

    Constructive criticism is good; acting like a self-righteous you know what does nothing but let the rest know what to expect from you.

    Kindest Regards,

    Just say No to Facebook!
  • TheSQLGuru (6/18/2010)


    Myself and a number of other MVPs have been hitting Microsoft up to get them to make cursors perform much better on the platform, which would be a benefit to MANY organizations out there, especially those with legacy and/or ISV apps that use them heavily and always will. Keep your fingers crossed!

    I'd actually hate to see that happen to such a nice declarative language. 😉 It would be much nicer if they'd spend the time return some of the functionality they took away betwee 2k and 2k5 and to fix some of the... ummmm.... "incomplete" functionality they've added both is SSMS and T-SQL.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Every SQL book I ever read or course I ever took had somewhere in it the admonition to avoid cursors wherever possible. Back in the dawn of SQL time (talking 6.5 here - and yes, I am that old), I used WHILE loops exclusively.

    Then another developer pointed out to me that if looping through a collection of records, a WHILE loop has to execute the query for every iteration through the loop. However, a cursor would execute the query once and then traverse through the result. Frankly, I didn't believe and we argued rather hotly about it until he produced a lab that proved performance in a cursor was far better than in a WHILE loop - at least for the type of coding we were doing.

    After a bit of pondering, I came to the conclusion that all the bad press about cursors was really that SQL works best in set-based operations. But sometimes, you have to iterate through a result set of some kind. In some circumstances a cursor would work best and others a WHILE loop.

    For me, the moral of the story is that good and bad is relative when it comes to coding logic and you have to keep an open mind ready to experiment to find the best solution.

  • Why use cursors in any guise? You have a giant pile of data and you want to go through it and take an action on each row? Sometimes the right tool for the job is a set-based dump truck that moves the whole pile in one shot. With proper indexes and ideally tuned hardware, multi-million row tables can be manipulated in mere moments. Other times there might be some perception that one spoonful of data must be examined at a time. We talk about making the spoon-by-spoon operation fast enough to compete with the dump truck and sometimes it does win. However I wonder if there is another approach; let's stop sifting through piles of data every time we want to answer a question. When the same question is asked daily/weekly/monthly, we should anticipate that and plan a good solution in advance.

    Transactional systems generally are small numbers of rows per transaction and the status changes occur in response to transaction activity. Rather than throwing all these transactions into a heap for later analysis and reporting (a resource intensive task) would it be possible to architect data storage so the follow-on from the row-affecting action flows in response to each single point of activity? Is there a modified schema that could allow the reporting system to be as up-to-date in nearly-realtime as the these bulk-load and brute force operations currently take to complete?

    Should we maybe consider that the choice is not simply RBAR vs Set-based but could be a different kind of data structure altogether? How many OLTP systems are strained beyond necessity because OLAP is not as well documented and familiar to those who have to "get it done; today." Seems like the price we pay to get it done today may be to have to work the same way tomorrow too.

  • GabyYYZ (6/18/2010)


    The two main points about this article for me, as I see it NOW, are:

    1. Do I still think cursors are, how shall we say it, sub-optimal? For large amounts of data, they may not always be the best solution, but they definitely have their uses, especially if the solution can't be found with a set-based query,

    and

    2. The answer to 90% of SQL Server related questions is "It Depends!". I'm learning not to say never or always as much as I used to.

    I've made peace with this article and will concede, especially with regards to point 2 above, to never say never, but I also will strive to look for a set-based solution whenever possible first. I hope everyone has gotten some food for thought from the discussion and that's about all I can hope from this article at this point. 🙂

    Trouble is that the wrong culprit is blamed: the problem is not cursor versus no cursor but row-based versus set-based. So the question to start from is "Am I doing something essentially row-based that needs a while loop?". Most of the time the answer will be "No", so the question of a cursor never arises. On the rare occassions when the answer is "Yes", the next question is "will a cursor be the best way to handle sequencing in this while loop?", and most of the time the answer to that question will be "Yes".

    The example in the article is a case where - if you decide to have a while loop - a static [or insensitive] fast_forward read_only cursor will result in clearer code as well as marginally better performance than does populating and depopulating a table variable as advocated in the article.

    So the article is advocating that people (a) ask the wrong question and (b) do the wrong thing (reject the cursor in the case when it is the best solution). So I find it hard to make peace with the article (although I do tell people that cursors are to be avoided like the plague, just to encourage them to be column (as opposed to row) oriented in their thinking.

    Tom

  • Isn't this really an issue for those of us to know when to use and not use a cursor? There are instances where I choose to use a cursor while at other times I use a set based operation.

    Some requirements are easier to put into production using a cursor. I have a computation that took me over two years of testing and trying different things before I could get it into a set based operation. I would not have wanted to hold things up just so I could dump the cursor. Now that I have had time to test, tune and validate the set based operation the results are phenomenal. Still even if I had to use a cursor the total time for the complex operation was just under 8 hours for 6.5 million records. The set based reduced the time to 45 minutes.

    I believe matching your work to business requirements within a required time of completion may force us to at times to use cursors. However, if you read the Microsoft standard documentation, they really want you to use CLR procedures for heavy cursor procedural based operations.

    Either way I think we all should know when to make the move. We just don't need to be lazy and never revisit our cursors just because they "work"

  • Tom.Thomson (6/19/2010)


    GabyYYZ (6/18/2010)


    The two main points about this article for me, as I see it NOW, are:

    1. Do I still think cursors are, how shall we say it, sub-optimal? For large amounts of data, they may not always be the best solution, but they definitely have their uses, especially if the solution can't be found with a set-based query,

    and

    2. The answer to 90% of SQL Server related questions is "It Depends!". I'm learning not to say never or always as much as I used to.

    I've made peace with this article and will concede, especially with regards to point 2 above, to never say never, but I also will strive to look for a set-based solution whenever possible first. I hope everyone has gotten some food for thought from the discussion and that's about all I can hope from this article at this point. 🙂

    Trouble is that the wrong culprit is blamed: the problem is not cursor versus no cursor but row-based versus set-based. So the question to start from is "Am I doing something essentially row-based that needs a while loop?". Most of the time the answer will be "No", so the question of a cursor never arises. On the rare occassions when the answer is "Yes", the next question is "will a cursor be the best way to handle sequencing in this while loop?", and most of the time the answer to that question will be "Yes".

    The example in the article is a case where - if you decide to have a while loop - a static [or insensitive] fast_forward read_only cursor will result in clearer code as well as marginally better performance than does populating and depopulating a table variable as advocated in the article.

    So the article is advocating that people (a) ask the wrong question and (b) do the wrong thing (reject the cursor in the case when it is the best solution). So I find it hard to make peace with the article (although I do tell people that cursors are to be avoided like the plague, just to encourage them to be column (as opposed to row) oriented in their thinking.

    Wow Tom, you really need to whip up on this guy bad ey? Is that he must confess that every word is his story is wrong and you are his SQL superior or what? I think its safe to say by now that everyone realizes what was intended and what is best when approaching the use of Cursors instead of some standard Set based methods to solve some unique problem. Are you done yet with beating up on this article and its author?

    BTW - Your analogy with asking the right question is just as potentially flawed because it assumes that the user will correctly identify the right question to ask and too often those with procedural code backgrounds will favor the curosr/loop angle because its what they know and are familiar with. The majority of problems that arise from incorrect use of cursors in our world isn't from an excess use of set based methods where cursors would be better but the complete opposite of that.

    Most DBA's worth their wieght in gold push the "avoid cursors at all cost" line because they know that far too often someone who is working on SQL development who has any type of background in procedural programming is going to over use cursors; not under use them.

    Kindest Regards,

    Just say No to Facebook!
  • YSLGuru (6/21/2010)


    Wow Tom, you really need to whip up on this guy bad ey?

    It my be just me but it doesn't look that way to me. Gaby cited some lessons learned and Tom cited another possible lesson or two. I don't see Tom picking on Gaby and I believe that Gaby sees that, as well (although I could be mistaken). It's just another dialog spawned by this particular article... not a butt chewing. 🙂

    Gaby... well done on having big shoulders.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/22/2010)


    YSLGuru (6/21/2010)


    Wow Tom, you really need to whip up on this guy bad ey?

    It my be just me but it doesn't look that way to me. Gaby cited some lessons learned and Tom cited another possible lesson or two. I don't see Tom picking on Gaby and I believe that Gaby sees that, as well (although I could be mistaken). It's just another dialog spawned by this particular article... not a butt chewing. 🙂

    Gaby... well done on having big shoulders.

    Thanks Jeff and YSLGuru...lessons definitely learned, experienced does that to you, especially since I initially wrote the article. Nuff said. 🙂

    Gaby________________________________________________________________"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein

  • If I can add another more physical perspective to this, I'm afraid that I've found that the temp variable solution can be *vastly* slower than cursors. The time it takes to reserve and allocate all that memory on a server which doesn't have a massive amount spare can be considerable. Doing this on a congested server and then looking at the plan you get CLUSTERED INDEX SCAN 1%, TEMP TABLE INSERT 99%

    We made the mistake of doing something similar using the exact same reasoning on a highly-trafficked SP. By the time your table gets to about 100,000 rows, that's about 400k to reserve if you only have a single column in your temp table. If you have a few hundred users, all using such a procedure on a table of over 250,000 rows, your working set bounces up and down like a yoyo, and the search can time out quite easily as pages are stolen left, right and centre. When the pages end up being stolen from the index cache, Fooom...

    It's a good idea if you *really really hate cursors*, but sadly there doesn't seem to be one catch-all solution. Adding FAST_FORWARD does help a lot, I've found.

    HTH

    N.

    p.s. Adding PRIMARY KEY to your temp table (for small loops) does improve things a lot.

  • Nick Walton (6/22/2010)


    If I can add another more physical perspective to this, I'm afraid that I've found that the temp variable solution can be *vastly* slower than cursors. The time it takes to reserve and allocate all that memory on a server which doesn't have a massive amount spare can be considerable. Doing this on a congested server and then looking at the plan you get CLUSTERED INDEX SCAN 1%, TEMP TABLE INSERT 99%

    We made the mistake of doing something similar using the exact same reasoning on a highly-trafficked SP. By the time your table gets to about 100,000 rows, that's about 400k to reserve if you only have a single column in your temp table. If you have a few hundred users, all using such a procedure on a table of over 250,000 rows, your working set bounces up and down like a yoyo, and the search can time out quite easily as pages are stolen left, right and centre. When the pages end up being stolen from the index cache, Fooom...

    It's a good idea if you *really really hate cursors*, but sadly there doesn't seem to be one catch-all solution. Adding FAST_FORWARD does help a lot, I've found.

    HTH

    N.

    p.s. Adding PRIMARY KEY to your temp table (for small loops) does improve things a lot.

    I have found that when the developers or app. support folks send me a script to, say, clean data by doing a join on a temp table, the time spent creating a clustered index is well worth the time saved with the join. Many don't do that, although a few I've worked with closely know better and anticipate my "Did you create an Index?" question.

    Gaby________________________________________________________________"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein

  • Here's another way. This one uses a range provided by a numeric column, even better if it is indexed. No need for counts or deletes.

    -- we use a range provided by a numeric column

    -- even better if the column is indexed

    declare @query varchar(100),

    @dbname sysname,

    @dbid int -- to iterate through the range

    set @dbid = 4 -- start of the range (in this case to skip system databases)

    select @dbid = min(database_id)

    from sys.databases

    where database_id > @dbid

    while (@dbid is not null) -- this condition here can be customized

    -- to change at a specific value

    begin

    -- do your thing here

    -- ====================================

    select top 1 @dbname = dbname

    from sys.databases

    where database_id = @dbid

    select @query = 'dbcc checkdb(' + quotename(@dbname) + ')'

    exec(@query)

    -- ====================================

    -- fetch next value in the range

    select @dbid = min(database_id)

    from sys.databases

    where database_id > @dbid

    end

  • It depends....

    If you are looping thru an OLTP system, then there is locking overhead with a cursor that you want to avoid.

    If you are looping thru a reporting system that is basicly read often, update daily, then a cursor could scale well, better than the temp table if your tempdb I/O is either slow or under stress.

  • YSLGuru (6/21/2010)


    Wow Tom, you really need to whip up on this guy bad ey? Is that he must confess that every word is his story is wrong and you are his SQL superior or what? I think its safe to say by now that everyone realizes what was intended and what is best when approaching the use of Cursors instead of some standard Set based methods to solve some unique problem. Are you done yet with beating up on this article and its author?

    No, I don't need to whip up on Gaby, I'm well aware that he understands the issues as that's made clear by his response to comments made last year, when the \rtile was first published. But I do think it's very important that people don't start by asking "cursor or no cursor" but instead "set-oriented or row by row iterative" (as one can write row by row iteration in a single query without resorting to a cursor the distinction bewteen the two questions is non-trivial). The discussion so far doesn't really make that point clea. It's also important that on those rare occassions when the answer to that starting question is "iteration" people don't discard the cursor option in favour of inferior iterative methods like the temporary table one illustrated in the article - and (unless I've missed somethiong) only Hugo had commented on that, so I thought it would be useful to make the two points together in one comment.

    If Gaby thinks I was whipping up oon him I'm sure he'll let me know and if he does I will apologise profusely since that is certainly not an impression I wanted to give him.

    Tom

Viewing 15 posts - 256 through 270 (of 272 total)

You must be logged in to reply to this topic. Login to reply