A Case FOR Cursors...

  • Cursors are a tool. Used properly they can be very effective, but like most tools, used improperly they can be terrible.

  • Having read the entire article and all responses, I'm wondering if people benefited at all or just managed to strengthen their ego positions. I think a much more intelligent approach is to appreciate the various attributes of each solution and pick the correct one for a given situation.

    SET Solution:

    Usually short duration unless using most of the host resources.

    Possibly very high disk, memory, log usage.

    Can't always write SET processing when the problem is very complex and each row needs to be considered separately.

    Monolithic SQL statements that need to be fully rewritten if rules change slightly.

    Shorter transaction time if the whole batch needs to be one transaction.

    LOOP Solution:

    Can work in "background mode" while allowing OLTP to be responsive.

    Can handle very complex queries requiring procedural handling.

    Much easier to modify in the future.

    Can work in a minimal resource usage mode.

    Can avoid expanding logs if in SIMPLE mode.

    I would like to complain about CURSORS, though, since I think the language is kludgy and adds no power over simpler WHILE loop constructs that don't need to be deallocated.
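    For illustration, a minimal sketch of the WHILE-loop pattern described here: nothing to OPEN, CLOSE, or DEALLOCATE, just a walk along a key. The table and column names are hypothetical.

```sql
-- Walk a work table by its key, one row at a time, with a plain WHILE
-- loop; no cursor bookkeeping is required.
DECLARE @Id int = 0;

WHILE 1 = 1
BEGIN
    SELECT TOP (1) @Id = Id
    FROM dbo.WorkQueue          -- hypothetical work table
    WHERE Id > @Id
    ORDER BY Id;

    IF @@ROWCOUNT = 0 BREAK;    -- no rows left to process

    -- per-row processing goes here
END
```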

  • It is interesting to see cursors compared to GOTO statements, as both are bad primarily because they make it easy to be a lazy programmer, instead of encouraging better habits. I avoid absolutism on whether statements should be "abolished" or "never used," but when starting out in programming, it did take a little nudge to get me to start thinking about correct flow control and then thinking set-based.

    Anyway, I've definitely worked in an environment where we occasionally needed RBAR operations, but because the person doing code review read "cursors are bad" they blanched at some code but the same action taken with a WHILE loop was accepted. I think the moral of the story is that you will occasionally hear some "gospel truth" about something you shouldn't do; but if presented with a compelling exception, you should weigh your alternatives. At the same time, especially if you are just starting out in the RDBMS world, know that these guidelines are good ones that are encouraging you to find the (initially un-intuitive) better set-based ways.

  • Bill Talada (6/1/2015)

    I would like to complain about CURSORS, though, since I think the language is kludgy and adds no power over simpler WHILE loop constructs that don't need to be deallocated.

    Bill you know I am not a huge fan of cursors in most situations but I disagree with your sentiment here. There are a number of options that are available in the cursor declaration which offers quite a bit of control over how the looping is handled that isn't available for a while loop. Don't read into this that I would always choose a cursor over a while loop in those pretty rare cases where I use looping either. They both have their place and you have to evaluate the situation you are coding for to know which is the better option.
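    As a hedged sketch of the declaration options alluded to above, here is a cursor declared LOCAL (scoped to the batch) and FAST_FORWARD (forward-only, read-only, with engine-side fetch optimizations) — control a bare WHILE loop doesn't offer:

```sql
DECLARE @TableName sysname;

-- LOCAL: deallocated automatically when the batch scope ends.
-- FAST_FORWARD: forward-only, read-only, performance-optimized.
DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.tables;

OPEN cur;
FETCH NEXT FROM cur INTO @TableName;

WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @TableName;           -- row-by-row work goes here
    FETCH NEXT FROM cur INTO @TableName;
END

CLOSE cur;
DEALLOCATE cur;
```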

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Here's a similar article by Aaron Bertrand with some good performance analysis: http://sqlblog.com/blogs/aaron_bertrand/archive/2012/01/26/the-fallacy-that-a-while-loop-isn-t-a-cursor.aspx

  • Dangit, I had this idea for a blog post.

    Anyway, I agree with the article and the analysis. You shouldn't avoid cursors just because they're taboo, because someone said so, or because you don't like how they work behind the scenes. As the article and others have said, there are plenty of operations where the engine itself uses cursors. Likewise, you shouldn't use cursors for RBAR operations as a matter of course, since that usually isn't efficient.

    As with anything, test different approaches if you can. In some cases a cursor might be faster for what you need to do.

  • Ken Messner (6/1/2015)


    There are times in system design when one may choose to implement significant chunks of business functionality in T-SQL, as opposed to a non-database language. The reasons for doing so are beyond the scope of this post. When programming business rules and logic in a non-database language, one frequently uses non-set operations. When one is programming those same rules and logic in T-SQL, it often makes very good sense to use the procedural-programming features of T-SQL, including row-by-row processing of data sets. This is where cursors really shine. For example, I have used them very successfully in data warehouse ETL operations, especially the Transform and Load sections.

    You should at least give one example.

    If you are using Cursors for ETL operations, you should really be using Integration Services packages, which can give the same result, but do it far, far more efficiently.

    Gerald Britton, Pluralsight courses

  • Bill Talada (6/1/2015)


    Having read the entire article and all responses, I'm wondering if people benefited at all or just managed to strengthen their ego positions. I think a much more intelligent approach is to appreciate the various attributes of each solution and pick the correct one for a given situation.

    SET Solution:

    Usually short duration unless using most of the host resources.

    Possibly very high disk, memory, log usage.

    Can't always write SET processing when the problem is very complex and each row needs to be considered separately.

    Monolithic SQL statements that need to be fully rewritten if rules change slightly.

    So would the logic in a loop.

    Shorter transaction time if the whole batch needs to be one transaction.

    LOOP Solution:

    Can work in "background mode" while allowing OLTP to be responsive.

    Can handle very complex queries requiring procedural handling.

    Much easier to modify in the future.

    I could not disagree more with that statement! I cannot stand reading cursor-based code. The first thing I usually do is find a way to replace it with set-based logic, even if I have to use dynamic SQL to do it. If I see nested cursors, I usually go back to the business rules and rebuild the code from scratch.

    Can work in a minimal resource usage mode.

    Can avoid expanding logs if in SIMPLE mode.

    ... as can Set-based code

    I would like to complain about CURSORS, though, since I think the language is kludgy and adds no power over simpler WHILE loop constructs that don't need to be deallocated.

    Simple While loops can be (often are, actually) worse than cursors from a performance perspective.

    Gerald Britton, Pluralsight courses

  • Jeff Moden (6/1/2015)


    While I appreciate the nerve the author had to say that not all cursors are bad, I find this article woefully short of any meat on the subject.

    I agree. One star. It seems as if the author's intent was more to stir things up than it was to provide any usable advice. To those that complain about readability, please grow up and learn your trade. If you don't understand a complicated query written by someone else, break it down, analyse it, and add comments until you do understand it. Post it here and ask questions if you have to. If you hired someone who can't maintain or understand existing code, you may want to re-evaluate your selection process. If you have someone who can't properly format and comment their code, you may want to re-evaluate your code review and/or training policies.

    Anyone can take row by row or set based T-SQL and make it difficult to understand and maintain, or easy to understand and maintain by changing formatting, aliases, and comments. What's so magical about loopy code that makes it more readable?

    James Stephens (6/1/2015)


    There are other cases for cursors as well. First, let's get real here and everybody admit that we're not all working on high-speed trading systems where every nanosecond counts. I have seen SOOOO many ridiculous SQL acrobatics done "Just because you can". Statements that are unreadable and unmaintainable. Ultra-long lines of SELECTs with nested, embedded CASE statements, complex AND and OR operators where the slightest mistake in the position of a paren will still allow the statement to run, but UTTERLY DESTROY the intent of the statement.

    This is not maintainable code. Just because some hot-shot said "Hey, I can do this in one SQL statement. Look at me--oh--and please, protect your knees as you kowtow to my brilliance", and he/she has spent hours (if not days) constructing it and finally getting it to work--even they may not understand it in one week. When the business rules change, and you've got to move those parens around and change this AND to an OR and deal with all that???

    No.

    Sometimes a top-down procedural loop on a Cursor is the more easily understandable and maintainable way to code something. Sure, like I said--you've got to consider performance, but again--most of us can afford the perf penalty in return for ease of understanding and maintenance in at least some of these cases where SQL-Peackocks write some of this unnecessarily unreadable code.

    And let's also consider the other reality--just because *you* may think you're a sql God--chances are the person maintaining that in 6 months won't be.

    Thank you for injecting some humor into my Monday morning. "sql God"? "SQL-Peackocks[i]sic[/i]"? Really? Because I'm more concerned about database performance than I am about the competence of my successor? If you're letting someone maintain your code who's befuddled by those darn parens and confusing ANDs and ORs, then you deserve the consequences of their efforts.

    OF COURSE changing the position of a paren can radically change what the statement does! That's what they are for. Open the hood of your car and swap two spark plug wires around. OMG, your engine may still start, but its performance has been UTTERLY DESTROYED. What a bad design! Now use that as an argument that horses are better because they don't have those confusing spark plug wires for an incompetent mechanic to mess up.

    If you have your befuddled developer rewrite that complicated and tricky set-based code as a row-by-row "solution", I'll bet the results will be more buggy and less maintainable than before, and the code will still have those confusing parens, ANDs and ORs that need to be in just the right place. I almost forgot: it will also probably gobble up more than a few extra nanoseconds, too. But who cares if your end-users have to stare at that spinning hula hoop a little longer, or if your system now gets all these mysterious deadlocks? It's not like it's a high-speed trading system or anything.

  • Bill Talada (6/1/2015)


    ...

    SET Solution:

    ...

    Possibly very high disk, memory, log usage.

    Relative to what? Do you have any examples of a specific problem where a cursor, loop or recursive CTE generates less IO or requires less system resources than a well-written set-based query?

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • When dealing with large data that needs to be updated differently, row by row, cursors can have much higher performance for a couple reasons:

    1.

  • Cursors can have higher performance in many cases when dealing with large updates. There are a couple of reasons for this:

    - The deleted and inserted tables used by larger triggers may have no indexes or poor indexes. That alone can make all the difference in the world when updating large chunks of data

    - The locking when updating huge chunks of data in a single set operation in a production system can be prohibitive. Updating them one by one has little impact.

    You can always throw more hardware at the problem, but with equal hardware, actual tests show that cursors can have better performance than set-based updates for large amounts of data. Just use them well.
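    The "little impact" claim above is usually achieved by batching rather than strictly one row at a time. A hedged sketch of that idea, with hypothetical table and column names:

```sql
-- Update in small chunks so each transaction, and the locks it holds,
-- stays short; in SIMPLE recovery this also keeps the log from growing
-- the way one giant update would.
DECLARE @BatchSize int = 5000;

WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) dbo.BigTable   -- hypothetical table
    SET Processed = 1
    WHERE Processed = 0;

    IF @@ROWCOUNT = 0 BREAK;               -- all rows done
END
```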

  • I've had much better luck historically by creating a [font="Courier New"]#TempTable[/font] with a [font="Courier New"]Done = cast(0 as bit)[/font] extra column.

    That way I can review the results at the end of the process and verify what data is going in. I cycle through the RBAR with a

    DECLARE @Col int;   -- assumes Col is the key column of #TempTable

    WHILE (SELECT COUNT(*) FROM #TempTable WHERE Done = 0) > 0
    BEGIN
        SELECT @Col = MIN(Col)
        FROM #TempTable
        WHERE Done = 0;

        <process here>

        UPDATE #TempTable
        SET Done = 1
        WHERE Col = @Col;
    END

    This has always worked well for me and I've yet to meet programmers or civilians who couldn't follow the pattern.

  • ray 64276 (6/1/2015)


    When dealing with large data that needs to be updated differently, row by row, cursors can have much higher performance for a couple reasons:

    1.

    LOL!

    +1000

    Gerald Britton, Pluralsight courses

  • ray 64276 (6/1/2015)


    Cursors can have higher performance in many cases when dealing with large updates. There are a couple of reasons for this:

    - The deleted and inserted tables used by larger triggers may have no indexes or poor indexes. That alone can make all the difference in the world when updating large chunks of data

    A cursor will not run any faster in that situation. There has to be a loop somewhere (to perform a table scan); it can either be in the engine or in your code, and a loop running in the engine will beat your code every time.

    - The locking when updating huge chunks of data in a single set operation in a production system can be prohibitive.

    Not if you use appropriate isolation levels.

    Updating them one by one has little impact.

    Except that it makes the cursor-based update take forever.

    You can always throw more hardware at the problem, but with equal hardware, actual tests show that cursors can have better performance than set-based updates for large amounts of data. Just use them well.

    Actually, I challenge that claim. Please post some actual results.

    Gerald Britton, Pluralsight courses
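    The isolation-level point in the post above can be illustrated with row versioning, one commonly used way to let readers proceed while a large set-based update runs. This is a sketch, not a universal fix, and the database name is hypothetical:

```sql
-- Readers see the last committed row versions instead of blocking on
-- the updater's locks.
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;

-- Or opt in per session to full snapshot isolation
-- (requires ALLOW_SNAPSHOT_ISOLATION to be ON for the database):
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
```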

Viewing 15 posts - 31 through 45 (of 215 total)
