Cursors - need help explaining why not to use them

  • I'm a junior DBA and know that every book and DBA guru says to rarely use cursors because they're not as efficient as the set-based approach and cursor logic goes against what SQL is designed for. However, a programmer/co-worker recently gave me heat because I rewrote his procedure to be set-based rather than use a cursor to perform the INSERTs and DELETEs. It's a delete-and-archive procedure that basically moves old order-related data from several source tables to corresponding archive tables. The databases are usually around 1.5 million records in size.

    He likes the cursor because it's reliable, although it takes a long time. He has had problems with set-based approaches for moving large amounts of data in the past because one time it taxed the database hard. But that's because he was running the delete-and-archive for the first time -- in that case, it had to move a lot of records. The procedure is designed to run daily, and when it does, far fewer records are moved, so it's much less taxing on the database. I have since improved it by limiting the rows in each transaction (a rough sketch of that batching idea follows this post). But he still asks me to tell him why even the improved set-based procedure is better than the cursor. I can show him countless books that say cursors are bad, but he wants evidence. Any suggestions on what I can present to him to make my argument against cursors? By the way, he thought that I never used cursors because I just didn't know how to program them! Arrogance!


    smv929
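
    For reference, here is a rough sketch of the batched set-based archive/delete described above. The table names (dbo.Orders, dbo.OrdersArchive), the cutoff, and the batch size are all invented, the archive table is assumed to have the same column layout as the source, and the OUTPUT INTO syntax assumes SQL Server 2005 or later; on SQL 2000 you would use the SET ROWCOUNT pattern shown further down the thread.

    -- Archive and delete in batches so each transaction stays small.
    -- All names and sizes here are placeholders.
    DECLARE @cutoff datetime;
    SET @cutoff = DATEADD(day, -90, GETDATE());

    WHILE 1 = 1
    BEGIN
        BEGIN TRAN;

        -- Move up to 5000 rows per transaction: the deleted rows are
        -- written straight into the archive table by the OUTPUT clause.
        DELETE TOP (5000)
        FROM   dbo.Orders
        OUTPUT DELETED.* INTO dbo.OrdersArchive
        WHERE  OrderDate < @cutoff;

        IF @@ROWCOUNT = 0
        BEGIN
            COMMIT TRAN;
            BREAK;
        END;

        COMMIT TRAN;
    END;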

  • Cursors are less efficient, but sometimes are the best approach. As for why he chose cursors for this issue, I have an idea. Here's my approach:

    http://www.sqlservercentral.com/columnists/sjones/batching.asp

  • Well, there are many instances where cursors come in handy and can even be more efficient for batch updates, so I don't believe we can say outright that cursors are inferior to set-based statements; it has to be decided on a case-by-case basis.

    Thanks

    Prasad Bhogadi
    www.inforaise.com

  • Yes, as mentioned, try to avoid believing that cursors are always bad. Sometimes their performance can be as good as, if not better than, other approaches. However, before using them you really should explore other ways of doing things, as more often than not they're not really required.

    I think the main problem with cursors is that they encourage a certain way of thinking. For some people that way of thinking is easier to visualise than other approaches. However, many times there is a set-based solution to what the cursor is being used for. The best analogy I've heard is to think of the task of putting sugar in a coffee. You can use a pair of tweezers and take one grain at a time, or you can use a teaspoon and do it in one go. You want to make sure you select the teaspoon approach. The tweezer approach might be technically correct, but you're leaking performance. So when using a cursor, see whether you can do bulk processing on the records you are feeding into the cursor rather than processing each record in turn (a small side-by-side sketch follows this post).
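
    To make the tweezers-versus-teaspoon point concrete, here is a deliberately simple pair of equivalents; the table and column names (dbo.Orders, OrderID, Status) are invented.

    -- Tweezers: touch one row per statement via a cursor.
    DECLARE @id int;
    DECLARE c CURSOR LOCAL FAST_FORWARD FOR
        SELECT OrderID FROM dbo.Orders WHERE Status = 'OPEN';
    OPEN c;
    FETCH NEXT FROM c INTO @id;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE dbo.Orders SET Status = 'EXPIRED' WHERE OrderID = @id;
        FETCH NEXT FROM c INTO @id;
    END;
    CLOSE c;
    DEALLOCATE c;

    -- Teaspoon: the same result in a single set-based statement.
    UPDATE dbo.Orders SET Status = 'EXPIRED' WHERE Status = 'OPEN';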

  • An example I can give is that I turned a 27 hour legacy stored procedure (Oracle, running on some pretty big hardware) into a 5 second stored procedure, by switching it from cursor to set-based. Those are the actual, non-inflated numbers. My manager would not believe me when I told him, and said bullsh*t several times over after hearing me say 5 seconds. But he didn't make me switch it back.

    I have met many DBAs who write cursor-based routines almost exclusively. Like your guru who assumed you didn't know cursors, I assumed they didn't know GROUP BY. But of course everything has its place in this world; sometimes cursors are the way.

    Thanks to all for the responses. Please don't misunderstand me. I do actually believe cursors are good and do consider them. However, 95% of the time I can envision a set-based technique that works great, so I rarely have to resort to them. Whereas this other guy (and other programmers here) always suggest a cursor first, because I think they are not comfortable with UPDATE/DELETE joins, or are ignorant of using CASE in such statements, or of derived tables (a quick illustration of those follows this post). I was just frustrated by their SQL ignorance and by the arrogance of assuming that I just don't know how to program!

    I explained to him the moving-sugar analogy (tweezers vs. teaspoon) and that performance is better with a teaspoon. He basically argued that if the data is too big, a shovel would be used instead of a teaspoon, which would kill performance. I told him to limit the transactions to a tablespoon, but he didn't buy it. I guess I will have to time both techniques and gather statistics to show him.


    smv929
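
    For anyone following along, here is the kind of UPDATE/DELETE join, CASE, and derived-table usage being referred to; all table and column names (dbo.Orders, dbo.Customers, dbo.OrderDetail) are made up.

    -- UPDATE with a join and a CASE expression.
    UPDATE o
    SET    o.Discount = CASE WHEN c.CustomerType = 'VIP'   THEN 0.10
                             WHEN c.CustomerType = 'TRADE' THEN 0.05
                             ELSE 0.00
                        END
    FROM   dbo.Orders AS o
           JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID;

    -- DELETE with a join to a derived table.
    DELETE d
    FROM   dbo.OrderDetail AS d
           JOIN (SELECT OrderID
                 FROM   dbo.Orders
                 WHERE  Status = 'CANCELLED') AS x
             ON x.OrderID = d.OrderID;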

  • From my experience, use cursors only when there is no set-based solution. And trust me, there are quite a few cases where there is just no set-based solution (for example, listing the files in each database from sysfiles, or the users for that matter -- see the sketch after this post). Where a set-based solution exists, it beats the cursor hands down, every time I've tried it. Even I used to use cursors, until I saw some very elegant solutions from SQL gurus on this forum and elsewhere that don't require them. One caveat: when a set-based solution brings more data into memory and tempdb than the system can handle, I may use a cursor to stay within resources. Once again, I see this as a system resource limit rather than a limit of the set-based approach. For example, if you have three very large tables (10 million rows each) and a set-based join needs about 1 million records from each table, in the worst case tempdb would have to hold something approaching 1 mil * 1 mil * 1 mil rows. If you do not have that space, you may limit the query to go against N records at a time using a cursor, depending on your memory and tempdb space.
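
    As a concrete version of the sysfiles example, here is a sketch only; it assumes SQL 2000-style system tables (master..sysdatabases, sysfiles) and permission to read every database.

    -- Collect the file list from every database; there is no single
    -- set-based statement for this, so a cursor plus dynamic SQL is used.
    CREATE TABLE #dbfiles
    (
        dbname        sysname,
        logical_name  sysname,
        physical_name nvarchar(260)
    );

    DECLARE @db sysname, @sql nvarchar(1000);

    DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
        SELECT name FROM master..sysdatabases;

    OPEN dbs;
    FETCH NEXT FROM dbs INTO @db;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @sql = N'SELECT ' + QUOTENAME(@db, '''') + N', name, filename '
                 + N'FROM ' + QUOTENAME(@db) + N'..sysfiles';
        INSERT INTO #dbfiles EXEC (@sql);
        FETCH NEXT FROM dbs INTO @db;
    END;
    CLOSE dbs;
    DEALLOCATE dbs;

    SELECT * FROM #dbfiles ORDER BY dbname;
    DROP TABLE #dbfiles;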

  • I've also heard that cursors are generally not fast, but for many of the things I've been writing lately, I've used cursors a lot.  An alternative I've come across is using a table variable, and performing what I need on the table records one by one, essentially mimicking a cursor. 

    In addition, can anyone shed some light on cursors vs. table variables, where each is stored, and the pros/cons of each? It seems like they could be used equivalently, so I'd like to know how to choose which is better for what I'm doing.

    Many times I need to do several things with each record (insert/update records in several supporting tables based on the cursor record and/or values derived from it), and that's why I've been using cursors. So far I haven't run into any speed issues; then again, my cursors typically hold perhaps 30,000 records at most. If anyone has any better approaches, I'd like to hear them (a rough sketch of the table-variable pattern follows this post).
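
    Here is a rough sketch of the table-variable loop being described, just to show the shape of it; the table, column, and procedure names are invented.

    -- Load the keys to process into a table variable, then walk them
    -- one by one with a WHILE loop instead of a cursor.
    DECLARE @work TABLE (RowID int IDENTITY(1,1) PRIMARY KEY, OrderID int);

    INSERT INTO @work (OrderID)
    SELECT OrderID FROM dbo.Orders WHERE Status = 'PENDING';

    DECLARE @i int, @max int, @OrderID int;
    SELECT @max = COUNT(*) FROM @work;
    SET @i = 1;

    WHILE @i <= @max
    BEGIN
        SELECT @OrderID = OrderID FROM @work WHERE RowID = @i;

        -- Per-row work against the supporting tables goes here,
        -- e.g. EXEC dbo.usp_ProcessOrder @OrderID;   (hypothetical procedure)

        SET @i = @i + 1;
    END;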

    I have similar purge/archive routines. In a dev environment I can run these as a single large delete, and they run twice as fast as when cursoring through and deleting one row at a time.

    But in a busy prod environment, a single large delete (not that big -- say, 20-50k rows from a 10M-row table) never finishes. That is to say, I have never waited long enough to let one finish, because a) it takes a long time, and b) it causes massive blocking.

    I think you cannot win your argument in general, because there are times when you cannot execute one big statement: it either takes too long, blocks too much, grows the transaction log too much, etc. One could argue that your co-worker's approach at least will always perform reliably and predictably, whereas yours may or may not work at all. But you should be able to prove one way or the other whether yours works, and which approach is faster in your environment.

    Be sure to run your test in prod or a prod-like environment to mirror the contention your real routine will face. Measuring I/Os may show no difference; I'd be more interested in duration and blocking impact (a small measurement sketch follows this post). Post your results!
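
    As a minimal way to capture those numbers, something like the following could be used; the procedure name is a placeholder for whichever version is being tested.

    -- Capture duration and I/O for the procedure under test.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    DECLARE @start datetime;
    SET @start = GETDATE();

    EXEC dbo.usp_ArchiveOrders;   -- hypothetical: swap in the cursor or set-based version

    SELECT DATEDIFF(ms, @start, GETDATE()) AS duration_ms;

    -- While the test runs, check for blocking from a second connection.
    SELECT spid, blocked, waittime, lastwaittype
    FROM   master..sysprocesses
    WHERE  blocked <> 0;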

  • Just to add another twist in the story.

    We have a regular process that archives records on an hourly basis from a table that has 10mil+ rows. Each hour approx. 1000 records get archived.

    This is all done without cursors, using a simple WHILE loop and limiting the number of rows with SET ROWCOUNT x.

    This gives the best of both worlds: set-based inserts/deletes, as well as a cursor-like process that minimises impact (a sketch of the pattern follows this post).

    --------------------
    Colt 45 - the original point and click interface
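
    For readers who haven't seen it, here is a rough sketch of that SET ROWCOUNT batching pattern. The tables, columns, cutoff, and batch size are all invented, and the archive table is assumed to have the same column layout as the source; on SQL Server 2005 and later you would more likely use the TOP-based version shown earlier in the thread.

    -- Archive and delete in small, set-based batches using SET ROWCOUNT.
    DECLARE @cutoff datetime;
    SET @cutoff = DATEADD(hour, -1, GETDATE());

    CREATE TABLE #batch (OrderID int PRIMARY KEY);

    SET ROWCOUNT 1000;                       -- cap every statement at 1000 rows
    WHILE 1 = 1
    BEGIN
        INSERT INTO #batch (OrderID)
        SELECT OrderID FROM dbo.Orders WHERE OrderDate < @cutoff;

        IF @@ROWCOUNT = 0 BREAK;             -- nothing left to archive

        BEGIN TRAN;

        INSERT INTO dbo.OrdersArchive
        SELECT o.*
        FROM   dbo.Orders AS o
               JOIN #batch AS b ON b.OrderID = o.OrderID;

        DELETE o
        FROM   dbo.Orders AS o
               JOIN #batch AS b ON b.OrderID = o.OrderID;

        COMMIT TRAN;

        TRUNCATE TABLE #batch;
    END;
    SET ROWCOUNT 0;                          -- always reset
    DROP TABLE #batch;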

  • Although I use cursors regularly when they seem to apply, the most prevalent performance issue that I get called in on for vendor applications is the use of cursors for updating very large tables when batch alternatives exist.  Vendors often test using 1000 row tables where performance looks good even if executing 8-10 SQL statements per row.  On a 20 million row table, we're now talking about executing 160-200 million SQL statements, and performance gets severely degraded.

     

    Can't discard cursors - they are too valuable - but they do carry a significant performance overhead.
