Eliminating Cursors

  • >>I don't think you understood... the whole idea is to get rid >>of cursors and other RBAR methods in SQL Server.

    My point is that the idea of "getting" rid of cursors/justifying their non-existence dates back to when there weren't any cursors in the SQL Server world to justify.  And the code that started this thread is nothing new but dates back to when cursors did not exist in the SQL Server world. 

    Engineers should be making use of all available tools when applicable.  And running a cursor, particularly against a temp table, specifically when there is no logical way to perform a set operation, is a valid application of an available tool.

    After we all get done discussing this topic, lets move onto something even more exciting, like 3rd normal form!!!

     

     

  • Ummmm.... wanna tell me what version of SQL Server there was that didn't have cursors?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You hit the nail on the head: "And running a cursor ... specifically when there is no logical way to perform a set operation, is a valid application of an available tool."

    Very, very, very rarely is a cursor absolutely necessary.  There is almost always a way to conquer T-SQL tasks using T-SQL, sans cursors.  Over-reliance on cursors is a fallback position for many people who come from backgrounds in flat-file processing, and want to use SQL server as a flat-file parsing tool.

    If all you want to do is process data flat-file style, there are much better tools available than SQL Server cursors.

  • Great discussion!  As usual there are many ways to solve a problem and each of us has to determine based on our own experience how best to analyze and address them as they arise.  Cursors are just one tool in our bag and though I would never use them as a first choice I guess I wouldn’t throw them out entirely …but then you should see my garage. 

     

    As others have pointed out Cursors are notoriously inefficient in SQL Server.  As a general rule you want to avoid processes of long durations not only because they consume more resources but also because the longer a process takes the more potential that exists for blocking and other problem conditions to arise.  I offer the following true case history as an example:

     

    In the late nineties I was hired by a national distribution and warehousing company to assist them in a project to upgrade their SQL Servers from 6.5 to 7.0.  The critical application was an order processing system.  Orders were entered on AS 400’s and consolidated on an HP 3000.  They were then imported via flat file batches several times per day into a SQL Server where business rules were applied and orders were replicated to the various warehouses around the country for processing and shipping.  Shipping status was updated and files then replicated back to headquarters.

     

    My first thoughts were that while the upgrade would no doubt result in some improvement in performance there were most likely other factors related to the code and/or architecture that were likely at the root of their problems.  I told them that I would certainly assist in the upgrade but would also like to analyze the code to see if there were any obvious issues that could be causing their problems.  Convincing them to modify the code was going to be a challenge since it was written by the current IT manager whom I later learned had been a cobol programmer earlier in his career.

     

    When I finally got down to business and began looking at the code I discovered cursors everywhere. …cursors in the code that applied the business rules of course but also cursors during replication and cursors during the update of shipping status.  I knew immediately that the upgrade was not going to meet their expectations but I met with a lot of resistance from management.

     

    When the upgrade was complete processing times did improve somewhat but we were still experiencing significant blocking at various times especially during replication and status updates.  I finally convinced them to let me rewrite the process.  Following analysis of the code I restructured the staging tables and was able to break the code down into 16 processes that applied the various business rules and pre-processed the order records prior to replication.  I also modified the replication process to eliminate most of the code including all of the cursors by moving the logic forward into the pre-processing code.

     

    When the re-write was complete we ran a series of tests demonstrating that the new processes performed the same function as the old and produced identical results.  A typical batch of 10,000 orders that took just under 2 hours to process from bcp in to replication complete took approximately 2 ½ min. using the new code.  This improvement proved to be extremely important.  The old process meant that orders could only be processed every two hours and that a maximum of 240,000 records could be processed in a 24 hour period.   While this was not an issue most of the time it did have a negative impact during the peak summer months when retailers were placing their orders for the coming Christmas season.  Downtime in the year prior to the upgrade and rewrite had been in the hundreds of hours (at an estimated cost of $26,000/hr).  Downtime in the year following the upgrade was 0 hours. 

     

    I only wish I had negotiated a contingency contract instead of an hourly rate.

     

    -ron cicotte

     

    Ron Cicotte
    ron.cicotte@gmail.com
    Data Transformations, LLC

  • I'm getting into my "Way back machine" here, but I think support for cursors was introduced in 6.5.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Cursors existed in 4.2 (Now Im showing my age) and as far as I could remember could achieve results we could not get any other way. SQL server was extensively re-writen for 7.0 which introduced much better solutions but left cursors as an option and they remain today, we have had regular examples of cursors being replaced with set based operations resulting in run time going from a number of minutes to just a few secs...

  • Thanks, DC and Ieea,

    I always thought cursors were a part of SQL Server... didn't know they were actually added as a "feature" at some point.  Must'a been to draw the folks that couldn't get over the "read a record, write a record" paradigm (or is it 4 nickels :hehe in the ISAM world.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I think that in your article the biggest difference in speed between cursor and while loop exists because of use of temporary table. Have you tried to use cursor instead of while loop in your second code example? Could you do it and comment results?

    I know that for same kind of select statement especially when we join multiple tables to get result, temporary table can improve performance because the number of records used in query can drop down dramatically. And maybe in your case (you grab data for cursor from multiple tables) you can speed up process without replacing cursor by while statement.

    I do not know what are you doing in stored procedure which you called inside cursor/loop but sometime is good to change the logic and check if it is not possible to replace cursor-stored procedure call to just stored procedure with select statement inside it which will do work without cursor at all.

    Thanks

  • I fought against the inclusion of cursors in the original SQL spec, but COBOL programmers of the late 70's had great difficulty wrapping their mind around a technique that did not allow them to touch every row.

    As Jeff Garbus puts it, "Cursors are the last refuge of those who refuse to accept set processing." 

    Hierarchical structures are the only situaions that I have seen in 30 years of working with RDBMS. Suffice it to say that if you are using the relational tool to solve hierarchical problems you're using the wrong tool. Finally, XML has come to the rescue.

    lfernquist@vurv.com

  • The original code for SQL Server was developed by Sybase. Microsoft, Ashton-Tate and Sybase collaborated on the first version of SQL Server (4.2) built for OS2 in the late 1980's (yes, I'm old enough to remember it).  At v4.2 Sybase and Sql Server were virtually identical in syntax and functionality.  Sybase has since renamed their product Adaptive Server Enterprise. 

    So SQL Server has always had cursors because of its origins.  But set based queries are orders of magnitude more efficient.  I once re-wrote an entire order processing system originally written using cursors* to use set based queries. Prior to the re-write it would take approximately 1.5 - 2 hours to process 10,000 orders.  Post re-write the same process took less than 2 minutes.

    -ron

    *Note: written by a former cobol programmer

    Ron Cicotte
    ron.cicotte@gmail.com
    Data Transformations, LLC

  • Does anyone have a link to some examples of scripts using cursors and the same scripts without the cursors, I used to have some good examples but can't find them now 🙁

  • I bet this is the last time anyone brings up the subject of cursors!!

  • bet its not 😉

  • I tried this solution. It gives a reduction with small set of rows. From 10000 rows it is just as fast. I tried it with a return set of 6 million. The procedure still run after a weekend.

     

    So for small sets it works. But not as a general solution.

  • Since the thread has come alive again, I resubmit my last post, to get the opinions of the community.

    On a different note, how do you approach eliminating a cursor for "batch" inserting correlated items and the parent item uses an identity as PK (for example, inserting a batch of orders and the corresponding orderItems)?

    I have two approaches, but neither is very ellegant:

    • Reserve some IDs on the table and enable "identity insert" to insert the calculated IDs and the child items;
    • Add another column to the table to insert an ID with which I can relate the order and order items

    Another approach is with the help of triggers, but I'm not very fond of using triggers

Viewing 15 posts - 61 through 75 (of 296 total)

You must be logged in to reply to this topic. Login to reply