Eliminating Cursors

  • Valid points all but it doesn't get to the crux of my question in that where do people get the idea that cursors are bad.

    I know I am from an Oracle background but in my SQL experience (some 6 years) I have never had a problem with cursors! Memory or otherwise, if somebody could point me to an article that points out the flaws problems with cursors I would be most grateful..

    Coming out with statements such as "I don't like cursors so I don't use them" just isn't helpful.....

  • I concur with your statement of "I don't like cursors so I don't use them". It is an empty and unsupported posit.  It is my understanding that if your procedure code is not running fast (which by the way is a relative term) due to alleged cursor code, then perhaps the database server’s RDBMS is not getting the horsepower it needs like memory, larger hard-disk, or a stronger processor.

     

    There are cost-benefit trade-offs all over the place.  If a developer is cursor-centric in his procedure creations but said procedures do not perform “fast-enough”, what is “fast-enough” ?  Sure, a code review by another set of eyes might help point out some glaring in-efficiencies, but if “not fast-enough” means having to rewrite something at a cost that is less than cost of attaining more hardware, then so be it, attempt to reach that margin of diminishing return in terms of speed, rewrite your code. 

     

    I liken the analogy to the number of people attempting to enter a very small automobile.  If you can only fit five persons but need to fit ten, are you going to cram five more persons or are you going to get a larger automobile?

     

  • I was doubtful of the solution posed in this artcle also, so I did my own tests. I got similar results as found by bobjbian.  The static cursor beat all comers for larger result sets.  As the result set got smaller, the times between the 3 styles got closer.  I suppose for very small result sets (<100 rows?) that the temp table might have less overhead than a cursor, but it's only marginal.

    No doubt the SP called by the article performed an update to the underlying table causing the cursor to to be updated.  So while the table variable may have helped the performance, so too would a STATIC cursor.  And the STATIC cursor would have been faster than the table variable.

    - J

  • I see that there is at least one person who has accused me of plagiarism. I can assure you that this is not the case and that this is a piece of code which has been recently implemented by myself in one of our enterprise solutions.

    I do, however, agree to the point that I should have been more explicit in describing a few things, especially a little more insight as to why using a WHILE construct gives better results (which it certainly does in my case!) than a CURSOR routine. But then not everyone is good at writing articles and I am just learning the art here .

    As for my comment that using WHILE is better than using CURSORS: I understand(as I have noticed as well), and do correct me if I am wrong, that the inherent design of CURSOR implementation within SQLServer causes inconsistent locking and extra IO. This can be avoided using the WHILE technique since all you are doing is re-running the SQL statement one after the other. In my implementation this is all the more important since there is heavy usage of OUTPUT stored procedures which , unfortunately, cannot be used in simple SELECT statments.

    Lastly, to answer the question "where do people get the idea that cursors are bad"

    I say: experience

  • Actually, this is kind of funny as I've been using this approach for a couple of years and not thought to much of it so as to comment to the world ... Since my background is in software development - namely c, c++, Ruby, Lisp and other syntactically succinct languages, I felt driven to look for a way to get around the clunky syntax of the cursor declaration, as well as the issue that the reference variable appears differently in code than a standard variable in that the @ is not present and the parser uses this "syntactical difference" as a class differentiator. While there is a little more code, it is also more "readable" and therefore leans toward meeting goals of self-documentation.

    The "next one" in line with this (and I have not looked to see if there are any postings to this effect already), is using this technique for dynamic iteration of tables using sys information – which has a few different caveats. If there is any desire for this, let me know and I can post example code and ancillary discussion. One benefit to this is constructing queries on the fly where you do not want to have to hard code in order to get the efficiency of using the column names (as opposed to using the * operator, which causes performance degradation in testing).  This enables you to create “template” procedures more readily. There are other advantages as well.

  • As for why cursors are percieved as bad in SQL Server, it's simply about performance. SQL is optimised for set based operations. 'Never use cursors' is not a rule, but rather a piece of advice. If you can do something with one piece of set based sql, it will always be quicker and more efficient than using cursors. I mentioned some exceptions in an earlier post, but this is when you actually want you code to be less impactful.

    A week ago, I had a grad show me a piece of code he wrote to loop through every row in a 10,000,000 row table, changing the value of a flag for 50% of the rows. Obviously performing this with an update statement supported by indexes is much more sensible, and obviously it is much much quicker. I don't think I have ever seen a cursor that couldn't be written better in straight sql.

    As for Mike DiRenzo's post. All I can say is I'm real glad you don't work in my team with thought processes like that. 'Not fast enough' usually means, the business does not consider it fast enough. Just because you wrote some inefficient code, doesn't mean that people have to put up with it. 90% of database performance problems are caused by poorly constructed databases and procedures.

    (by the way a) not brand new does not mean plagurised, b) if the script is making your tempdb fill up, that's your problem not his, c) if you are working with temp tables then obviously, they are going to be in memory)

     

     

  • I'm fairly new to this, so forgive me, but if the whole idea is to stay set-based rather than iterative-based, isn't it obvious that there are only marginal improvments to be had between these two iterative approaches?

    Doesn't this drive one to think about the stored proc that generates the purchase order and ask, why is this second proc not set based? I realize this was a contrived example, but that's where my thoughts go. ? Like, here's the set of purchase order numbers, generate 'em all!

    As another, perhaps dumb idea, if one were to put the generation of the purchase orders in a function, would putting said function directly in the SELECT keep you set-based and therefore at a higher level of performance? (Realizing that there would be a great deal of I/O as the function was called for every row of the set). Or is that the same thing??

  • Here I will post two pieces of codes which are doing exactly the same thing: insert records into a two-column table. The total number of records is 6275. My test have shown that the WHILE construct takes 14.141 secs whereas the CURSOR implementation takes 44.391 secs.

    --CREATE TABLE temp_cp(client_project_id INT , project_name varchar(50))

    TRUNCATE TABLE temp_cp

    DECLARE @client_projct_id INT

    DECLARE @project_name Varchar(50)

    DECLARE test_cursor CURSOR FOR

    SELECT

    cp.client_project_id

    ,cp.project_name

    FROM dbo.bv_client_project cp

    OPEN test_cursor

    FETCH NEXT FROM test_cursor INTO @client_projct_id , @project_name

    WHILE @@FETCH_STATUS = 0

    BEGIN

    INSERT INTO temp_cp

    VALUES(@client_projct_id,@project_name)

    FETCH NEXT FROM test_cursor INTO @client_projct_id , @project_name

    END

    CLOSE test_cursor

    DEALLOCATE test_cursor

    SELECT * FROM dbo.temp_cp

    And here is the same code using WHILE construct. Notice how the second TABLE variable is used to do the final INSERT. If we use TABLE variables , as I have indicated in my article, we will not go into worrying about tempdb size increasing

    TRUNCATE TABLE temp_cp

    DECLARE @client_projct_id INT

    DECLARE @project_name Varchar(50)

    DECLARE @loop_counter INT

    DECLARE @total_projects INT

    DECLARE @temp_table TABLE(pk_column INT IDENTITY(1,1) NOT NULL, cp_id INT, proj_name varchar(50))

    DECLARE @temp_table_2 TABLE (cp_id INT, proj_name varchar(50)) -- this will be used TO INSERT the final VALUES into the actual TABLE

    INSERT INTO @temp_table

    SELECT

    cp.client_project_id

    ,cp.project_name

    FROM dbo.bv_client_project cp

    SET @loop_counter = 1

    SET @total_projects = (SELECT COUNT(*) FROM @temp_table)

    WHILE @loop_counter <= @total_projects

    BEGIN

    SELECT

    @client_projct_id = t.cp_id

    ,@project_name = t.proj_name

    FROM @temp_table t

    WHERE t.pk_column = @loop_counter

    INSERT INTO @temp_table_2

    VALUES (@client_projct_id ,@project_name)

    SET @loop_counter = @loop_counter + 1

    END

    INSERT INTO temp_cp

    SELECT

    * --bad practice !

    FROM @temp_table_2

    SELECT * FROM dbo.temp_cp

    I would highly appreciate if someone can run the two pieces of code and let me know the results in terms of performance. Thanks!

  • Mark,

    From experience a while loop is significantly faster than a cursor.  Much of the performance gain is due to the resources used in building the cursor.  My experience has been that you gain even more performance with a larger recordset. 

    I do agree that it may be, and probably is, possible to do the porcessing using sets, but not having seen the actual processing code it is hard to tell. 

  • In response to Mark Hickins comment concerning my thought processes, without question, you are obviously erudite.  True, I probably would not be on your team but probably leading your department.  My thought processes stem not only from a software engineer's but from a CEO's.  Your role in my company would be of a leader and mentor to the software development team as you have clearly proven in your past postings.

    Management does not care if your procedure code is running efficiently.  It is only perfectionists like us who care and of course the slow performing database server.  If something is working, albeit not efficiently, does the cost-benefit justify more quantitative analysis?  From a software developers perspective, yes; management, you do the math.

    Personally, I prefer using variables of type table-based looping constructs and have found them to perform relatively faster than cursor constructs.

     

  • I think that everyone is missing the obvious here.  The desired result of running the following SP is to obtain a list of associated Purchase_Order_IDs:

    EXEC dbo.usp_generate_purchase_order @item_category_id, @order_id, @purchase_order_id OUTPUT

    If performance is a real problem with this process, it would be my guess that one could combine the query for the Order_ID, Item_Category_ID with the Purchase_Order_ID and do this all in a set based operation. 

    The end result should be a temporary table of Purchase_Order_IDs that can be used by the next process in this chain of commands.

     

  • Oracle is more optimized to work with cursors whereas SQL Server is not. Under any kind of load, a set based operation will vastly outperform a cursor based one on SQL Server.

    You can use hardware to mask the issues, but then you still have powerful hardware running inefficient code.

    Oracle seems to be different and can perform very well using cursor solutions. But that's a decision the platform architects made when building both pieces of software.

    I wasn't sure this technique offered tremendous improvement since it didn't move to a set based solution, but I have to take Mr. Kamran as his word that it worked in his environment. We do read the articles and make suggestions, point out potential areas of improvement, but we are a community site that seeks to get articles written by the author.

    If you doubt a conclusion, I'd post a question here or email the author, but keep it professional and let's not attack someone. This might work in their environment and it might work in someone else's, even if it doesn't work for you. It's a valid way to improve performance, although not one I'd choose.

  • Nice code but to compare the two you must use the same method.

    The first part (where a cursor is slow) you insert each row into the table, one at a time, the table is on disk, each time you insert it does a disk seek to find the next place it can insert the record etc. etc. etc. This coupled with network bandwith for each record (if your running on the client) will have a peformance impact..

    The second part you insert each record into a table variable which, by it's nature, is stored in memory. Hence any access to this table will be much faster.

    You then copy en-mass all records from memory into the table, a bulk-insert by any other definition.

    It would be interesting to see your case using a table variable for the cursor based approach!!

    Cheers

    Bob

  • Another question that comes to mind is that all of the cursor based approaches use the following logic

    open cursor

    fetch next from cursor into ....

    while @@fetch_status = 0 begin

    ...

    ...

    ...

    fetch next from cursor into ....

    end

    close cursor

    deallocate cursor

    can anyone see any problems with this logic, which I generally use so as not tpo duplicate the fetch..

    open cursor

    while 1 = 1 begin

    fetch next from cursor into ....

    if @@fetch_status 0 break

    ...

    ...

    ...

    end

    close cursor

    deallocate cursor

    Thanks

    Bob Bain

  • "If we use TABLE variables , as I have indicated in my article, we will not go into worrying about tempdb size increasing"

    The TABLE variables are also written to TEMPDB. I thought the same way as you do, but on another discussion, I discovered that table variables are very similar to TEMPDB tables.

Viewing 15 posts - 16 through 30 (of 296 total)

You must be logged in to reply to this topic. Login to reply