Replacing Cursors and While Loops

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/cherring/replacingcursorsandwhileloops.asp

  • Try this code. You might save the remaining 12 minutes as well...

    UPDATE B
    SET B.id = A.id
    FROM dbo.many_tbl A
    INNER JOIN dbo.OutpUt_tbl B
        ON A.id = B.ID

     

    Jeswanth

    --------------------------------

  • I can't see the purpose of the ORDER BY clause.  If you remove it from your select statement, you may find that performance improves.  The outer loop may also have an ORDER BY clause that can be removed.

    Profile the procedure to determine what indexes are being used on the tables.  Ensure that the indexes in use are the most efficient. You may be able to improve the performance yet again by creating covering indexes.
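
    Purely as an illustration, a covering index for a query that reads ID and DATA from the Many table might look something like the following (whether it actually helps depends on the real plan, so treat this as a sketch only):

    -- Composite index so the pivot query can be resolved from the
    -- index alone, without touching the base table.
    CREATE NONCLUSTERED INDEX IX_many_tbl_id_data
        ON dbo.many_tbl (ID, DATA)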

    Stephen Bailey

  • If you don't believe this example, take a look at the one I posted, which Noeld and Remi helped me convert:

    http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=197948

    This went from 2 hours to 6:46 to 0:26 in the final iteration!!

     

  • Recently I have had to create an admin task that needs to run against our 24 GB database each day.  Unfortunately, due to our company's security policy I cannot show the DDL, but I will do my best to explain...

    In this task (50,000 lines of T-SQL) I have had to use several cursors.  The task is broken down into several smaller sub-tasks that together perform the entire operation.

    I admit I have not yet had the chance to fully analyze the task to identify scans and the like (which I will get to shortly); however, I want to try to rid myself of the cursors first.

    In one 'sub' task I managed to do this by converting a cursor into a simple insert statement:

    Insert Into dbo.tbl
    Select col1, col2, col3
    From dbo.workingtbl

    This cut the time down from 11 minutes to 3, working with approximately 500,000 rows in the SQL above.

     

    The remaining cursors are proving a little hard to work out.  In the cursor I fetch into approximately 20 variables, which are used throughout the loop:

    Declare #x cursor for
    select col1, col2... col20
    from dbo.table
    order by col1

    open #x
    fetch next from #x into @var1, @var2... @var20

    while.....  (you all know the rest)

    In the while loop above I perform close to 100 calculations, 5 to 10 updates and inserts etc....

    With this many calculations and updates/inserts could anyone suggest a better method of this loop without using a cursor?  Or perhaps this is one of those isolated cases where a cursor is best?

    Either way, any suggestions would be great!

    It would be best if you started a new thread for this question.  It will get more attention and you'll be able to provide more details.

    Could you provide the table definition (even if you change the names), some sample data, and the expected output?  I'm thinking at the moment that you could do a first select statement that performs the 100 calculations and inserts that data into a temp table.  Then you could do your updates/inserts from that temp table.  It should run 10 to 50 times faster that way.
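
    As a rough sketch only (the column, calculation, and target-table names below are invented placeholders, since the real DDL can't be posted), the pattern would be something like:

    -- Pass 1: do the row-level calculations once, set-based,
    -- and stage the results in a temp table.
    SELECT  col1,
            col2,
            col1 * col2                                     AS calc1,
            CASE WHEN col3 <> 0 THEN col4 / col3 ELSE 0 END AS calc2
    INTO    #calc
    FROM    dbo.workingtbl

    -- Pass 2: apply the staged results with set-based statements
    -- instead of one update/insert per fetched row.
    UPDATE  t
    SET     t.total = c.calc1
    FROM    dbo.tbl t
    INNER JOIN #calc c ON c.col1 = t.col1

    INSERT INTO dbo.audit_tbl (col1, calc2)
    SELECT  col1, calc2
    FROM    #calc
    WHERE   calc2 > 0

    The 5 to 10 updates/inserts then each become a single statement against #calc rather than part of the loop.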

  • Great article.  A lot of people make the mistake of trying to use SQL as a procedural language instead of a declarative language.  It's hard for some to think in terms of SQL set-based statements as opposed to iterative looping and cursors.

    I'd estimate that cursors could be eliminated and replaced with set-based SQL statements 99.9999% of the time that they are used.

  • In addition to what was already stated about creating a temp table to hold intermediate results, if your calculations are conditional using IF..ELSE constructs, you might look at converting them to CASE expressions and/or modifying your WHERE clauses.

    Without seeing your DDL it's impossible to get much more detailed than that.
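
    For example, speaking only in the abstract (the table and column names below are made up, since the DDL isn't available), a per-row IF..ELSE usually folds into a single CASE expression inside the query:

    -- Procedural: one row at a time inside the loop
    IF @qty > 100
        SET @discount = 0.10
    ELSE
        SET @discount = 0.00

    -- Set-based: the same branch expressed as a CASE for every row at once
    SELECT  OrderID,
            CASE WHEN qty > 100 THEN 0.10 ELSE 0.00 END AS discount
    FROM    dbo.Orders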

     

  • Hi,

    Good Article!

    In my opinion cursor usage should depend on "how often". Cursors provide an easy programming solution when extensive row processing has to be done. If something needs to be done only a couple of times, as in one-time data migration tasks, I usually opt for cursors. If something has to run every day and takes a lot of time, then the set-based approach should be researched, and maybe some database architecture changes, including de-normalizing the database, should be made.

    Regards, Yelena Varsha

  • Stephen,

    In my actual code the data needed to be ordered in the pivoted result set. If one doesn't care about the order then you're correct, the ORDER BY is not needed.

    Clint

  • The problem is not really the cursor itself.  It is the fact that the operations are not set-oriented.  Doing 1 million inserts is not at all the same as doing a single insert based on a select that returns 1 million rows and inserts them in one operation.

     

    I recently converted a Transact-SQL batch that was doing about 8 million inserts. The execution time was about 8 hours. By building one big query that used derived tables and CASE expressions to apply the conversion/extraction logic, and UNION ALL to bring together rows obtained under disjoint logic, the time dropped from 8 hours to 30 minutes (a 16 to 1 improvement). Using such a query also made it possible to reduce logging by doing a SELECT INTO.

     

    Even without doing a SELECT INTO, SQL Server optimizes log write operations.  It seems that far less logging is required for one massive single insert than for doing it row by row.

     

    Cursors are an easy way to describe a job, but the net result is row-by-row reads and writes. For example, a single-row read will almost always use a suitable index, if one exists, which may seem good. But it is not good at all if you end up reading every row in the table that way. When you match a lot of rows using a join, SQL Server may decide that using an index is not worthwhile and perform the match with a joining algorithm better suited to that type of job. There is another possible optimization gain there.

     

    The only problem with this approach is that you have to work harder to define the big “do it all” query.
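
    Very roughly, the shape of that kind of query looks like this (all object and column names here are placeholders, not the actual batch):

    -- Derived table + CASE for the conversion/extraction logic,
    -- UNION ALL to merge the disjoint cases, and SELECT INTO to cut
    -- logging (minimally logged under simple/bulk-logged recovery).
    SELECT  src.KeyCol,
            CASE WHEN src.TypeCode = 'A'
                 THEN UPPER(src.RawValue)
                 ELSE src.RawValue
            END AS CleanValue
    INTO    dbo.LoadTarget
    FROM    (SELECT KeyCol, TypeCode, RawValue
             FROM   dbo.StagingA) AS src

    UNION ALL

    SELECT  KeyCol,
            RawValue
    FROM    dbo.StagingB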

  • This doesn't pivot the results. I wanted the data from several matching rows in the Many table to come back as a single comma-separated value in the results table.

    Suppose I had the following Many table...

    ID DATA

    -- ------

    56 run

    56 jump

    56 scale

    After the pivot the results data would look like this...

    ID Result

    -- ----------------

    56 run, jump, scale

  • After the pivot the results data would look like this...

    ID Result

    -- ----------------

    56 run, jump, scale

    No, not really, it should look like:

    ID Result

    -- ----------------

    56 run,jump,scale,

    That's 15 characters long, and if the return statement was:

    return substring(@value,1,LEN(@value)-1)

    you could drop the comma at the end.  That's assuming your string, including the last ',' character, was 20 characters or fewer.
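
    For anyone who hasn't read the article, the concatenation technique under discussion works roughly like this (the function name is a guess; only dbo.many_tbl, the DATA column, and the return statement come from the thread):

    -- Build a comma-separated list of DATA values for one ID, then
    -- trim the trailing comma as suggested above.  Assumes at least
    -- one matching row, otherwise the SUBSTRING length would be -1.
    CREATE FUNCTION dbo.fn_ConcatData (@id INT)
    RETURNS VARCHAR(20)
    AS
    BEGIN
        DECLARE @value VARCHAR(20)
        SET @value = ''

        SELECT @value = @value + DATA + ','
        FROM   dbo.many_tbl
        WHERE  ID = @id

        RETURN SUBSTRING(@value, 1, LEN(@value) - 1)
    END

    It would then be called once per ID, e.g. SELECT DISTINCT ID, dbo.fn_ConcatData(ID) FROM dbo.many_tbl, to produce the pivoted result shown above.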

  • I've had similar experiences with improving cursors.  I had to look at someone's report because they weren't getting good data from it anymore.  I ran the existing proc on my machine just to see what it would produce.  Five minutes later I hadn't gotten a result; I had started reading what they were doing and understood why they weren't getting fast results, but still didn't understand what they were trying to do.  Ten minutes after that I understood what they were trying to do, and their proc was still running.  I opened up a new window and started getting answers piecemeal.

    If I execute this select I'll get all the users they are interested in.  Yep.  If I join this information I'll find out the offices they are involved in.  Yep.  Hmmm, how am I going to include all the related offices?  Oh, this should do it, but it could loop indefinitely, so before I start I'll put a different stop in the loop.  (That came in handy later when I blew the data comparison and caused the infinite loop.)

    Anyway, by the end of the day I had reproduced the report information they wanted (including the information the original proc was dropping).  I left my workstation up overnight because I still hadn't gotten an answer from my machine after 6 hours of running.

    Next morning I killed the 20-hour process, wrote up a new proc, and got a report in 30 seconds.  I heard the old processes ran in 4 hours on the faster production machine for one DB and 2 hours for another (custom processes for each DB).  My proc gets reports from both DBs in 28 seconds.

  • You are correct for the code I listed in the article. I did change it in the final results here at work to remove the last comma.

