Replacing Cursors and While Loops

Question

Post reply

Replacing Cursors and While Loops

Clinton Herring

Ten Centuries

Points: 1160
More actions
June 29, 2005 at 4:16 pm

#167701

Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/cherring/replacingcursorsandwhileloops.asp

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply

Jeswanth Jaishanker SSCommitted Points: 1869 More actions · Answer 1

Try this code. you might save remaining 12 Minutes also..

UPDATE

B

SET

b.id = A.id

FROM

dbo.many_tbl A inner join dbo.OutpUt_tbl B

ON A.id = B.ID

Jeswanth

--------------------------------

Stephen Bailey SSC-Addicted Points: 423 More actions · Answer 2

I can't see the purpose of the ORDER BY clause. If you were to remove this from your select statement, you may find that you improve the performance. It may also be the case that the outer loop has an ORDER BY clause that can also be removed.

Profile the procedure to determine what indexes are being used on the tables. Ensure that the indexes in use are the most efficient. You may be able to improve the performance yet again by creating covering indexes.

Stephen Bailey

rschaeferhig SSCrazy Eights Points: 8776 More actions · Answer 3

If you don't believe this example take a look at the example I posted which Noeld and Remi helped me convert:

http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=197948

This went from 2 hours to 6:46 to 0:26 in the final iteration!!

Anubis-237029 SSC Veteran Points: 206 More actions · Answer 4

Recently I have had to create an admin task that needs to be run on our 24gb database each day. Unfortunately due to our company security I cannot show the DDL but will do my best to explain...

In this task (50000 lines of TSQL) I have had to use several cursors to complete the task. This task is broken down into several smaller or sub tasks to perform the entire operation.

Though I admit I have not had the chance to fully analyze the task as yet to identify scans and the like (which I will get to shortly) however I wish to try and rid myself of the cursors first.

In one 'sub' task I managed to do this by converting a cursor into a simple insert statement:

Insert Into dbo.tbl

Select col1, col2, col3

From dbo.workingtbl

This managed to cut time down from 11 minutes to 3. Working with approx 500,000 rows in the sql above.

The remaining cursors are becoming a little hard to work our. In the cursor I open approximately 20 variables which are used throughout the loop..

Declare #x cursor for

select col1, col2... col20

from dbo.table

order by col1

fetch next from #x into @var1, @var2... @var20

while..... (you all know the rest)

In the while loop above I perform close to 100 calculations, 5 to 10 updates and inserts etc....

With this many calculations and updates/inserts could anyone suggest a better method of this loop without using a cursor? Or perhaps this is one of those isolated cases where a cursor is best?

Either way, any suggestions would be great!

Ninja's_RGR'us SSC Guru Points: 294064 More actions · Answer 5

It would be best if you started a new thread for this question. You will get more attention to it and you'll be able to provide more details.

Could you provide the table definition (even if you change the names). Some sample data and the expected outputs. I'm thinking at the moment that you could do a first select statement that would do the 100 calculations and insert that data in a temp table. Then you could do your updates/inserts using that temp table. It should run 10/50 times faster that way.

Mike C SSC-Insane Points: 23224 More actions · Answer 6

Great article. A lot of people make the mistake of trying to use SQL as a procedural language instead of a declarative language. It's hard for some to think in terms of SQL set-based statements as opposed to iterative looping and cursors.

I'd estimate that cursors could be eliminated and replaced with set-based SQL statements 99.9999% of the time that they are used.

Mike C SSC-Insane Points: 23224 More actions · Answer 7

In addition to what was already stated about creating a temp table to hold intermediate results, if your calculations are conditional using IF..ELSE constructs, you might look at converting them to CASE expressions and/or modifying your WHERE clauses.

Without seeing your DDL it's impossible to get much more detailed than that.

Yelena Varshal SSC-Dedicated Points: 34289 More actions · Answer 8

Hi,

Good Article!

In my opinion cursor usage should depend on "How Often". Cursors provide an easy programming solution when the extensive row processing should be done. If something should be done only a couple of times like in one time data migration tasks, I ususally opt for cursors. If something has to run every day and takes a lot of time than the set approach should be researched and maybe some database architecture changes including de-normalizing the database should be done.

Yelena

Regards,Yelena Varsha

Clinton Herring Ten Centuries Points: 1160 More actions · Answer 9

Stephen,

In my actual code the data needed to be order in the pivoted result set. If one doesn't care about the order then your correct, the order by is not needed.

Clint

Maurice Pelchat Mr or Mrs. 500 Points: 546 More actions · Answer 10

The problem is not really in the cursor itself. It is in the fact that operations are not set-oriented. Doing 1 million inserts is not the same at all as doing a single insert based on a select that returns 1 million rows and insert them in a single operation.

I converted recently a transact-sql batch that was doing about 8 million inserts. The execution time was about 8 hours. By making a big query using derived tables and case statement to apply some conversion/extraction logic in the query, and union all to bring together rows obtains under disjoin logic context, this time reduced from 8 hours to 30 minutes (a 16 to 1 improvement). By using such query it was also possible to reduce logging by doing a select INTO.

Even without doing a SELECT INTO, SQL Server optimize log writes operations. It seems that there is a lot less logging required when doing such massive single insert than when doing it row by row.

Cursors are an easy way to describe a job, but the net result is that it ends in row by row reads & writes. For example a single row read use almost all the time suitable indexes, it they exists, which may seems good. But it is very not good if you finally read all the rows from the table this way. When you match a lot of rows using a join SQL Server may evaluate that using an index is not good, and the match is going to be done with a better suited joining algorithm for this type of job. There is another optimization gain possible there.

The only problem with this approach, is that you have to work harder to define the big “do it all” query.

Clinton Herring Ten Centuries Points: 1160 More actions · Answer 11

This doesn't pivot the results. I wanted the the data from several matching rows in the Many table to come back as a single value of data separated by commas in the results table.

Suppose I had the following Many table...

ID DATA

-- ------

56 run

56 jump

56 scale

After the pivot the results data would look like this...

ID Result

-- ----------------

56 run, jump, scale

Kenneth Lee Ten Centuries Points: 1323 More actions · Answer 12

After the pivot the results data would look like this...

ID Result

-- ----------------

56 run, jump, scale

No, not really, it should look like:

ID Result

-- ----------------

56 run,jump,scale,

That's 15 characters long, and if the return statement was:

return substring(@value,1,LEN(@value)-1)

you could drop the comma at the end. That's assuming your string including the last ',' character was 20 or under characters long.

Kenneth Lee Ten Centuries Points: 1323 More actions · Answer 13

I've had similar experiences with improving cursors. I had to look at someone's report because they weren't getting good data from it anymore. I ran the existing proc on my machine just to see what it would produce. 5 minutes later I hadn't gotten a result, and I had started to read what they were doing and understood why they weren't getting fast results, but still didn't understand what they were trying to do. 10 minutes after that I understood what they were trying to do, their proc is still running. I open up a new window and start getting answers piecemeal.

If I execute this select I'll get all the users they are interested in. Yep. If I join this information I'll find out the offices they are involved in. Yep. Hmmm, how am I going to include all the related offices. Oh, this should do it, but this could loop indefinitely, before I start this, put a different stop in the loop. (Came in handy when I blew the data comparison later and caused the infinite loop.)

Anyway, by the end of the day, I had reproduced the report information (including the information the original proc was dropping.) that they wanted. I left my workstation up overnight because I still hadn't gotten an answer from my machine in 6 hours of running.

Next morning I killed the 20 hour process, wrote up a new proc and got a report in 30 seconds. I heard the old processes ran in 4 hours on the faster production machine for 1 DB and 2 hours for another DB. (Custom processes for each DB.) My proc gets reports from both DBs in 28 seconds.

Clinton Herring Ten Centuries Points: 1160 More actions · Answer 14

You are correct for the code I listed in the article. I did change it in the final results here at work to remove the last comma.