Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Replacing Cursors with Set-Based SQL Queries – Part 1

Part 1 of a discussion about replacing cursors with SQL statements for significant speed improvements.

In a previous post I discussed where it is best to put logic – SQL or code. In my view, SQL is great for processing sets of data and poorly structured for row-based processing. To put it bluntly, I try to avoid cursors in most circumstances. Is this rule absolute? Not necessarily. Sometimes the operation cannot be done in a set-based way, and a cursor can be developed faster than a program. Sometimes it is known in advance the looping will only involve a few hundred records and so won’t take long to run. Sometimes you’re working with an existing cursor and just need to make a few minor changes – modifying the cursor in this scenario will be faster than rewriting it from scratch. But in general, I try to avoid using cursors when possible [and I'm always looking for alternative ways to code the logic].

I’ve also removed a number of cursors during my time at BPS, some more complicated than others. A cursor in particular that sticks out is one that was used to determine transportation eligibility for students. The cursor used the student’s information and updated the transportation eligibility code for each student – one student at a time. The logic did the following:

  1. Fetch the student’s grade level [Elementary, Middle, or High school] into a variable.
  2. Clear the transportation eligibility code if the student is within the walking distance to the school [walking distance is based, in part, on the students' grade level].
  3. If the student isn’t in a particular category [Special Ed, etc] and isn’t in a particular set of schools, set the eligibility code to ineligible if the student is out of zone [the code used is different than the one in #2]. Note: Boston is divided into 3 school assignment zones, and we generally do not provide transportation to students who attend an out of zone school, although there are exceptions.
  4. If the student is in a half-day kindergarten program, update the eligibility code to highlight the fact that the transportation will be for a half-day only.
  5. Update the transportation eligibility code if the student didn’t meet any of the other criteria [such a student is eligible for transportation].

The logic used in the cursor was a tangled web of update statements fit between many nested if statements with complicated logic. To make matters worse, for 56,000 students this process took over 20 minutes to run.

A couple years ago I decided enough was enough and spent some time replacing the cursor. After reviewing the cursor logic and getting to the heart of what the logic was doing, I was able to replace it with a sequence of 2 queries used to populate table variables and a final query used to update the transportation eligibility codes. All 3 queries are set-based and handle all students simultaneously. In general, the cursor was composed of many repetitive tasks done in a certain order. I kept the same ordering in the queries by using a case statement.

What was the end result of replacing the cursor with the set based SQL statements? The required runtime went down to just 10 or so seconds. From 20 minutes to 10 seconds? That’s over a 100 fold speed improvement!

I’m going to explore the SQL used [for both the cursor and the replacement] in more detail in my next post.

Comments

Posted by Anonymous on 27 June 2011

Pingback from  Dew Drop – June 27, 2011 | Alvin Ashcraft's Morning Dew

Posted by zuber.patel on 29 June 2011

good one looking for more details with examples

Posted by csaptd on 29 June 2011

We have a system from an external supplier where all the tables have a unique numeric reference number but this field is NOT set to an Identity - so when you insert data you have to generate the reference yourself.  I had a job to import data from a text file to populate some of these tables. I put the data into a temporary table and ran queries to populate the application table.  I used a cursor to generate the reference number  but then I discovered the ROW_NUMBER function so I could generate the reference number in a single SQL statement.  Job's a good 'un!

Posted by jclemens 24860 on 29 June 2011

Good.  One comment.  While it may not always be advisable, I have yet to see a looping approach that cannot be replaced by a set based approach in sql.

Posted by matt.newman on 29 June 2011

I have seen times where you still need a loop. If you are executing a stored procedure. Example below. Sometimes you can't put everything with an inline function call to a set based query. If you can use a set based query though, it is definitely far better than a cursor. I do not use cursors ever. There is more control ofver while loops if you need to execute something you do not have control over.

declare

@current int,

@max int,

@monkey varchar(26)

declare -- for big record results use a table variable and temp index

@table table

(pkey int identity(1,1),

monkeysee varchar(13),

monkeydo varchar(13))

insert

@table

select

'me',

'swing'

select

@max = @@rowcount,

@current = 1

while @max >= @current

begin

select

@monkey = monkeysee + ' ' + monkeydo

from

@table

where

pkey = @current

exec @storedProcedure

@var1 = @monkey,

@sequence = @current

set @current = @current + 1

end

Posted by matt.newman on 29 June 2011

oops... sorry there is a comment in the code... use a table variable for big selects is temp table... use a temp table INSTEAD of a table variable for big results to sort through. Sorry.

Posted by kirk 61364 on 29 June 2011

I find that even if you have to use a loop, but you can drop the cursor, the process is much faster.  Better still if you can get it into a set based operation.  

Some before and after samples would also be helpful.

Thanks Andrew.  

Posted by Rick@bum... bum bum bum! on 29 June 2011

I have a rather unique situation, where I have a link to  MySQL server from my MS SQL Server. I have a series of Databases on the MySQL Server that I know of from a list that was given to me. Here is the catch, not all of the databases have come on line and I will not know exactly when they do come on line. All I know if that when they do I will need to mine data from them.

The existing process was in a SSIS package that we had to redeployed when ever a new database was discovered.

Here is what I did and if anyone has a better approach please let me know.

I loaded all of the databases and tables that I will mine in to a table. I created an automated process that is using a cursor that one by one checks if the database is there and then checks if the tables are all preasent. I do this by using dynamic sql that I execute and trap a count variable and then If it is greater than one I grab the data. This is the code snippet that I use to validate the database. All of the code used is stored in tables and I fill variables.

set @chk_scrpt = @var_scrpt_validate + @var_db_nm + @var_scrpt_validate_end

set @parmdef = N'@varpass int output'

execute sp_executesql @chk_scrpt, @parmdef, @varpass = @passvar output

if (select @passvar)>0 begin --print N'Pass' Database was available set @scrpt_list = @scrpt_list + N', ' + @var_scrp end

I have other logic that I use to build my insert statements so I only go after the specific record that have a database name.

I spin through about 130 different records and it takes about 30 to 40 seconds to process.

Posted by Iulian on 30 June 2011

I am curious if something like this is possible:

SELECT

   item

   , EXECUTE( spActionOnItem ( item ) )

FROM cursor_items

WHERE condition

or any workaround

Thanks,

Iulian

Posted by sql2k8 on 9 July 2013

running of an execute in a select statement is not possible

Leave a Comment

Please register or log in to leave a comment.