Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Replacing Cursors with Set-Based SQL Queries – Part 2

Part 2 of a discussion about replacing cursors with SQL statements for significant speed improvements.

In a recent post I discussed cursors and mentioned, at a high level, an example of replacing a cursor with a set-based series of SQL statements. In this post, I’m going to provide a simplified version of the SQL for both the cursor and the replacement. I’ll also discuss how I did the SQL conversion and why the new version is so much faster.

The following SQL has been shortened and cleaned up slightly from the original version. Although the version as shown will not run stand-alone, it will illustrate how the cursor was transformed to the set-based version.

Here is the original cursor which took about 20 minutes to run:

DECLARE  CornerCursor   CURSOR FAST_FORWARD FOR
	SELECT 
		StudentNumber,
		SchoolCode,
		CurTransEligibility,
		GradeLevel,
		GeoCode,
		HomeZone,
		BilingualCode,
		SpedCode
	FROM 
		--  [Tables for getting the student data]
 
OPEN CornerCursor
 
FETCH NEXT FROM CornerCursor INTO 
	@StudentNumber,
	@SchoolCode,
	@CurTransEligibility,
	@GradeLevel,
	@GeoCode,
	@HomeZone,
	@BilingualCode,
	@SpedCode
 
WHILE @@FETCH_STATUS = 0
BEGIN
	-- For the grade level, see if the student is ineligible 
              -- for transportation [too close to school].
	IF EXISTS(SELECT Sch FROM TransWalkingDistance 
                       WHERE SchCode = @SchoolCode 
                           AND GeoCode = @GeoCode 
                           AND GradeLevel = @GradeLevel)
	BEGIN
		UPDATE 
		      EnrollmentRecord
		SET
		      TransCode = '',	
                                 LastUpdateUserID= 'CScrpW' -- ineligible
		WHERE 
		      StudentNo = @StudentNo 
                                  AND SchCode = @SchoolCode
 
		GOTO FetchNext
	END
 
 
              IF ((rtrim(@BilingualCode) = '' 
                   OR RTRIM(isnull(@BilingualCode,''))  = 'S')
                   AND (@SpedCode = '' 
                           OR SUBSTRING(@SpedCode, 1, 1) IN ('S', 'R') 
                           OR @SpedCode IN ('V1', 'V2', 'P1', 'P2'))) 
	BEGIN
		IF @GradeLevel <> 'H' 
			AND NOT EXISTS
			(SELECT School FROM BufferZoneLookup 
			      WHERE School = @School  
                                               AND GradeLevel = @GradeLevel 
			      AND (GeoCode = @GeoCode 
                                               OR HomeZone IN (@HomeZone, 'C','H')))
		BEGIN
			-- Code here to run a similar update to the update 
                                         -- statement above, but setting the transportation
			-- code to 'C0RR' and the LastUpdateUser to 'CScrpZ'
		END
 
		GOTO FetchNext
	END
 
	-- Two additional update statements here, each with different 
              -- conditions.  There's a default one as well if none of the above 
              -- updates were used.
 
	FetchNext:
		FETCH NEXT FROM CornerCursor INTO 
		@StudentNumber,
		@SchoolCode,
		@CurTransEligibility,
		@GradeLevel,
		@GeoCode,
		@HomeZone,
		@BilingualCode,
		@SpedCode
	END
END
CLOSE CornerCursor
DEALLOCATE CornerCursor

Notice with the above SQL that there are a sequence of update statements surrounded by if/else conditions. The updates also run in a way where if one update runs, none of the following ones can run. I used these clues to help me design the set-based version. I decided to do things in the following order:

1) Encapsulate the if/else conditions referred to in the cursor into a temporary table with multiple bit columns. For these, more than one of the conditions [bits] can be set.
2) Use the temp table with the bit field settings to determine what the transportation eligibility code should be. The update statements simply modify the transportation code and last update user, so that’s what I set when populating the second temporary table. In the logic, the order of the case statement matches the order of the updates in the cursor.
3) Use the transportation codes set in #2 to actually do the updates.

Here’s the resulting SQL:

-- This query will populate a table with different bit fields 
-- representing the if/else conditions used throughout the cursor.  
-- So essentially this is centralizing disperse if/else conditions into 
-- one record for each student that can be then used by the next query.
SELECT
	studentdata.StudentNumber,
	studentdata.SchoolCode,
	studentdata.TransCode,
	studentdata.GradeLevel,
	studentdata.BilingualCode,
	studentdata.SpedCode,
	studentdata.GeoCode,
	studentdata.HomeZone,
	(CASE WHEN studentdata.sch IN('4261','1010','1020','1340')
		THEN 1 ELSE 0 END) AS 'CanNotBeOutOfZone', -- Citywide
              (CASE WHEN isnull(SchZone.sch, '') <> '' THEN 1 ELSE 0 END) 
                                                            AS 'InZoneOrBufferZone',
	(CASE WHEN isnull(TransWalkingDistance.sch, '') <> '' THEN 1 ELSE 0 END)
                                                            AS 'IsInWalkingDistance',
	(CASE WHEN studentdata.grade IN ('K0', 'K1') 
                     AND studentdata.isExtendedDay = 0 THEN 1 ELSE 0 END) 
                                                            AS 'IsHalfDayKindergarten',
	-- Bilingual non-spanish students are always transportation eligible.
	(CASE WHEN NOT isnull(studentdata.BilingualCode, '') = '' 
                  AND NOT (isnull(studentdata.BilingualCode,'')) = 'S' THEN 1 ELSE 0 END)
                                                            AS 'BilingualIneligibleForC0RR',
	-- A few additional case statement options omitted [SpedIneligibleForC0RR].
INTO #StudentsToProcess
FROM
	--  [List of tables] similar to the one in the cursor definition.   
 
-- Use the bit fields to set what the transportation code would 
-- be for the student.  The case statement to set PropsedTransCodeBasedOnData 
-- runs in the same precedence order as the original cursor.  In other words, 
-- order is important to keep the results the same.
SELECT  studentdata.studentno, studentdata.transcode AS 'curTransCode',
	-- in the walkeligible range
	(CASE WHEN IsInWalkingDistance = 1 THEN ''
	-- Set out of zone.  High school grades and some other 
              -- categories cannot be out of zone.
	WHEN BilingualIneligibleForC0RR = 0
	    AND NOT sys.grade IN('09','10','11','12') 
	    AND InZoneOrBufferZone = 0 
	    AND CanNotBeOutOfZone = 0 THEN 'C0RR'
	WHEN (IsHalfDayKindergarten = 1 
                                   AND isnull(sys.transcode, '') = '') THEN 'C5*R'
	WHEN (IsHalfDayKindergarten = 1 
                                   AND isnull(sys.transcode, '') <> '') THEN sys.transcode
	ELSE 'C5RR' END) AS 'PropsedTransCodeBasedOnData'
INTO #ProposedTranscodeChanges
FROM 
	-- [List of tables] 
 
-- Run the update - set trans eligibility codes [c5rr, c0rr, etc].  
-- Each update used a different user ID tag, so set this appropriately.
UPDATE studentdata SET transcode = propsedTransCodeBasedOnData,
LastUpdateUser = (CASE isnull(propsedTransCodeBasedOnData, '') 
              WHEN 'c5rr' THEN 'CScrpC'
	WHEN 'c0rr' THEN 'CScrpZ' 
              WHEN '' THEN 'CScrpW' 
              WHEN 'C5*R' THEN 'CScrpK' 
              ELSE 'CScrpX' END)
FROM 
	-- [List of tables]

I could have likely combined the SQL statements above so only 1 or 2 queries would be necessary. But doing so would have been at the cost of increased query complexity and maintainability. The queries run in under 10 seconds, a significant performance improvement over the cursor-based version!

Why is the cursor so slow? It is simple – each student is processed one by one. To make matters worse, multiple queries are run within the body of the cursor, so over 250,000 queries could theoretically be run for 55,000 students. Why run that many queries when 3 queries will do?

Comments

Posted by Anonymous on 6 July 2011

Pingback from  Dew Drop &ndash; July 6, 2011 | Alvin Ashcraft&#039;s Morning Dew

Posted by AlexSQLForums on 6 July 2011

Nicely done. Great Article :)

Posted by Steve Jones on 6 July 2011

Very nice, I like this explanation.

Posted by Anonymous on 7 July 2011

Pingback from  Dew Drop &ndash; July 7, 2011 | Alvin Ashcraft&#039;s Morning Dew

Posted by ray watson on 9 July 2011

Fair enough, but if performance is not a big issue (perhaps you only run it once a month)  the cursor approach has the following advantages.

1) Easier to understand - especially if you document the individual IF (test) 'blocks' well ( which arent in this example)

2) Much easier to accurately produce error messages at the stage when  they occur as opposed to a monolithic FAIL by using  the Select statement, and even write them to an error log.

I have had my fill of huge, SELECT ... CASE code blocks that take forever to figure out when an error occurs.

Posted by Joe Celko on 10 July 2011

It is an improvement over the original cursor mess, but you have assembly language style bit flags in SQL and use temp tablers as pseudo-scratch tapes. It is still not declarative code.

Can you do anything about the way that the same data element having multiple names and bring them up to ISO-11179 Standards?

I am still trying to figure out "bilingual_code"; we used the ISO 639-2 language codes for the second and third languages; in my local elementary school that means Spanish (spa) and Viet Namese (vie) with a few exotics.

We also did walking distance as a computation based on (x,y) co-ordinates. The rules kept changing, so the flags never got updated correctly.

Posted by bbaley on 12 July 2011

Why not use a CTE update ?

something like;

WITH

cteFirstQuery (id, f1, f2) AS

 (select id, f1, f2 from TableA

  where <something>)

,

cteSecondQuery (id, f3, f4) AS

 (select fq.id, fq.f1, fq.f2, sq.f3, sq.f4, calcField from

  cteFirstQuery fq

  inner join TableB sq on fq.id = sq.id

 )

UPDATE

 TableA

 SET f2 = calcField

 from cteSecondQuery

 where TableA.id = cteSecondQuery.id

-- thats completely adhoc unrelated example, but the idea being using a CTE, derived from a CTE and updating based on result(s) set

Posted by Ron Kunce on 5 August 2011

Yes nice article!  It could be even better if it came with some sort of print format to print the text with all the code!

Posted by FreeHansje on 16 August 2011

Every now and then some1 starts the debate cursor vs set-based programming in a relational database. I follow these articles with great interest, for with my procedural programmers background I really would learn how to use set-based statements over cursor-based programming.

To raywatson I would say: you are aware that set-based coding is more efficient, but you seem to be more comfortable with cursors. Set-based is more difficult, you say. But is it not your obligation to code as efficient as possible? And what if now you code a cursor construct, because performance is not an issue, who is to say how this will be 6 months from now? That happens regularly in my experience.

Also, if you are a database programmer, is it not your duty to learn how to do things the best way possible? Saying it is more difficult to do set-based programming does not sound good.

Leave a Comment

Please register or log in to leave a comment.