Blog Post

Replacing Cursors with Set-Based SQL Queries – Part 2

,

Part 2 of a discussion about replacing cursors with SQL statements for significant speed improvements.

In a recent post I discussed cursors and mentioned, at a high level, an example of replacing a cursor with a set-based series of SQL statements. In this post, I’m going to provide a simplified version of the SQL for both the cursor and the replacement. I’ll also discuss how I did the SQL conversion and why the new version is so much faster.

The following SQL has been shortened and cleaned up slightly from the original version. Although the version as shown will not run stand-alone, it will illustrate how the cursor was transformed to the set-based version.

Here is the original cursor which took about 20 minutes to run:

DECLARE  CornerCursor   CURSOR FAST_FORWARD FOR
SELECT 
StudentNumber,
SchoolCode,
CurTransEligibility,
GradeLevel,
GeoCode,
HomeZone,
BilingualCode,
SpedCode
FROM 
--  [Tables for getting the student data]
 
OPEN CornerCursor
 
FETCH NEXT FROM CornerCursor INTO 
@StudentNumber,
@SchoolCode,
@CurTransEligibility,
@GradeLevel,
@GeoCode,
@HomeZone,
@BilingualCode,
@SpedCode
 
WHILE @@FETCH_STATUS = 0
BEGIN
-- For the grade level, see if the student is ineligible 
              -- for transportation [too close to school].
IF EXISTS(SELECT Sch FROM TransWalkingDistance 
                       WHERE SchCode = @SchoolCode 
                           AND GeoCode = @GeoCode 
                           AND GradeLevel = @GradeLevel)
BEGIN
UPDATE 
      EnrollmentRecord
SET
      TransCode = '',
                                 LastUpdateUserID= 'CScrpW' -- ineligible
WHERE 
      StudentNo = @StudentNo 
                                  AND SchCode = @SchoolCode
 
GOTO FetchNext
END
 
 
              IF ((rtrim(@BilingualCode) = '' 
                   OR RTRIM(isnull(@BilingualCode,''))  = 'S')
                   AND (@SpedCode = '' 
                           OR SUBSTRING(@SpedCode, 1, 1) IN ('S', 'R') 
                           OR @SpedCode IN ('V1', 'V2', 'P1', 'P2'))) 
BEGIN
IF @GradeLevel <> 'H' 
AND NOT EXISTS
(SELECT School FROM BufferZoneLookup 
      WHERE School = @School  
                                               AND GradeLevel = @GradeLevel 
      AND (GeoCode = @GeoCode 
                                               OR HomeZone IN (@HomeZone, 'C','H')))
BEGIN
-- Code here to run a similar update to the update 
                                         -- statement above, but setting the transportation
-- code to 'C0RR' and the LastUpdateUser to 'CScrpZ'
END
 
GOTO FetchNext
END
 
-- Two additional update statements here, each with different 
              -- conditions.  There's a default one as well if none of the above 
              -- updates were used.
 
FetchNext:
FETCH NEXT FROM CornerCursor INTO 
@StudentNumber,
@SchoolCode,
@CurTransEligibility,
@GradeLevel,
@GeoCode,
@HomeZone,
@BilingualCode,
@SpedCode
END
END
CLOSE CornerCursor
DEALLOCATE CornerCursor

Notice with the above SQL that there are a sequence of update statements surrounded by if/else conditions. The updates also run in a way where if one update runs, none of the following ones can run. I used these clues to help me design the set-based version. I decided to do things in the following order:

1) Encapsulate the if/else conditions referred to in the cursor into a temporary table with multiple bit columns. For these, more than one of the conditions [bits] can be set.

2) Use the temp table with the bit field settings to determine what the transportation eligibility code should be. The update statements simply modify the transportation code and last update user, so that’s what I set when populating the second temporary table. In the logic, the order of the case statement matches the order of the updates in the cursor.

3) Use the transportation codes set in #2 to actually do the updates.

Here’s the resulting SQL:

-- This query will populate a table with different bit fields 
-- representing the if/else conditions used throughout the cursor.  
-- So essentially this is centralizing disperse if/else conditions into 
-- one record for each student that can be then used by the next query.
SELECT
studentdata.StudentNumber,
studentdata.SchoolCode,
studentdata.TransCode,
studentdata.GradeLevel,
studentdata.BilingualCode,
studentdata.SpedCode,
studentdata.GeoCode,
studentdata.HomeZone,
(CASE WHEN studentdata.sch IN('4261','1010','1020','1340')
THEN 1 ELSE 0 END) AS 'CanNotBeOutOfZone', -- Citywide
              (CASE WHEN isnull(SchZone.sch, '') <> '' THEN 1 ELSE 0 END) 
                                                            AS 'InZoneOrBufferZone',
(CASE WHEN isnull(TransWalkingDistance.sch, '') <> '' THEN 1 ELSE 0 END)
                                                            AS 'IsInWalkingDistance',
(CASE WHEN studentdata.grade IN ('K0', 'K1') 
                     AND studentdata.isExtendedDay = 0 THEN 1 ELSE 0 END) 
                                                            AS 'IsHalfDayKindergarten',
-- Bilingual non-spanish students are always transportation eligible.
(CASE WHEN NOT isnull(studentdata.BilingualCode, '') = '' 
                  AND NOT (isnull(studentdata.BilingualCode,'')) = 'S' THEN 1 ELSE 0 END)
                                                            AS 'BilingualIneligibleForC0RR',
-- A few additional case statement options omitted [SpedIneligibleForC0RR].
INTO #StudentsToProcess
FROM
--  [List of tables] similar to the one in the cursor definition.   
 
-- Use the bit fields to set what the transportation code would 
-- be for the student.  The case statement to set PropsedTransCodeBasedOnData 
-- runs in the same precedence order as the original cursor.  In other words, 
-- order is important to keep the results the same.
SELECT  studentdata.studentno, studentdata.transcode AS 'curTransCode',
-- in the walkeligible range
(CASE WHEN IsInWalkingDistance = 1 THEN ''
-- Set out of zone.  High school grades and some other 
              -- categories cannot be out of zone.
WHEN BilingualIneligibleForC0RR = 0
    AND NOT sys.grade IN('09','10','11','12') 
    AND InZoneOrBufferZone = 0 
    AND CanNotBeOutOfZone = 0 THEN 'C0RR'
WHEN (IsHalfDayKindergarten = 1 
                                   AND isnull(sys.transcode, '') = '') THEN 'C5*R'
WHEN (IsHalfDayKindergarten = 1 
                                   AND isnull(sys.transcode, '') <> '') THEN sys.transcode
ELSE 'C5RR' END) AS 'PropsedTransCodeBasedOnData'
INTO #ProposedTranscodeChanges
FROM 
-- [List of tables] 
 
-- Run the update - set trans eligibility codes [c5rr, c0rr, etc].  
-- Each update used a different user ID tag, so set this appropriately.
UPDATE studentdata SET transcode = propsedTransCodeBasedOnData,
LastUpdateUser = (CASE isnull(propsedTransCodeBasedOnData, '') 
              WHEN 'c5rr' THEN 'CScrpC'
WHEN 'c0rr' THEN 'CScrpZ' 
              WHEN '' THEN 'CScrpW' 
              WHEN 'C5*R' THEN 'CScrpK' 
              ELSE 'CScrpX' END)
FROM 
-- [List of tables]

I could have likely combined the SQL statements above so only 1 or 2 queries would be necessary. But doing so would have been at the cost of increased query complexity and maintainability. The queries run in under 10 seconds, a significant performance improvement over the cursor-based version!

Why is the cursor so slow? It is simple – each student is processed one by one. To make matters worse, multiple queries are run within the body of the cursor, so over 250,000 queries could theoretically be run for 55,000 students. Why run that many queries when 3 queries will do?

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating