Part 2 of a discussion about replacing cursors with SQL statements for significant speed improvements.
In a recent post I discussed cursors and mentioned, at a high level, an example of replacing a cursor with a set-based series of SQL statements. In this post, I’m going to provide a simplified version of the SQL for both the cursor and the replacement. I’ll also discuss how I did the SQL conversion and why the new version is so much faster.
The following SQL has been shortened and cleaned up slightly from the original version. Although the version as shown will not run stand-alone, it will illustrate how the cursor was transformed to the set-based version.
Here is the original cursor which took about 20 minutes to run:
DECLARE CornerCursor CURSOR FAST_FORWARD FOR SELECT StudentNumber, SchoolCode, CurTransEligibility, GradeLevel, GeoCode, HomeZone, BilingualCode, SpedCode FROM -- [Tables for getting the student data] OPEN CornerCursor FETCH NEXT FROM CornerCursor INTO @StudentNumber, @SchoolCode, @CurTransEligibility, @GradeLevel, @GeoCode, @HomeZone, @BilingualCode, @SpedCode WHILE @@FETCH_STATUS = 0 BEGIN -- For the grade level, see if the student is ineligible -- for transportation [too close to school]. IF EXISTS(SELECT Sch FROM TransWalkingDistance WHERE SchCode = @SchoolCode AND GeoCode = @GeoCode AND GradeLevel = @GradeLevel) BEGIN UPDATE EnrollmentRecord SET TransCode = '', LastUpdateUserID= 'CScrpW' -- ineligible WHERE StudentNo = @StudentNo AND SchCode = @SchoolCode GOTO FetchNext END IF ((rtrim(@BilingualCode) = '' OR RTRIM(isnull(@BilingualCode,'')) = 'S') AND (@SpedCode = '' OR SUBSTRING(@SpedCode, 1, 1) IN ('S', 'R') OR @SpedCode IN ('V1', 'V2', 'P1', 'P2'))) BEGIN IF @GradeLevel <> 'H' AND NOT EXISTS (SELECT School FROM BufferZoneLookup WHERE School = @School AND GradeLevel = @GradeLevel AND (GeoCode = @GeoCode OR HomeZone IN (@HomeZone, 'C','H'))) BEGIN -- Code here to run a similar update to the update -- statement above, but setting the transportation -- code to 'C0RR' and the LastUpdateUser to 'CScrpZ' END GOTO FetchNext END -- Two additional update statements here, each with different -- conditions. There's a default one as well if none of the above -- updates were used. FetchNext: FETCH NEXT FROM CornerCursor INTO @StudentNumber, @SchoolCode, @CurTransEligibility, @GradeLevel, @GeoCode, @HomeZone, @BilingualCode, @SpedCode END END CLOSE CornerCursor DEALLOCATE CornerCursor
Notice with the above SQL that there are a sequence of update statements surrounded by if/else conditions. The updates also run in a way where if one update runs, none of the following ones can run. I used these clues to help me design the set-based version. I decided to do things in the following order:
1) Encapsulate the if/else conditions referred to in the cursor into a temporary table with multiple bit columns. For these, more than one of the conditions [bits] can be set.
2) Use the temp table with the bit field settings to determine what the transportation eligibility code should be. The update statements simply modify the transportation code and last update user, so that’s what I set when populating the second temporary table. In the logic, the order of the case statement matches the order of the updates in the cursor.
3) Use the transportation codes set in #2 to actually do the updates.
Here’s the resulting SQL:
-- This query will populate a table with different bit fields -- representing the if/else conditions used throughout the cursor. -- So essentially this is centralizing disperse if/else conditions into -- one record for each student that can be then used by the next query. SELECT studentdata.StudentNumber, studentdata.SchoolCode, studentdata.TransCode, studentdata.GradeLevel, studentdata.BilingualCode, studentdata.SpedCode, studentdata.GeoCode, studentdata.HomeZone, (CASE WHEN studentdata.sch IN('4261','1010','1020','1340') THEN 1 ELSE 0 END) AS 'CanNotBeOutOfZone', -- Citywide (CASE WHEN isnull(SchZone.sch, '') <> '' THEN 1 ELSE 0 END) AS 'InZoneOrBufferZone', (CASE WHEN isnull(TransWalkingDistance.sch, '') <> '' THEN 1 ELSE 0 END) AS 'IsInWalkingDistance', (CASE WHEN studentdata.grade IN ('K0', 'K1') AND studentdata.isExtendedDay = 0 THEN 1 ELSE 0 END) AS 'IsHalfDayKindergarten', -- Bilingual non-spanish students are always transportation eligible. (CASE WHEN NOT isnull(studentdata.BilingualCode, '') = '' AND NOT (isnull(studentdata.BilingualCode,'')) = 'S' THEN 1 ELSE 0 END) AS 'BilingualIneligibleForC0RR', -- A few additional case statement options omitted [SpedIneligibleForC0RR]. INTO #StudentsToProcess FROM -- [List of tables] similar to the one in the cursor definition. -- Use the bit fields to set what the transportation code would -- be for the student. The case statement to set PropsedTransCodeBasedOnData -- runs in the same precedence order as the original cursor. In other words, -- order is important to keep the results the same. SELECT studentdata.studentno, studentdata.transcode AS 'curTransCode', -- in the walkeligible range (CASE WHEN IsInWalkingDistance = 1 THEN '' -- Set out of zone. High school grades and some other -- categories cannot be out of zone. WHEN BilingualIneligibleForC0RR = 0 AND NOT sys.grade IN('09','10','11','12') AND InZoneOrBufferZone = 0 AND CanNotBeOutOfZone = 0 THEN 'C0RR' WHEN (IsHalfDayKindergarten = 1 AND isnull(sys.transcode, '') = '') THEN 'C5*R' WHEN (IsHalfDayKindergarten = 1 AND isnull(sys.transcode, '') <> '') THEN sys.transcode ELSE 'C5RR' END) AS 'PropsedTransCodeBasedOnData' INTO #ProposedTranscodeChanges FROM -- [List of tables] -- Run the update - set trans eligibility codes [c5rr, c0rr, etc]. -- Each update used a different user ID tag, so set this appropriately. UPDATE studentdata SET transcode = propsedTransCodeBasedOnData, LastUpdateUser = (CASE isnull(propsedTransCodeBasedOnData, '') WHEN 'c5rr' THEN 'CScrpC' WHEN 'c0rr' THEN 'CScrpZ' WHEN '' THEN 'CScrpW' WHEN 'C5*R' THEN 'CScrpK' ELSE 'CScrpX' END) FROM -- [List of tables]
I could have likely combined the SQL statements above so only 1 or 2 queries would be necessary. But doing so would have been at the cost of increased query complexity and maintainability. The queries run in under 10 seconds, a significant performance improvement over the cursor-based version!
Why is the cursor so slow? It is simple – each student is processed one by one. To make matters worse, multiple queries are run within the body of the cursor, so over 250,000 queries could theoretically be run for 55,000 students. Why run that many queries when 3 queries will do?