In a previous post I discussed where it is best to put logic – SQL or code. In my view, SQL is great for processing sets of data and poorly structured for row-based processing. To put it bluntly, I try to avoid cursors in most circumstances. Is this rule absolute? Not necessarily. Sometimes the operation cannot be done in a set-based way, and a cursor can be developed faster than a program. Sometimes it is known in advance the looping will only involve a few hundred records and so won’t take long to run. Sometimes you’re working with an existing cursor and just need to make a few minor changes – modifying the cursor in this scenario will be faster than rewriting it from scratch. But in general, I try to avoid using cursors when possible [and I'm always looking for alternative ways to code the logic].
I’ve also removed a number of cursors during my time at BPS, some more complicated than others. A cursor in particular that sticks out is one that was used to determine transportation eligibility for students. The cursor used the student’s information and updated the transportation eligibility code for each student – one student at a time. The logic did the following:
The logic used in the cursor was a tangled web of update statements fit between many nested if statements with complicated logic. To make matters worse, for 56,000 students this process took over 20 minutes to run.
A couple years ago I decided enough was enough and spent some time replacing the cursor. After reviewing the cursor logic and getting to the heart of what the logic was doing, I was able to replace it with a sequence of 2 queries used to populate table variables and a final query used to update the transportation eligibility codes. All 3 queries are set-based and handle all students simultaneously. In general, the cursor was composed of many repetitive tasks done in a certain order. I kept the same ordering in the queries by using a case statement.
What was the end result of replacing the cursor with the set based SQL statements? The required runtime went down to just 10 or so seconds. From 20 minutes to 10 seconds? That’s over a 100 fold speed improvement!
I’m going to explore the SQL used [for both the cursor and the replacement] in more detail in my next post.