CURSORS and WHILE loops, we can’t seem to live with them and we can’t seem to live without them or can we?! I recently had to look at some code here where I work that has been in place for some time, code not written by me. I was testing performance on a new box (8 gig of RAM and 4 PROCs with hyper-threading) and we wanted to compare its performance to one of our current boxes (4 gig of RAM and 4 older PROCS), so after doing the usual - backups, restores, re-indexes, & DBCC - against a 17 gig database, we decide to import an important job and run it. The good news is it ran much faster as was expected: 2 hours 35 minutes down to 1 hour 19 minutes. I was, however, a little distraught to see that the code had inner cursors embedded in an outer cursor, a BIG RED flag to me that this code wasn’t performing as well as it could.
The inner cursors were attempting to pivot data from the many side of a one-to-many table into single column data of an Output table. The approach was to loop through each primary record getting the key value (1.7 million of them) and then loop through the secondary table for each of those primary records while altering and concatenating the retrieved data for each into 3 different variables. This was followed by an update to the Output table based on a match to the primary record ID after which the process continued until all output records were updated. The approach is sound enough but slow - very, very slow. There is a lot of overhead Declaring, Opening, Fetching, Closing and De-Allocating cursors. I knew I could do better. The cursor code is shown below (changed to not expose any specific metadata information of the company where I work). Only one inner cursor is shown.
(Declare & start of outer Cursor code) --Inner Cursor code…code for just one column pivot -- (variables are previously declared an) DECLARE temp_cursor CURSOR FOR SELECT column_data FROM DB.dbo.many_tbl WHERE id = @tmp_id -- key data ORDER BY column_data OPEN temp_cursor FETCH NEXT FROM temp_cursor INTO @tmp_data WHILE @@FETCH_STATUS = 0 BEGIN SELECT @tmp_values = @tmp_values + convert(varchar(20), @tmp_data) + ',' FETCH NEXT FROM temp_cursor INTO @tmp_data END CLOSE temp_cursor DEALLOCATE temp_cursor UPDATE DB.dbo.OutPut_tbl SET column_out = @tmp_values WHERE id = @tmp_id (Fetch next in outer Cursor) (Close & Deallocate outer Cursor)
Replacing the Inner Cursors
Back in 2002 I had submitted the “Easy Table Pivot” code to SQL Server Central and I knew that I could at least get rid of the inner cursor(s). The code below directly replaced the above inner cursor.
(Declare & start of outer Cursor code) --Direct column pivot into a variable -- (variables are previously declared) SELECT @tmp_values = @tmp_values + convert(varchar(20), column_data) + ',' FROM DB.dbo.many_tbl WHERE id = @tmp_id ORDER BY column_data UPDATE DB.dbo.OutPut_tbl SET column_out = @tmp_values WHERE id = @tmp_id (Fetch next in outer Cursor) (Close & Deallocate outer Cursor)
Replacing the Outer Cursor
This code is simpler and faster than the cursor and this piece of the total job dropped from 52 minutes to about 11 minutes. So who could ask for anything better! Perhaps in SQL 7.0 you can’t, at least I don’t think you can (all the above code was already in a stored procedure) but what about SQL 2000? After looking at the above code the select statement looked like a good candidate for a scalar function. I was sure this part of the job would perform better as a set process in a single update with a function call then as a loop. By the way, I had replaced the cursor with a WHILE loop to eliminate the overhead associated with Opening, Closing and De-Allocating the cursor. Anyway, here is the function I created, basically from the same select statement.
CREATE function dbo.ufn_data_pivot(@id as int) Returns varchar(20) AS BEGIN DECLARE @value varchar(20) SET @value = '' SELECT @value = @value + convert(varchar(20), column_data) + ',' FROM DB.dbo.many_tbl WHERE id = @id ORDER BY column_data Return @value END
The update call in the main code then looked like the following…
UPDATE DB.dbo.OutPut_tbl SET column_out = dbo.ufn_data_pivot(key_column)
And just how did this code perform? It literally blew the doors off the old embedded cursors. This section of the code dropped to just over 2 minutes! Keep in mind that I had three functions pivoting into three different output columns and the key data set was about 1.7 million records! WOW!
After making some other changes in other parts to the job code, the whole job dropped from 1 hour 20 minutes on the new box to just over 12 minutes! Boy, was I (and several others) happy.
You can be sure that I will be making every attempt in the future to replace all manual looping code with function calls in SQL 2000 and beyond, especially on large data sets.