Robyn Page’s SQL Server Cursor Workbench

The topic of cursors is the ultimate "hot potato" in the world of SQL Server. Everyone has a view on when they should, and mainly should not, be used. By example and testing, Robyn Page proves that, when handled with care, cursors are not necessarily a "bad thing". This article coined the phrase 'Quirky Update', which has since established itself as the industry term. We now feature a new revised version of the old classic, with help from Phil Factor.

The purpose of this series of workshops is to try to encourage you to take a practical approach to SQL skills. I always find I learn things much quicker by trying things out and experimenting. Please don’t just run the samples, but make changes, alter the data, look for my mistakes, try to see if there are different ways of doing things. Please feel free to criticize or disagree with what I say, if you can back this up. This workbench on cursors is not intended to tell you the entire story, as a tutorial might, but the details on BOL should make much more sense after you’ve tried things out for yourself!

What are cursors for?

Cursors were created to bridge the ‘impedance mismatch’ between the ‘record-based’ culture of conventional programming and the set-based world of the relational database.

They had a useful purpose in allowing existing applications to change from ISAM or KSAM databases, such as DBaseII, to SQL Server with the minimum of upheaval. DBLIB and ODBC make extensive use of them to ‘spoof’ simple file-based data sources.

Relational database programmers won’t need them but, if you have an application that understands only the process of iterating through resultsets, like flicking through a card index, then you’ll probably need a cursor.

Where would you use a Cursor?

A simple example of an application for which cursors can provide a good solution is one that requires running totals. A cumulative graph of monthly sales to date is a good example, as is a cashbook with a running balance.

We’ll try four different approaches to getting a running total…

Why not try these different approaches, with tables of different sizes and see how long the routines take? (I demonstrate a suitable test-rig shortly).
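
To make the approaches concrete, here is a sketch of the cursor version, assuming a hypothetical CashBook table with an Entry_No key and an Amount column; the table and column names are illustrative only.

```sql
-- Assumed table: CashBook (Entry_No INT PRIMARY KEY, Amount MONEY)
DECLARE @Entry_No INT, @Amount MONEY, @RunningTotal MONEY = 0;
DECLARE @Results TABLE (Entry_No INT, Amount MONEY, Balance MONEY);

DECLARE CashCursor CURSOR FAST_FORWARD FOR
    SELECT Entry_No, Amount FROM CashBook ORDER BY Entry_No;

OPEN CashCursor;
FETCH NEXT FROM CashCursor INTO @Entry_No, @Amount;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- accumulate the balance row by row, in Entry_No order
    SET @RunningTotal = @RunningTotal + @Amount;
    INSERT INTO @Results (Entry_No, Amount, Balance)
        VALUES (@Entry_No, @Amount, @RunningTotal);
    FETCH NEXT FROM CashCursor INTO @Entry_No, @Amount;
END;
CLOSE CashCursor;
DEALLOCATE CashCursor;

SELECT Entry_No, Amount, Balance FROM @Results ORDER BY Entry_No;
```

The WHILE-loop version replaces the cursor with a key-by-key seek; the correlated-subquery and ‘group by’ versions compute each balance as a sum over all preceding rows.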

Is there a quicker or more elegant solution?

Global Cursors

If you are doing something really complicated with a listbox, or scrolling through a rapidly-changing table whilst making updates, a GLOBAL cursor could be a good solution, but it is very much geared to traditional client-server applications, because cursors have a lifetime only of the connection. Each ‘client’ therefore needs its own connection. The GLOBAL cursors defined in a connection will be implicitly deallocated at disconnect.

Global cursors can be passed to and from stored procedures and referenced in triggers. They can be assigned to local cursor variables, so a global cursor can be passed as a parameter to a number of stored procedures. Here is an example, though one struggles to think of anything useful in a short example.
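
As a sketch of the mechanism, a procedure can open a cursor and hand it back through a CURSOR VARYING OUTPUT parameter; the procedure and table names here are invented for illustration.

```sql
-- A procedure that opens a cursor and returns it to the caller
CREATE PROCEDURE dbo.OpenCashBookCursor
    @MyCursor CURSOR VARYING OUTPUT
AS
BEGIN
    SET @MyCursor = CURSOR FAST_FORWARD FOR
        SELECT Entry_No, Amount FROM CashBook ORDER BY Entry_No;
    OPEN @MyCursor;
END;
GO
-- The caller fetches from the cursor that the procedure opened
DECLARE @cur CURSOR;
EXEC dbo.OpenCashBookCursor @MyCursor = @cur OUTPUT;

DECLARE @Entry_No INT, @Amount MONEY;
FETCH NEXT FROM @cur INTO @Entry_No, @Amount;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- ...do something with each row...
    FETCH NEXT FROM @cur INTO @Entry_No, @Amount;
END;
CLOSE @cur;
DEALLOCATE @cur;
```

The same cursor variable could equally be passed on to further procedures before it is closed.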

Transact-SQL cursors are efficient when contained in stored procedures and triggers. This is because everything is compiled into one execution plan on the server and there is no overhead of network traffic whilst fetching rows.

Are Cursors Slow?

So what really are the performance differences? Let’s set up a test-rig. We’ll give each routine an increasingly big cashbook to work on, up to two million rows, and give it a task that doesn’t disturb SSMS/Query Analyzer too much with a large result, so we can measure just the performance of each algorithm. We’ll put the timings into a table that we can load into Excel and pivot for the analysis.

We’ll calculate the average balance, and the highest and lowest balance so as to check that the results of each method agree.
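
A minimal sketch of such a test-rig might look like this; the Timings table and the #Results temporary table are assumptions standing in for the full harness, not the original code.

```sql
-- Timings table to be loaded into Excel and pivoted
CREATE TABLE dbo.Timings
    (TheMethod VARCHAR(40), TableSize INT, ms INT,
     AvgBalance MONEY, MaxBalance MONEY, MinBalance MONEY);
GO
DECLARE @Start DATETIME2 = SYSDATETIME();

-- ...run one of the running-total routines here, filling #Results
--    with (Entry_No, Amount, Balance)...

-- record the elapsed time, plus the aggregate balances used to
-- cross-check that every method produced the same answers
INSERT INTO dbo.Timings
    (TheMethod, TableSize, ms, AvgBalance, MaxBalance, MinBalance)
SELECT 'Cursor',
       (SELECT COUNT(*) FROM CashBook),
       DATEDIFF(ms, @Start, SYSDATETIME()),
       AVG(Balance), MAX(Balance), MIN(Balance)
FROM #Results;
```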

Now, which solution is going to be the best?

[Figure: raw timing data for each method and table size (339-QuirkyData.jpg)]

The raw data is shown here. What I have not shown is the check of the results, which shows that every solution gave consistent results.

The iterative and cursor solutions give similar results since, under the covers, they are doing similar things. They are dramatically faster than the ‘correlated subquery’ and ‘group by’ methods, as one would expect.

You will see from the graph that we couldn’t even attempt the correlated subquery methods under a ‘production’ table size. It would have taken too long.

Conclusion? If you don’t feel confident about using the ‘Quirky Update’ (and it is easy to mess up, so you have to test it rigorously), then running totals are best done iteratively, either with a cursor or a WHILE loop. The WHILE loop is more intuitive, but there is no clear reason to favour one over the other. For almost all work in SQL Server, set-based algorithms work far faster than iterative solutions, but there is a group of problems where this isn’t so, and this is one of them. For a good example of another, see Phil Factor Speed Phreak Challenge #6 – The Stock Exchange Order Book State problem.

[Figure: graph of the timings for each method against table size (339-QuirkyGraph.jpg)]

Cursor Variables

If you change the cursor type definition routine above you’ll notice that @@CURSOR_ROWS returns different values.

  • A negative value less than -1 means the cursor’s keyset is still being populated asynchronously; the absolute value is the number of rows currently in the keyset. A value of -1 means the cursor is dynamic, so the number of qualifying rows can change at any time.
  • A 0 means that no cursor has been opened, or that no rows qualified for the last opened cursor, or that the last-opened cursor has been closed or deallocated.
  • A positive integer is the number of rows in the fully-populated cursor.

The most important types of cursor are:

FORWARD_ONLY
You can only move forward in sequence through the data source, and changes made to the underlying data source appear instantly.
DYNAMIC
Similar to FORWARD_ONLY, but you can access the data in any order.
STATIC
Rows are returned as ‘read only’, without showing changes to the underlying data source. The data may be accessed in any order.
KEYSET
The membership and order of the rows are fixed when the cursor is opened; updates made to the underlying data appear instantly, but insertions do not appear.
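
The difference shows up in @@CURSOR_ROWS. This illustrative snippet, against the same assumed CashBook table, opens a keyset and then a dynamic cursor and inspects the value each time.

```sql
-- Keyset cursor: membership is fixed at OPEN, so the row count is knowable
DECLARE KeysetCursor CURSOR KEYSET FOR
    SELECT Entry_No FROM CashBook;
OPEN KeysetCursor;
SELECT @@CURSOR_ROWS AS KeysetRows;   -- positive once the keyset is populated
CLOSE KeysetCursor;
DEALLOCATE KeysetCursor;

-- Dynamic cursor: membership can change at any time
DECLARE DynamicCursor CURSOR DYNAMIC FOR
    SELECT Entry_No FROM CashBook;
OPEN DynamicCursor;
SELECT @@CURSOR_ROWS AS DynamicRows;  -- -1: the count can never be certain
CLOSE DynamicCursor;
DEALLOCATE DynamicCursor;
```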

Cursor Optimization

  • Use them only as a last resort. Set-based operations are usually fastest (but not always; see above), then a simple iteration, followed by a cursor
  • Make sure that the cursor’s SELECT statement contains only the rows and columns you need
  • To avoid the overhead of locks, use READ ONLY cursors rather than updatable cursors, whenever possible
  • Static and keyset cursors cause a temporary table to be created in tempdb, which can prove to be slow
  • Use FAST_FORWARD cursors, whenever possible, and choose FORWARD_ONLY cursors if you need an updatable cursor and only need to FETCH NEXT
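
As an example of the last point, here is a sketch of an updatable FORWARD_ONLY cursor that repairs rows as it passes them, using WHERE CURRENT OF; the table and the repair rule are invented for illustration.

```sql
-- Updatable forward-only cursor over the assumed CashBook table
DECLARE FixupCursor CURSOR FORWARD_ONLY FOR
    SELECT Amount FROM CashBook
    FOR UPDATE OF Amount;

OPEN FixupCursor;
DECLARE @Amount MONEY;
FETCH NEXT FROM FixupCursor INTO @Amount;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- hypothetical rule: zero out any negative entry at the current row
    IF @Amount < 0
        UPDATE CashBook SET Amount = 0
        WHERE CURRENT OF FixupCursor;
    FETCH NEXT FROM FixupCursor INTO @Amount;
END;
CLOSE FixupCursor;
DEALLOCATE FixupCursor;
```

If no updates were needed, the same loop declared as FAST_FORWARD (which implies READ_ONLY) would avoid the update locks entirely.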

Questions

  1. What is the fastest way of calculating a running total in SQL Server? Does that depend on the size of the table?
  2. What does it suggest if the @@CURSOR_ROWS variable returns a -1?
  3. What is the scope of a cursor?
  4. When might you want locking in a cursor? Which would you choose?
  5. Why wouldn’t the use of a cursor be a good idea for scrolling through a table in a web-based application?

Acknowledgements

Thanks to Nigel Rivett, Phil Factor and Adam Machanic for their ideas. Thanks to Phil Factor for revising this to give a more comprehensive test harness.