There Must Be 15 Ways To Lose Your Cursors...
Part 1: An Introduction
A long, long time ago, it was not possible to do everything in the SQL language that a database developer or a DBA might need to do with just set-based SQL. And thus was born the Cursor. The bastard love-child of the declarative relational database language SQL and her first wild fling with a much more experienced and confident procedural programming language (whose identity is still unknown), the Cursor arrived in the early 80's amid promises and predictions to fix all manner of ills in the house of SQL. And at first, that's how it appeared. Anything that set-based SQL couldn't do on her own, her erstwhile son would step right in and handle for her. In fact he handled so many things for her, that she became convinced that she could not get by without him.
But then she began to notice some things that disturbed her. For one thing, he was just a little bit slow and a whole lot lazy. He never hurried anywhere; he always moved just one step, then another. Plus, he didn't seem capable of doing more than one thing at a time. Then there was the way that he did his work. He didn't just do it, he would lay down tarps, drop cloths, tools and toolboxes everywhere, until the whole room (and then some) was practically unusable. And this condition would persist far longer than it should because he was so slow in his work.
Which was ironic because he ultimately only ever did one thing: apply duct tape. No matter what the task, chore or repair was, he would just duct tape over it. Instead of replacing broken windows, he would duct tape them back together. When the siding needed repair, he duct taped it. Leaky pipes? Duct tape was the answer. Lost shingles? More duct tape.
"My way is cheaper" he would say when she asked him about it. "Why invest all of that time and money in new windows, pipes or shingles when with a little duct tape, we can be done now?" And because she didn't know about these things, she had to accept his word on it. But she had her doubts. First, his repairs didn't seem to work very well. The pipes still leaked some, as did the windows which she could hardly see out of. Secondly, she suspected that these cheap repairs and constructions were actually costing her more in utility bills and damage to things around them than they were saving.
And then there was the appearance. The once beautiful house of SQL, conceived, designed and built to widespread acclaim and admiration, was now a shocking eyesore. With its obscured windows, sagging fences, and omnipresent patchwork of duct tape, it looked not so much like a home as it did a duct tape shelter with bits and pieces of a real house randomly attached.
However, over the years she had been steadily learning how to do these things herself and now, not only did she realize that she did not need him anymore, she also knew for sure that there were better ways to do these tasks. In fact, the best thing for her would be to redo every single thing that he had done. But how to do it? He still insisted on doing all of the repairs himself. And after almost thirty years he was still living in her basement and showed no sign of either getting a job or moving out. "Well," she thought to herself, "there are ways of getting rid of uncooperative children..."
This series of articles intends to show you not only how to get rid of the Cursors that you may currently have, but also how to avoid ever having to use them in the first place. Up through SQL Server 2000, I would have agreed with most people that there were some things that Cursors were necessary for but, with the release of SQL Server 2005, all of those reasons have disappeared in the face of the new features and capabilities that it provides. Specifically, the new features that have enabled this elimination are:
- Large Strings(VARCHAR(MAX), etc)
- Windowed Aggregate Functions (especially ROW_NUMBER())
- FOR XML PATH
As we progress in this series, we will see how these three additions have greatly extended the reach of set-based SQL.
And although I have only mentioned Cursors so far, the same applies equally to WHILE loops as well. What I will be showing you in this series of articles is how to use the new and old features of Transact-SQL to both create and convert SQL routines that are faster, smaller, cleaner, clearer and more supportable without the use of Cursors or While loops.
What's Wrong With Cursors Anyway?
First, they are slow and resource intensive. Many of them will create a table in tempdb and copy the entire dataset into it, which can cause resource problems. Those that do not are frequently forced to use single-record retrieval methods that can be much slower. In either event, the Cursor is forced to process the returned rows one at a time, which means a WHILE loop and that adds its own performance problems (the repeated execution overhead of multiple statements for each row). Additionally, there is locking overhead to maintaining this single record state in a database that may also have contending DML statements going on at the same time.
Secondly, they can slow other processes down as well. Using more resources (CPU, memory, tempdb, etc.) means that less is available for other uses, and using them for a longer time also means that they are unavailable to others for a longer time. Still, further, the additional locking that Cursors employ can result in blocking other processes too.
Thirdly, Cursors and While loops are blatantly procedural code in SQL, a language that was designed from the start to be a declarative language. What's the difference? Procedural languages, the more traditional approach to programming, as seen in VB, C#, Java, etc., consists of writing code in the form of procedures. That is, an ordered series of steps that individually tell the computer what to do and collectively tell it how to do it. Declarative languages, on the other hand, simply describe the results that are to be returned, leaving the optimizers the freedom to determine how best to do it.
For instance, consider the question that a cook might ask "What food should I make?" The procedural answer might be:
- Get butter and sugar
- Cream them together
- Add vanilla and eggs
- Add flour, salt and baking powder
- Etc., etc. ...
On the other hand, the declarative answer would be:
- Cookies, with chocolate chips
Implemented correctly, declarative code is generally much more concise than procedural code. Obviously, you don't want to use procedural answers where a declarative one will do.
Fourthly, SQL routines with Cursors and loops are, in general, harder to read, harder to understand, and harder to maintain and support. Is this just personal preference on my part? That's a valid question, but I think that the answer is no. First, look at our abstract examples above. You may notice something interesting about the procedural example; it never tells us that it is making chocolate chip cookies. That's because procedural code tells us how to do something, but not what it is that we are doing. For this reason alone, declarative code tends to be easier to read and understand. Now this single example may not be convincing, especially since it does not involve actual code of either kind, however, we will have many opportunities in this series to directly compare functionally identical procedural SQL with declarative SQL that I think will demonstrate this beyond any reasonable doubt.
Finally, you do not need them. Seriously. Let me say that again so that there is no uncertainty: As of SQL Server 2005, the only reason to use Cursors is if you actually want to slow down your code (for instance to do a big update in chunks). Otherwise there is just no reason that I can see to use them, and many reasons not to.
Why Do People Still Use Cursors?
Whenever I start to talk about the problems with cursors there are two questions that always come up. The first is "If Cursors are so bad, then why do they exist?" The answer to that question is historical, which I believe I have covered adequately above.
The second question is "If Cursors as so bad, then why do people keep using them?" That's an excellent question because no matter how often I tell SQL developers who I am training not to use Cursors they invariably do, and then end up calling me desperately to fix the problems they have caused. In fact this tendency is in my experience so powerful that it seems like an irresistible force. In short, Cursors and While loops attract developers the way that a black hole attracts matter. The cause of this is, I believe, due to a combination of reasons.
First, I believe that many SQL practitioners simply do not realize the problems with cursors or may even come from an environment like Oracle where they perform somewhat better. One of the goals of this series is to remedy that through information and education.
Secondly, I think that all developers and, indeed, almost everyone in the computer field is familiar with procedural thinking and by implication procedural programming. This is much less so for declarative programming. Even if we haven't been trained as a programmer, our job and career environment are saturated with procedural thinking and procedural perspectives. This collective procedural ambiance leads us to see procedural programming as a natural and straight-forward thing.
If you want some confirmation of this all you have to do is to look on the some of the technical support forums. It's amazing to me how many posters seeking help can only answer the question "What are you trying to do?" with a series of steps instead of an actual description of the results that they want. They have become stuck in a procedural mindset where everything can be truly described only by using an ordered series of imperative instructions. In this mindset, end-state descriptions (declarations or descriptions of desired results) are seen as inherently incomplete because they do not describe how to get to that end-state.
Thirdly, there is a phenomenon I have experienced as a (non-SQL) programmer that I like to call "Heads-Down Programming" or "Design-less Coding". When faced with a complicated task, the developer just writes a big loop to process each input row and then inserts a line or block of code for each requirement of their task. This is the essence of mediocre programming, and every developer has likely done it at one time or another. Why? Various reasons to explain this that I have heard (or have given myself) include:
- "Because I am having difficulty figuring out the task or the tools and I need to get moving on it"
- "Because I need to finish my tasks quickly and I do not have time to think about them"
- "Because I am only being measured by my boss on how quickly I finish my tasks and problems that might arise downstream in testing, QA or production, are not counted against me"
- "Because I am not very good at this, yet"
- "Because I don't really care about the quality of the work"
- "Because I do not know any better" (more common than you may think)
Thus, Cursors and While loops serve as a way for a developer to get their job "done" (sort of) as quickly as possible. They do not have to try to figure out what is the correct way or the best way to do something, just meet the functional requirements and move on. The problem here though is that they are reducing the cost of their initial development by adding serious or even severe costs to testing, QA, production, support, maintenance and upgrading. Developing in this way only makes the initial development easier at the expense of every other stage of the software lifecycle. And this only really works in procedural programming where you can look at things one small step at a time in isolation and do not ever have to look at the big picture or really understand what it is that you are trying to do in a purposeful context.
This is an approach that is not readily available to the declarative developer, because the individual pieces of a query cannot be considered in complete isolation from each other. The relationships between the different table sources, between the table sources and the output columns, between the output columns and the ordering columns and between the Group By clause and the rest of the query are all very significant. This means that you cannot do declarative programming without thinking about it.
The fourth reason is that because SQL is a domain-specific or special purpose language many of the developers who use it never become truly proficient in it because they do not consider it to be a central part of their expertise. I have observed this many times, even among developers who spend 50% or more of their time developing in SQL over several years. Consequently, they are frequently confronted with situations that could be easily solved with good declarative, set-based SQL, but are unfamiliar with the techniques or features that would allow them to do so. This leads them to fall back to what they knew before that started using SQL; procedural techniques and algorithms.
Homework: Exercise and Example
I know that some readers may be disappointed by the lack of code so far, so I will leave you with the following example and problem (don't worry, it's easy):
Declare @dummy int
Declare @X int
Set @X = 0
Declare XCursor Cursor LOCAL FORWARD_ONLY For
From master.sys.columns c1
Cross Join master.sys.columns c2
FETCH NEXT FROM XCursor Into @Dummy
WHILE @@Fetch_Status = 0
Set @X = @X + 1
FETCH NEXT FROM XCursor Into @Dummy
First, copy and run this procedure, measuring its run-time on your system. Next, figure out what this Cursor procedure really does (remember, that is part of the problem with procedural code). Now write a faster version that accomplishes the same task without any Cursors or loops. And don't worry if you find the homework difficult, we will cover it in detail in part two.
Coming in Part 2: "Just Put It in a Set, Brett"
We will pickup in part 2 with the simpler cases of cursor and loop based code and how to convert them to set-based or declarative SQL. I will show a straight-forward method for converting simple Cursor and While loop based procedures to declarative queries. Then, we will look at several slightly more complex instances and see how each can be easily rewritten using this technique.
My thanks to Jeff Moden for his comments and criticisms in the preparation of this article.
Read Part 2 now.
R. Barry Young is a Principal Consultant for Proactive Performance Solutions, Inc., a Microsoft Gold Certified Partner, located in northern Delaware. He has been programming for over 35 years, a computer professional for 30 years, a professional consultant for 25 years, a Systems Performance Analyst for 20 years and a Database Consultant for the last 15 years. He estimates that he has written between 800,000 and 1,000,000 lines of procedural code by now and thinks that he is finally starting to get the hang of it.