RE: Hanging WHILE loop – SQLServerCentral

SSC Guru

Points: 1003865

May 11, 2014 at 11:32 am

tombiagioni1983 (5/7/2014)
Thanks everyone for the quick responses and the helpful attitude! I come from a C# programming background where iterative loops aren't as costly, So i'll definitely be reading that article! I'll add in the additional BEGIN/END statements and report back.
Thanks again everyone!

Iterative loops in T-SQL aren't costly. Just the explicit ones are. 🙂

One of the things that most books and courses on SQL Server and, in particular, the T-SQL Language fail miserably in is how to loop. In courses like those for C#, the first thing they do is teach you how to setup the working environment. The next thing they do is teach you how to print "Hello World" to demonstrate that your working environment is setup. After teaching you some syntax, functions, and a couple of other things, they teach you how to write a loop that will count from, say, 1 to 100 because looping is the backbone of every program.

T-SQL isn't any different except you don't have to write the loops. For example, instead of opening a file, checking to see if any rows exist, reading a row, doing something with it, and looping back to see if another row exists, all you have to do in T-SQL is write a SELECT statement. Behind the scenes, SELECT statements aren't a whole lot more than a machine-language-level loop that works much like I just described. Thanks to a fellow by the name of R.Barry Young, I call those hidden, very high performance loops behind every SELECT a "Pseudo Cursor" and, yes, they are controllable.

If you look at Luis' code, it uses such a Pseudo Cursor to replace your explicit cursor (While Loop, in this case). It's MUCH faster than the While Loop because it only has to make 1 execution plan and then SQL Server loops behind the scenes at machine language speeds. With a While Loop, it has to make or reuse an execution plan for each and every iteration even if it decided that there's an execution plan that it can reuse. The While loop also has to take separate locks for every iteration instead of locking what it needs all at once like the SELECT does. All of that takes time.

My best suggestion to make the paradigm shift from writing procedural code in a language like C# is in my signature line below. First, make the realization the T-SQL will build the loops it needs for you and second, like the tagline says, "[font="Arial Black"]Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column[/font]" and let T-SQL figure out how to express what you want done to each row.

The really cool part about all of this is that you don't actually have to use anything from the table to use the table to loop. All you need is the "presence of rows" to get the job done. You'll find an example of this in the Tally Table article that was previously cited above by rhythmk but, if you have the time, try the following out. It'll help you make the paradigm shift I'm talking about. If you can make that shift, your life will be golden not only for counting but for many other areas of T-SQL, as well.

What does the following statement do?

SELECT *

FROM sys.all_columns

;

Other than offending most people's sensibilities about the use of "*" returning to much data (too many columns), most people would say that the code returns all columns of the table and they'd be partially correct. The other thing that it does is, because there is no limit expressed in the code, it also returns all of the rows. If you look in the lower right corner of the screen, you'll see the number of rows the query returned and, for this table, I absolutely guarantee that number will always be more than 4,000. Write the number of rows returned down. We'll need it again.

Next experiment. What does the following code do? And... what doesn't it do?

SELECT 1

FROM sys.all_columns

;

The obvious thing that it does is that it returns a 1 for every row in the table. You can easily verify that by comparing the number you wrote down in the first experiment with the number of rows this query returned.

The question is, what didn't it do?

The answer is that it didn't use ANY data from the table. It only used the "presence of rows" from the table. To translate that, we used the Pseudo Cursor behind the scenes of the SELECT as a loop to return a certain number of ones, that number being the count of all the rows in the table.

Now, let's do something a bit more practical. Let's count from 1 to however many rows there are in the table. Don't think of it as creating X rows with an incremental count. Think of it as creating a column that has an incremental count up to a limit. Let T-SQL worry about how to create the rows.

SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns

;

On my little 2005-based test machine, that returns a column of numbers number from 1 to 4,168 even on a brand new, totally empty database. Your number may differ a bit but it will always be > 4,000.

That does bring up the next question. Let's just say that's the largest table in the database and we want to count to, say, 20,000. In C#, you would simply change the max limit of your While loop and you'd be done. In T-SQL, we actually have to provide a larger "Presence of Rows". In C#, if the largest number you could count to was 4,000, you could do it by making a nested loop where each iteration of an outer loop would cause the inner loop to iterate the 4,000 times it's supposed to.

In T-SQL, we can build such a nested loop simply by doing a CROSS JOIN. To keep us from building the 16 Million rows that a nested loop of 4,000 in each loop would deliver, we have to provide a "Stop" value just like you would have to do in C# on the inner loop. Instead of having to worry how to do that in a loop, we just tell T-SQL to stop when it has 20,000 values in the column. We use TOP for this and let T-SQL figure out the rest. Rather than defining the content of each row and evaluating each row for the max value we want, we simply define what we want in the column (as we did before) and when to quit. Like this...

SELECT TOP 20000

ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

;

Taking the next step, think of the CROSS APPLY that Luis used as a "For Each Row". His use of CROSS APPLY replaces the nested SELECT in your WHILE loop so that for each of the 1,000 items selected from the Chest table, it'll return X number of rows from the Items table depending on the random number generator that you used (ABS(CHECKSUM(NEWID()))%range+offset). Behind the scenes, it does exactly the same looping that you did in your WHILE loop but in a more "compiled" fashion rather than an "interpretive" fashion.

Again, the key to success for a great deal of what can be done in T-SQL is to understand that T-SQL loops behind the scenes in a very effective manner and your goal is simply to let T-SQL do that for you instead of overriding what it's good at. Tell T-SQL what you want it to do and not how to do it. In other words (again), "[font="Arial Black"]Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.

[/font]"

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)