RE: What is meant by set based programming?

SSC Guru

Points: 100851

August 14, 2012 at 3:30 pm

patrickmcginnis59 (8/14/2012)
Evil Kraig F (8/14/2012)
...stuff...
The post I was replying to asked if there was anything to back up my theory that RBAR was slow because of the T-SQL overhead. I posted my little test.
I think maybe you are simply not reading the thread you are posting in. But that is also just a theory 😉

No, I fear you're misunderstanding the term RBAR. Row-By-Agonizing-Row. This specifically remarks on the difference between accessing the rows in an index or table one at a time instead of as an internal tableset to the optimizer.

Your examples are simple iterative incrementation and numeric manipulation. This can be a 'set' based on context (particularly if you have a mathmatical background), thus why I asked you for your definition. You are quite correct, however, that .NET and other programming languages are more suited to the task of simple incrementation and variable mathmatics. For a particular example; string manipulation in SQL is outpaced simply by using a CLR function instead.

So you're aware, your smiley does not remove either the sting nor the implications from your words. This is a T-SQL/SQL Server based site and we concentrate on the database layer itself, which implies data retrieval as the primary purpose for any task. I asked merely to maintain we were discussing apples to apples and provided an example for clarity. Set based programming, as a term, gets tossed around a lot as an idea and the context is important. Screw me for being polite and not jamming you.

You made an assertation, and were asked to back it up. You then did so in a way that is completely out of context for 99% of the tasks performed in this language. I'd assumed (stupid me!) that you merely misunderstood what set-based programming meant in a database environment. Also, even then, had you read the thread, set based programming is generally understood to be declarative programming, not procedural... and thus neither of your examples are actually an example. See Springtown's and Dwain's posts above. G2 and Jeff then get into local lingo discussions of the difference.

We'll go to the wiki for an outside reference:

http://en.wikipedia.org/wiki/Set_(computer_science)

Paraphrased: The set is a structure that stores values, not the creation or maintenance of said values.

Set based programming is working with the group in its entirety. RBAR, or iterative programming, are your examples. We're quite aware that works very poorly inside the SQL engine, and just about any other DB engine after the DB2 generation, since that was mostly external engines coded as a hybrid connection, but to your specific assertation:

One thing to keep in mind is that set based execution is not any faster than c based manuallly coded loops except of course that database engine programmers are probably better than the average c programmer. What does make a HUGE difference is how slow T-SQL the language interpreter is. If you were to compare a simple loop coded in T-SQL against pretty much any other language, its just slower to a tremendous degree. So a single line of T-SQL is going to be slow in any case, but if that single line of T-SQL results in an operation against many rows of data, the cost of that slow T-SQL can be amortized across a much greater amount of work.

That the internal set access loop is not any faster than c-based looping. In this, because C based coding can mimic assembler with how close to memory access it can achieve, I'll agree with, as a set has to be in memory for access. This also would assume that the C-based coding included concepts such as index retrievals, key lookups, and any other number of items that are used to speed set-based operations within any DB engine. I imagine that's a LOT of work. Like a few years worth of a dedicated team to create the basics.... oh, right, that's MySQL.

The language interpreter for any piece of code is fired up once, same as .NET and its recent ancestors. Slow or fast, both sides need a 'compile time'. Most .NET languages do that work up front and create an executable, SQL Server stores this in memory and is transparent about its compile times. The compilation of the code is non-applicable during runtime except immediately after a reboot under most circumstances.

You then state a single line of T-SQL is going to be slow in any case. Truly? PRINT GETDATE() returns in microseconds. The majority of the time of the process is network transfer. This requires some exposition if you wish to maintain that as a fact.

Now, loops, which I skipped over briefly. You're not going to hear any of the older hands here state that a loop isn't the slowest method of accessing a piece of data in the database engines, and you're quite correct that T-SQL isn't optimized to hand built incremental iterators. You've proven grass is green, but nothing important to the work. Of all the things to prove, that was the one that noone would disagree with.

So, about reading the thread... would you like to try it yourself? No, that's not meant to either be funny nor polite.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA