A few reflections on RBAR by a weary application developer

  • The ZEN Master (Jeff Moden) will soon end all doubt 🙂

    No pressure 🙂

  • patrickmcginnis59 10839 (12/9/2013)


    Jeff Moden (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Jeff Moden (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    I didn't see any explanation on why RBAR is so slow, so rated this article 1 star.

    This is how RBAR comes to be in systems. Many people just don't know and so they lean towards what they do. It gets the job done at the expense of future scalability and rework.

    I'm interested in why RBAR is so slow in SQL systems, and some of the following psychology that develops around this is pretty interesting too. I've seen stuff on the net like "this non-T-SQL program is so slow, I bet it uses RBAR inside".

I think it's getting to be on the level of those cargo-cult societies.

    edit: adjusted cargo-cult sentence 🙂

    Hi Patrick,

    I'm still neck deep in it at work but I did have the time to look at some of the attempts at explaining why RBAR is slower. I actually have a physical example that you'll be able to look at and, after quaffing a mighty and well deserved ale after the nightmare known as "today", I'll try to get it to you. It depends a lot on when I get done today.

    Thank you for the question. The answer might also explain the difference between RBAR and necessary RBR.

    Looking forward to it!

Here was my first situation that started me looking at T-SQL itself. I was initially interested in the results because I had seen some poor performance with RBAR-looking T-SQL, so I drilled down and had a go at the language itself, independent of data:

    http://www.sqlservercentral.com/Forums/FindPost1344895.aspx

    and I remember it didn't get rave reviews in that thread either, one of my favorite replies:

    http://www.sqlservercentral.com/Forums/FindPost1345055.aspx

    Ok. Here we go, Patrick.

    [font="Arial Black"]Explicit RBAR[/font]

The first definition of RBAR is the obvious one. In T-SQL, it is simply the use of explicit Cursors/Loops to process single rows of data in a fashion that overrides the ability of SQL Server to process the data using the inherent, very high speed, machine language level “Pseudo-Cursors” (a phrase coined by our own R. Barry Young) that power every multi-row SELECT, INSERT, UPDATE, or DELETE behind the scenes. For example, SELECT is the set-based command that is used to invoke some good old-fashioned, high speed, “read a row, process the row, display the row, find the next row” processing that anyone who’s ever taken even a BASIC or DOS course is aware of.

The reason why that basic form (overriding the “Pseudo-Cursors”) of RBAR is so slow is because, although T-SQL is considered to be a “declarative” rather than a “procedural” language (you don’t have to tell it how to do it, you just tell it what to do… kind of like using macros), it is not a compiled language. It’s an interpretive language that requires separate steps for each and every action you tell it to take, and it does a separate analysis each step of the way to decide if it can reuse an existing execution plan or if it needs to create a new one. This is where the old saw of “It takes longer to process one row a thousand times than it does to process a thousand rows once” comes from. Let’s see that part in action.

    First, let’s create a nice, narrow test table.

--===== Create a target table
IF OBJECT_ID('tempdb..#TestTable') IS NOT NULL
   DROP TABLE #TestTable
;
CREATE TABLE #TestTable
       (N INT PRIMARY KEY CLUSTERED)
;

    Now, turn on the actual execution plan and run the following code. The code just calculates values from 1 to 1,000 and inserts them into a table so that we can return a single result set to a GUI (for example).

--===== Count from 1 to some number using the explicit RBAR of a While Loop
     -- and return the result set to the GUI.
TRUNCATE TABLE #TestTable; --Empty the table to make reruns easier
SET NOCOUNT ON;            --Speed trick to help RBAR run faster.
DECLARE @Counter INT;
 SELECT @Counter = 1;
  WHILE @Counter <= 1000
  BEGIN
        INSERT INTO #TestTable (N) VALUES (@Counter);
        SELECT @Counter = @Counter + 1;
    END
;
 SELECT * FROM #TestTable
;

With the execution plan turned on, that takes quite a while because it has to graphically generate so many (1 for each iteration) execution plans, but that’s what I wanted you to see. Each iteration has to do some analysis on whether to use an existing execution plan or create a new one. In either case, it has to set up for a query (INSERT in this case) to run 1,000 times because it’s an interpretive language rather than a compiled language.

    If we turn the execution plan off and rerun the code again, it takes “just” 30 milliseconds duration and 31 milliseconds CPU time to run on my Dev server. But wait… according to SQL Profiler, it also took 2,078 pages of reads (17,022,976 bytes of some form of I/O) and produced an internal rowcount of 3,001 rows!

Let’s change the 1,000 in the code to 1 million and see what happens (make sure the execution plan is turned off for this or you’ll have to wait a month of Sundays for it to complete).

--===== Count from 1 to some number using the explicit RBAR of a While Loop
     -- and return the result set to the GUI.
TRUNCATE TABLE #TestTable; --Empty the table to make reruns easier
SET NOCOUNT ON;            --Speed trick to help RBAR run faster.
DECLARE @Counter INT;
 SELECT @Counter = 1;
  WHILE @Counter <= 1000000
  BEGIN
        INSERT INTO #TestTable (N) VALUES (@Counter);
        SELECT @Counter = @Counter + 1;
    END
;
 SELECT * FROM #TestTable
;

Again, considering a million rows of input, a little bit more than 24.5 seconds of duration and 22.8 seconds of CPU time doesn’t seem THAT bad, but it did 2,628,518 reads and 1,606 writes for a total of 21,545,975,808 (yep… 21 billion) bytes of I/O (a read or write is 8,192 bytes… a page) and an internal rowcount of 3,000,001 rows instead of just a million. Something like this in C# would probably be measured in the very low seconds or might even be sub-second (just to give a feel for how slow 24 seconds on a computer actually is).

    If we use a double-nested loop in the form of a CROSS JOIN “Pseudo-Cursor”, the I/O is just as bad as the While loop but we’ve eliminated the overhead of figuring out 999,999 execution plans and the code runs about 7 times faster.

--===== Use the "Pseudo-Cursor" of a Cross Join
TRUNCATE TABLE #TestTable; --Empty the table to make reruns easier
 INSERT INTO #TestTable (N)
 SELECT TOP 1000000
        N = ROW_NUMBER() OVER (ORDER BY (SELECT 1))
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
 SELECT * FROM #TestTable
;

    Hmmm… no real differences in duration, CPU, Reads, or Writes, though. Let’s examine the problem a little more closely….

    What was the reason for putting the data into a table in the first place? The answer is, so that we could return all of the values as a single result set to the GUI. Since the CROSS JOIN method returns a single result set, which meets our single result set requirement, let’s see what happens if we take out all the hooie about inserting into a table.

--===== Use the "Pseudo-Cursor" of a Cross Join without the INSERT
TRUNCATE TABLE #TestTable; --Empty the table to make reruns easier
--INSERT INTO #TestTable (N)
 SELECT TOP 1000000
        N = ROW_NUMBER() OVER (ORDER BY (SELECT 1))
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
-- SELECT * FROM #TestTable
--;

NOW we’re cooking with gas!!! The CROSS JOIN took only 2 seconds duration (more than 12 times or 1,200% faster), about a 5th of a second of CPU time (about 122 times or 12,200% less), and only 129 reads (millions of reads less). If you look at the execution plan, it only had to do 1 analysis and make 1 plan to process a million rows instead of making a million plans to process 1 row each. We also got rid of a whole lot of read overhead by getting rid of a Temp Table that we no longer need.

    To summarize some of the reasons why this form of RBAR is bad…

1. SQL Server is NOT a compiler. It’s an interpreter. Each iteration of a loop must be analyzed at the statement level and SQL Server must either decide to use an existing execution plan or create a new one. In either case, that takes a relatively long time both duration-wise and CPU-wise.

2. Since the minimum size that SQL Server can actually read from or write to is a whole page, each iteration is required to drill down through each page of the B-TREE to the Leaf level (or do a full table or index scan in the worst cases). Even on tiny tables where the B-TREE and Leaf level are the same page, that requires N number of pages read instead of N number of rows read. On larger tables, the B-TREE will likely be at least 2 levels deep, which is why the number of reads will exceed (more than double) the number of rows. Even in memory, that takes a lot of additional time.

3. Since we could return a single result set with the set-based version, we didn’t need the extra overhead of populating a table, so another reason why set-based code is usually much faster than this form of RBAR is process and, sometimes, object elimination.

    The bottom line here is that “Explicit RBAR” operates at interpreter speeds rather than compiler speeds. “Explicit RBAR” doesn’t allow SQL Server to run at the machine language speeds that it’s capable of.
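For readers who want to see the loop's per-row cost on their own box without Profiler, a minimal sketch using the standard STATISTICS IO/TIME session switches (the exact numbers will differ by machine; the pattern of reads far exceeding rows will not):

```sql
-- Hedged sketch: observe the per-iteration I/O yourself. Expect roughly 2+
-- logical reads per inserted row (the B-TREE traversal described in point 2).
SET STATISTICS IO, TIME ON;

TRUNCATE TABLE #TestTable;
DECLARE @Counter INT = 1;
WHILE @Counter <= 1000
BEGIN
    INSERT INTO #TestTable (N) VALUES (@Counter);
    SELECT @Counter = @Counter + 1;
END;

SET STATISTICS IO, TIME OFF;  -- logical reads appear on the Messages tab
```

Comparing the logical-reads total on the Messages tab against the 1,000 rows inserted shows the same "pages read instead of rows read" effect the Profiler numbers above do.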

    [font="Arial Black"]Hidden RBAR[/font]

    There are a whole lot more different forms of “Hidden RBAR” than there are for “Explicit RBAR”. “Hidden RBAR” can come in the form of Recursive CTEs that process 1 row at a time, Triangular Joins which are typically associated with (but not limited to) correlated sub queries or joins that have an inequality in them, Scalar Functions, and Multi-Statement Table Valued functions (not to be confused with Inline Table Valued Functions) to name just a few of the atrocities known as “Hidden RBAR”. You can read about some of them at the following URLs.

    http://www.sqlservercentral.com/articles/T-SQL/61539/

    http://www.sqlservercentral.com/articles/T-SQL/74118/

    http://www.sqlservercentral.com/articles/T-SQL/91724/

    I haven’t been able to put my hands on it just now but Paul White figured out a way to make the Triangular Joins associated with using correlated sub queries in the running total problem run almost lightning fast. But, it’s still RBAR. Why? Because it still used millions of reads for a relatively small number of rows. In other words, it was still a memory I/O resource hog and a bit of a disk hog because it required additional indexes to make the code run as fast as it did. Also, there are other methods that don’t require those additional indexes and certainly won't generate that number of reads.
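For illustration, a minimal sketch of the triangular-join form of the running total against the #TestTable built earlier, followed by the windowed alternative available on SQL Server 2012 and later (this is my sketch of the general pattern, not Paul White's optimized version):

```sql
-- Hedged sketch of a Triangular Join: the inequality in the correlated
-- subquery re-reads every preceding row for each outer row, so N rows cost
-- roughly N*(N+1)/2 row touches... the "Hidden RBAR" read explosion.
SELECT  t1.N,
        RunningTotal = (SELECT SUM(t2.N)
                          FROM #TestTable t2
                         WHERE t2.N <= t1.N)   -- inequality = triangular join
  FROM  #TestTable t1;

-- Set-based alternative (SQL Server 2012+): each row is touched once.
SELECT  N,
        RunningTotal = SUM(N) OVER (ORDER BY N ROWS UNBOUNDED PRECEDING)
  FROM  #TestTable;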

Recursive CTEs (and other forms of recursion) that process one row at a time are also resource hogs because, even though they look set-based, they work pretty much the same way that a While loop works behind the scenes. In fact, a well-built While loop will frequently outperform a Recursive CTE for both speed and lower number of reads.
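A minimal sketch of such a counting Recursive CTE, for comparison with the While loop earlier in this post (MAXRECURSION 0 is needed because the default recursion cap is 100):

```sql
-- Hedged sketch: looks set-based, but iterates one row per recursion,
-- much like a WHILE loop behind the scenes. Compare its STATISTICS IO
-- numbers with the CROSS JOIN "Pseudo-Cursor" for the same row count.
WITH Counter (N) AS
(
    SELECT 1
    UNION ALL
    SELECT N + 1 FROM Counter WHERE N < 100000
)
SELECT N
  FROM Counter
OPTION (MAXRECURSION 0);   -- default cap is 100 levels; 0 removes the limit
```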

    So what’s the bottom line for the “Hidden RBAR” type? Even though it looks (or may actually be) set-based, it still uses a huge number of resources per row processed and it may still take an excessive amount of time to execute.

    [font="Arial Black"]Good RBAR[/font]

    Actually, that’s an Oxymoron. There’s no such thing as good RBAR because if it’s still agonizingly slow or uses an agonizing amount of resources, then it’s not “good”. There are, however, places where RBR (Row By Row) is a good thing. One place is for process control. For example, if you want to send 100 emails with different body content, the only way to do that is to send them one at a time because that’s the way things like sp_Send_DBMail work. In this case, it’s just RBR (no “A”) because of 2 things… there’s nothing you can do about how sp_Send_DBMail works (it has to be row-by-row) and the loop, however it may be formed, isn’t the slow or resource intensive part of the code. In fact, even if you use FOR XML PATH to build the individual commands for each email as a single large string and execute it, it won’t be any faster or any less resource intensive even though you’ve seemingly avoided the (explicit) loop.
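A hedged sketch of that kind of PBR loop; the table and column names (dbo.PendingEmails, EmailAddress, BodyText) are hypothetical, and only the sp_send_dbmail call itself is the genuinely row-by-row part:

```sql
-- Hedged sketch of "PBR": the loop is not the slow part; sp_send_dbmail is
-- inherently one call per email. Table/column names here are made up.
DECLARE @Recipient SYSNAME, @Body NVARCHAR(MAX);

DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT EmailAddress, BodyText FROM dbo.PendingEmails;  -- hypothetical

OPEN cur;
FETCH NEXT FROM cur INTO @Recipient, @Body;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
         @recipients = @Recipient,
         @body       = @Body,
         @subject    = N'Notification';
    FETCH NEXT FROM cur INTO @Recipient, @Body;
END;
CLOSE cur;
DEALLOCATE cur;
```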

    Another example might be where you need to do something to every table in a database. Again, the loop isn’t going to be the cause of it being slow or resource intensive. The process that MUST be executed individually is the real problem.

Before you get all excited about good RBAR (I call it “PBR” or “Process By Row” and intentionally named it after my favorite beer because I like to iterate with beer ;-)), don’t think for a minute that just because someone on your team wrote a stored proc that can only handle one row at a time, it's necessarily a good thing or justifies the use of RBAR. That would be what I call “RBAR on Steroids” and the proc should be rewritten because, unlike sp_Send_Dbmail, it CAN be rewritten to use non-RBAR methods.

    There are actually good loops that process data themselves. For example, using a While loop (or Recursive CTE, in this case) to preprocess an Adjacency List (parent/child) Hierarchy is actually set-based code because it’s handling sets of rows all located at the same level in the Hierarchy. Conversely, not all set based code avoids RBAR as in the Triangular Join example.

    [font="Arial Black"]Two Common Mistakes[/font]

    A lot of people have cited a supposed “Best Practice” of when to use RBAR or not and they are pretty much incorrect. They say “Write Set-Based unless it can’t be done set based and then it’s ok to use a loop” or they say “If set based code is slower (like the triangular join thing), then use a loop”. I’m normally not so bold but to that, I say “rubbish”!!! The problem is that many people don’t know what can actually be done with GOOD set based code nor how to do it and because they’re under pressure to get something done in a hurry, they revert to what they know instead of taking the time to learn to do their jobs better. There are places to use loops and recursion, but they’re a whole lot more scarce than many folks can imagine. To wit, just because someone doesn’t know how to avoid a given loop, doesn’t mean that a loop should be used. Instead, it usually means that the person needs to learn more.

    [font="Arial Black"]Definition of RBAR[/font]

So, what's my definition of RBAR?

    Because of the Hierarchy example I just gave (which is actually set based) you can’t simply say that RBAR is anything that has an explicit While loop in it nor can you say that it’s anything that has recursion in it (although those are frequently RBAR problems).

    You also can’t say that if it is set based that it’s not RBAR because of things like the Triangular joins in the running total example, which actually does operate on sets for each row (although that’s what makes it slow in this case).

    If you look at the million row While loop examples at the beginning of this post and the fact that they generated internal row counts of 3 million rows (each row was touched more than once) or you look at the hundreds or even thousands of times a Triangular join might touch a given row, you might be tempted to define RBAR as anything that touches a row more than once. But that doesn’t work because most people don’t consider the use of a Tally Table to be RBAR and for things like splits, the same rows of the Tally Table are used over and over and over and… etc.
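A minimal sketch of such a Tally-style split, where the same Tally rows get re-used for every string yet nobody calls it RBAR (the inline tally below stands in for a persisted Tally Table; in real code you'd use one of those):

```sql
-- Hedged sketch: split a delimited string by joining to a number list.
-- The Tally rows are touched over and over, but the work stays set-based.
DECLARE @s VARCHAR(8000) = 'a,b,c,d';

WITH Tally (N) AS
(
    SELECT TOP (LEN(@s))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM sys.all_columns
)
SELECT Item = SUBSTRING(@s, N, CHARINDEX(',', @s + ',', N) - N)
  FROM Tally
 WHERE SUBSTRING(',' + @s, N, 1) = ',';  -- element starts after a delimiter
```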

    Some call Scalar UDFs RBAR because they are, in fact, executed for each row in a SELECT but I’ve also seen where certain UDFs (even with While loops in them) can actually beat other methods (including set based methods) so I’m not willing to call all Scalar UDFs “RBAR” (although they usually are).

    And, because of the Triangular join problem, I’m certainly not willing to say that anything that isn’t set based is RBAR.

    If we look at what we know to be RBAR, perhaps we can come up with a definition. For example, we’ve seen that While loops that count or work with individual rows in a table that aren’t actually controlling set based or necessarily individual processes are RBAR. The same holds true for similar Recursive CTEs. Triangular joins use machine language speed “Pseudo Cursors” and they are actually set based in that they are processing sets but they might touch each row thousands of times (once for each set processed) causing an ungodly amount of reads whether they take too long or not.

    And if we look at my friend, the CROSS JOIN, which I use to very quickly generate millions of rows of test data, we find that it’s actually touching the same rows from the row-source tables hundreds of times even though it’s not considered to be RBAR.

    So, what is RBAR? What is my definition of RBAR?

    In scientific terms, I’d have to say that it’s anything that either overrides the ability of SQL Server to use the machine language, high performance “Pseudo Cursors” (loops) that occur behind the scenes or causes excessive duration or use of resources for each row processed even if such “Pseudo Cursors” come into play.

    In less scientific and more fun terms, it’s quite literally the difference between RBR and RBAR… that being the letter “A”, which stands for “Agonizing”. If code causes me to agonize over the duration or resources used for each row being processed, whether the code is set based or not, then, to me, it’s RBAR. 😉

The problem with a lot of people is that they just don’t know what’s possible, so they agonize over their code a whole lot less than I usually do, and they’ll inappropriately settle for what they think is RBR when it’s really RBAR.

    So far as whether or not RBAR is possible in the front end or other managed code, my answer would be “yes” except there, they usually just call it “crap code”.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    In your cross join examples the table #TestTable is written as ETestTable, causing an error if not corrected.

In addition to this, I found the explicit examples did not match my performance expectations, so I did a quick experiment. Please note that my server uses SSDs for storage and as such will show different characteristics with respect to I/O.

The first example using a cross join executed in 9 seconds on my system, against 6 seconds for the following example that does not do any insert. This means the insert overhead on my system is about 3 seconds (thanks to SSDs mostly). But since we do an insert into ... select ... into a clustered index, we ought to put an ORDER BY in it as well, as it doesn't cost us anything processing-wise and it simplifies the inserts into, and locking of, the table.

Executing the following code reduces the 9 seconds to 7 seconds, making it just 1 second slower than the cross join code that does no insert at all. The query plan however remains unchanged; it is just the performance that increases!

--===== Use the "Pseudo-Cursor" of a Cross Join
TRUNCATE TABLE #TestTable; --Empty the table to make reruns easier
 INSERT INTO #TestTable (N)
 SELECT TOP 1000000
        N = ROW_NUMBER() OVER (ORDER BY (SELECT 1))
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
  ORDER BY 1
;
 SELECT * FROM #TestTable
;

  • Awesome writeup Jeff! Worthy of an article submission IMHO!

  • peter-757102 (12/11/2013)


    Jeff,

    In your cross join examples the table #TestTable is written as ETestTable, causing an error if not corrected.

...{snip}...

    Thanks for the heads up and apologies for the error (I've corrected it). Apparently I made a change directly in the article to make up for some missing code and mistyped it.

    As to the performance of the code using SSDs, yes, SSDs can make up for a wealth of sins but they're still sins. Why would anyone write slower code if they knew something faster was available?

    Also, I appreciate the info on the order by but be advised that 1) an ORDER BY ordinal has been deprecated and is considered by most to be a worst practice for multiple reasons and 2) it still assumes that you need to do the insert into a table for this problem.

    --Jeff Moden



  • patrickmcginnis59 10839 (12/11/2013)


    Awesome writeup Jeff! Worthy of an article submission IMHO!

    Thank you kind Sir. My only question would be, does it sufficiently answer your good question as to why RBAR is slower than other code?

    --Jeff Moden



  • Jeff, I agree and SSD usage to cover up slow code was not my point.

    Your explicit RBAR example compares two situations:

    1. one does a cross join with an insert (very slow and absurd amounts of I/O)

    2. the other does the same cross join, but returns the result directly without writing to disk.

You conclude based on I/O statistics that RBAR takes place in one, but not in the other, despite the same select statement being used in the query. All I did was add an ORDER BY, which on my system at least closes the gap between both situations by a large amount. Could it be that the slowdown was caused by the insert without ORDER BY? Is that backed up by the I/O numbers?

It is in my opinion a valid point to raise, and I am interested in:

a. how the numbers look on your system, so we have an apples-to-apples comparison again.

    b. does the code with the insert + order by still properly demonstrate RBAR in your opinion?

As you said, why would one favor slower solutions? Slower in this case is also doing an unsorted insert into a table with a clustered index. In my example the ORDER BY is essentially free, as row_number() is what is sorted on... which always increases, and the optimizer knows this too, I think.

  • Jeff Moden (12/11/2013)


    peter-757102 (12/11/2013)


    ...

    Thanks for the heads up and apologies for the error (I've corrected it). Apparently I made a change directly in the article to make up for some missing code and mistyped it.

    As to the performance of the code using SSDs, yes, SSDs can make up for a wealth of sins but they're still sins. Why would anyone write slower code if they knew something faster was available?

    Also, I appreciate the info on the order by but be advised that 1) an ORDER BY ordinal has been deprecated and is considered by most to be a worst practice for multiple reasons and 2) it still assumes that you need to do the insert into a table for this problem.

1) Valid point, and I strongly dislike ORDER BY ordinal. This was just an easy thing to do... do not copy that, folks :).

2) I never interpreted the inserting into a table as being what you wanted to demonstrate as RBAR in this solution. I was focusing on the cross join and thus found simply leaving the insert out suspect. It would not come to mind for me to insert data from a select which I can also return directly... it is an unnecessary step that adds no functionality. There are however situations where inserting needs to take place and is in fact the most direct step, leaving me with the question of what the code really demonstrates.

  • @peter-2,

The point of me starting out by leaving the INSERT in the CROSS JOIN example was to, indeed, compare apples to apples and then to demonstrate that the original reason for the INSERT went away because it was no longer necessary. To wit, the RBAR of the loop was what required the INSERT to make the loop capable of returning a single set, and the side effect of not using RBAR was that the process could be eliminated, making the code even faster and almost totally eliminating the extraordinary number of reads associated with the INSERT. The ORDER BY is a nice optimization for when you absolutely need set-based code to do an INSERT, but that's not what I was trying to demonstrate. I was trying to demonstrate that RBAR also has side effects that aren't necessary with other forms of code.

    --Jeff Moden



Now, I get the point you're trying to make and believe it would be good to incorporate your explanation in your original post in the same manner; maybe I am just the one person reading over it. As I wouldn't think and code like that first example in the first place, I feel pretty blind to the thought process behind the examples and take them more case by case.

  • Jeff Moden (12/11/2013)


    patrickmcginnis59 10839 (12/11/2013)


    Awesome writeup Jeff! Worthy of an article submission IMHO!

    Thank you kind Sir. My only question would be, does it sufficiently answer your good question as to why RBAR is slower than other code?

    Absolutely!

  • RBAR from the client normally incurs the overhead of transferring large amounts of data over the network. Even so, the main issue of RBAR is with the high number of queries executed. If you have a set based query joining two tables of a million records each, SQL Server might be able to stream and buffer the reads of each table at the same time. With luck, it needs to read each record in each table once. SQL Server is good at optimizing a single request.

    The quick way to code the RBAR on the client is with a nested cursor or a cursor with a nested query. For each of the million records of the first table, query the second table for related records. The client has to execute a million nested queries rather than one. SQL Server has to stream and buffer for a million queries rather than one. SQL Server can optimize each of the million requests, but it can't optimize the million requests as a group. This will be slow.
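A sketch of the difference, with hypothetical Header/Detail tables (the names and the @ParentID parameter are made up for illustration):

```sql
-- Hedged sketch: what the client-side nested cursor effectively sends,
-- versus the single set-based request.

-- RBAR from the client: this query is executed once per parent row fetched
-- by the client... a million round trips, a million separate optimizations.
SELECT d.*
  FROM dbo.Detail d              -- hypothetical table
 WHERE d.ParentID = @ParentID;   -- parameter supplied per iteration

-- Set-based: one request, one plan, one optimization pass over both tables.
SELECT h.ParentID, d.*
  FROM dbo.Header h              -- hypothetical table
  JOIN dbo.Detail d ON d.ParentID = h.ParentID;
```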

Is there another way to use RBAR on the client that is faster? If I had to try, I'd start with a thread for each table that executes a cursor to read the table. The data from each table would be buffered at the same time. Another thread would be used to process the buffers. After a few months of work, I might have something that would only stink compared to a set-based plan generated by SQL Server in a few milliseconds.

    RBAR -> many queries and no global optimization

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • peter-757102 (12/11/2013)


Now, I get the point you're trying to make and believe it would be good to incorporate your explanation in your original post in the same manner; maybe I am just the one person reading over it. As I wouldn't think and code like that first example in the first place, I feel pretty blind to the thought process behind the examples and take them more case by case.

    Understood. I did, however, think that the following statements in the article pretty much covered the same thing....

    What was the reason for putting the data into a table in the first place? The answer is, so that we could return all of the values as a single result set to the GUI. Since the CROSS JOIN method returns a single result set, which meets our single result set requirement, let’s see what happens if we take out all the hooie about inserting into a table.

    ...{snip}...

    3. Since we could return a single result set with the set-based version, we didn’t need the extra overhead of populating a table so another reason why set-based code is usually much faster than this form of RBAR is process and, sometimes, object elimination.

    --Jeff Moden



Another issue I've seen is RBR -> RBAR. If something someplace requires processing a row at a time, then that's the way it all gets done. For example, sending email. All the pre-processing can be set based until it's time to send the mail.


  • Someone said something like "Always do what makes sense". I vote for that!

It never stops amazing me how "solutions" are discussed as if they would work universally. Most of them don't! Most of the time, nobody even asks whether we're talking about an OLTP environment or a data warehouse.

RBAR SSIS may work for you, or not. Let's say you have a data warehouse that you update at night, and it is only in use during the day for custom reporting. What's wrong with SSIS (RBAR) during the night in this case? Nothing. Quite the opposite: when done right (as part of a framework), when you come in in the morning and something went wrong, you know what the error was, it's in your email, you know how many rows were written before it aborted, and, after fixing it, restarting the package will automatically delete the rows written before the error and start over, as well as running subsequent packages that are waiting...
