A few reflections on RBAR by a weary application developer

Question

A few reflections on RBAR by a weary application developer

Viewing 15 posts - 16 through 30 (of 117 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 1

Stephen Hirsch (12/9/2013)
"Always do what makes sense"
I'm not so sure about the example given in the article, but that is sound advice. My rule of thumb is, if you need to handle rows individually, then rebar makes sense; else not.

The problem is... people frequently don't know how to handle rows other than individually because they haven't had the training to do things right. That means that my rule of thumb is that if RBAR has been used, it can usually be greatly improved.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 2

patrickmcginnis59 10839 (12/9/2013)
I didn't see any explanation on why RBAR is so slow, so rated this article 1 star.

This is how RBAR comes to be in systems. Many people just don't know and so they lean towards what they do. It gets the job done at the expense of future scalability and rework.

This is certainly not to say all loops are bad. For example, massive deletes are better done using loops to delete logically ordered sets of rows which isn't RBAR. Still and depending on the percentage of the table to be deleted, making a simple copy of the data to keep and dropping the original table is frequently much more effective (for example).

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 3

Michael Lysons (12/9/2013)
Good grief, I just had to RBAR that response from Jeff. I was hoping for a "SQL" version: the True DBA fixed it.

It you decipher the names, it was an "SQL" version and better than doing what the false DBA did by creating the problem and not teaching anyone else, the True DBA solved both the immediate and future problems by empowering the people (developers) through the newly guilded True DBA. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Stephen Hirsch SSCommitted Points: 1822 More actions · Answer 4

Everything can always be improved. And I agree, developers need to know how to do both row by row and set operations, and use whichever one optimizes the best for the task at hand.

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 5

Stephen Hirsch (12/9/2013)
Everything can always be improved. And I agree, developers need to know how to do both row by row and set operations, and use whichever one optimizes the best for the task at hand.

And lest the honorable Knight of Knuth be misquoted again, it's not a form of evil "pre-optimization". It's just good progamming. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

x SSC-Insane Points: 23660 More actions · Answer 6

Jeff Moden (12/9/2013)
patrickmcginnis59 10839 (12/9/2013)
I didn't see any explanation on why RBAR is so slow, so rated this article 1 star.
This is how RBAR comes to be in systems. Many people just don't know and so they lean towards what they do. It gets the job done at the expense of future scalability and rework.

I'm interested in why RBAR is so slow in SQL systems, and some of the following psychology that develops around this is pretty interesting too. I've seen stuff on the net like "this non-T-SQL program is so slow, I bet it uses RBAR inside".

I think its getting to be on the level of those cargo-cult societies.

edit: adjusted cargo-cult sentence 🙂

Michael Lysons SSCertifiable Points: 6497 More actions · Answer 7

Michael Lysons

SSCertifiable

Points: 6497

December 9, 2013 at 7:48 am

#1671948

Thanks for making my point for me in 3 lines 😉

PHYData DBA SSCertifiable Points: 7541 More actions · Answer 8

Programing languages, each and everyone of them, to include PSQL and TSQL are abstraction layers. RDBMS are data storage systems. Agreed that RDBMS have an application and programing interface, abstraction layer, aka API to let programs access the data storage system. Calling an entire RDBMS and abstraction layer is like calling NTFS, NFS, etc. an abstraction layer. One could generalize and call eMail an abstraction layer to communication with that logic. Doubt that would do anything but cause confusion not reduce it. 😎

My best RBAR story is from some java application developed using the cut and paste example into new code method. It was causing an Enterprise SQL server with 32 CPU's and 16GB Ram to fail from lack of resources. Seems that the code was developed to extract all User ID's and then for each User Id execute a SQL statement from a var on a SQL connection that was created from a var. Both Vars where declared before the for each loop and where both called during the for each loop.

Yes you got it right. Not only RBAR, but the greatly feared by DBA everywhere leviathan know as CROWRBAR (Connections Redundantly Opened With RBAR) execution.

peter-757102 SSCertifiable Points: 6877 More actions · Answer 9

patrickmcginnis59 10839 (12/9/2013)
Jeff Moden (12/9/2013)
patrickmcginnis59 10839 (12/9/2013)
I didn't see any explanation on why RBAR is so slow, so rated this article 1 star.
This is how RBAR comes to be in systems. Many people just don't know and so they lean towards what they do. It gets the job done at the expense of future scalability and rework.
I'm interested in why RBAR is so slow in SQL systems, and some of the following psychology that develops around this is pretty interesting too. I've seen stuff on the net like "this non-T-SQL program is so slow, I bet it uses RBAR inside".
I think its getting to be on the level of those cargo-cult societies.
edit: adjusted cargo-cult sentence 🙂

When you process row by row you throw out of the door optimization aspects that can be applied to the process as a whole. Things that are not even under your direct control and thus cannot be dealt with by your own (situation dependent) static cleverness. Things like disk IO/patterns, parallelization and memory management.

I will try to explain is as simple words as I can think of right now:

The database engine uses a mathematical model to represent the structure of the data in a database. SQL code interacts on the this same structure and is validated against it. Operating on the same structure thus means operation on the same model! The relational engine is aware of available indexes, the statistical properties of each, entity relationships and restrictions on both of them. Your SQL code is not, but thanks to both working on the same model the optimizations are possible.

Now here comes the fun part…the relational engine…that one processes the model + interaction can choose from a substantial bag of mathematical tricks/transformations to optimize the precise flow of and interaction of/operation on data. In order to judge what will be an efficient transformation it keeps around statistics to weight conflicting possible solutions until it finds one good enough.

If I were to process one row at a time, this complex machinery is underutilized and restricted in what it can do. Smart decisions to reduce the number of page reads from a large persisted table in a proper set processing SQL statement, is simply nullified when I organize my SQL code to so explicitly one row at a time.

Data from outside the relational database itself is different. It makes sense to sanitize it first and sometimes that works fine row by row before putting each row into the database. But most of the times importing all the data into a database temp table using bulk import or a BCP command before further processing is way faster. With the later you have the tools to make sure record for example are unique or follow other rules. Simply put constraints/indexes on the temp table. Then when everything is proven to be sound and following relevant rules (which sometimes go beyond a single record making row by row implicitly inferior), you can efficiently integrate the sanitized data into the existing database!

My answers to the "pre-optimization" is always, that I follow "performance by design" which is not to be confused with optimization :). It generally are always inexperienced people that go out of the way to vigorously optimize a clear and simple function into a complicated mess that no-one understands. The other half of inexperienced people tries to avoid doing any thinking about optimization and using Knuth as a scapegoat to cover their shortcomings....which is worse? I say the later, simply because they they are not in the "mode" of learning when and where spending time to get optimal result works best and thus do not develop into good professionals over time.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 10

Just to be clear. SSIS is not RBAR on its own. SSIS is a tool that can easily use the power of “Tresni Klub”.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Charles Kincaid SSChampion Points: 13593 More actions · Answer 11

"Avoid RBAR at all costs!" Right. Yes it helps, and helps a great deal. Another thing that you will hear is "All software is compromise!". So is the "all costs" part true. Well no. I have to amalgamate data from several sources that can't be connected. I use set based solutions on each source to condense that data from that source. Then I go down each result and merge the data row by row, If I had all the data in one source then I could be set based all the way.

Somebody (team member) suggested that we import all the data, use a set based solution to get the final result set, and finally removed the imported data. I pointed out that the import of this staggering amount of data would be RBAR right there. So here is another rule "Right tool for the right job."

ATBCharles Kincaid

PHYData DBA SSCertifiable Points: 7541 More actions · Answer 12

Jeff Moden (12/9/2013)

Consider a daily data file import, 60k rows. The data is loaded, of course, row by agonizing row, into a temp table. The mighty DBA, with his scimitar, strikes a mighty blow and updates the destination tables in one enormous, amazing join. The update runs in production in under three minutes.
Yay! Everyone cries! We weep with the endless possibilities of mighty set based queries!
....

The SQL God continues to read through tears of laughter

Welcome to the battle known as daily, my young friend.

😎 Jeff Moden The SQL Gods have always smiled upon your humor. Laughing with others and at your self is the greatest wisdom of all. 😎

Solomon Rutzky SSCoach Points: 16430 More actions · Answer 13

peter-757102 (12/9/2013)
patrickmcginnis59 10839 (12/9/2013)
Jeff Moden (12/9/2013)
patrickmcginnis59 10839 (12/9/2013)
I didn't see any explanation on why RBAR is so slow, so rated this article 1 star.
This is how RBAR comes to be in systems. Many people just don't know and so they lean towards what they do. It gets the job done at the expense of future scalability and rework.
I'm interested in why RBAR is so slow in SQL systems, and some of the following psychology that develops around this is pretty interesting too. I've seen stuff on the net like "this non-T-SQL program is so slow, I bet it uses RBAR inside".
I think its getting to be on the level of those cargo-cult societies.
edit: adjusted cargo-cult sentence 🙂
When you process row by row you throw out of the door optimization aspects that can be applied to the process as a whole. Things that are not even under your direct control and thus cannot be dealt with by your own (situation dependent) static cleverness. Thinks like disk IO/patterns, parallelization and memory management.
...
If I were to process one row at a time, this complex machinery is underutilized and restricted in what it can do. Smart decisions to reduce the number of page reads from a large persisted table in a proper set processing SQL statement, is simply nullified when I organize my SQL code to so explicitly one row at a time.

I agree with this but would add another main factor: ACID compliance. Locking, unlocking, transaction log writing, etc. all take time and resources. Given the example in the article, 60k transactions is a lot of overhead compared to a single, even if much larger, transaction.

Take care,

Solomon...

EDIT:

I would add that there is no parallel to cargo-cult societies given that it is scientifically provable that row-by-row processing is nearly always slower (within the realm of RDBMS's) than set-based processing (given that the operation can be handled in a set-based approach).

Compromise (in most areas of life) is required and encouraged, but requires having enough knowledge of the options in order to make good decisions for the situation at hand. Charles's example of his situation is compromise. But simply saying that, "because set-based did not work in one particular situation it is now better to do row-by-row", is not compromise if no investigation was done into how to best handle the problem (i.e. needing to find which import rows were causing errors).

SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR

x SSC-Insane Points: 23660 More actions · Answer 14

peter-757102 (12/9/2013)
patrickmcginnis59 10839 (12/9/2013)
Jeff Moden (12/9/2013)
patrickmcginnis59 10839 (12/9/2013)
I didn't see any explanation on why RBAR is so slow, so rated this article 1 star.
This is how RBAR comes to be in systems. Many people just don't know and so they lean towards what they do. It gets the job done at the expense of future scalability and rework.
I'm interested in why RBAR is so slow in SQL systems, and some of the following psychology that develops around this is pretty interesting too. I've seen stuff on the net like "this non-T-SQL program is so slow, I bet it uses RBAR inside".
I think its getting to be on the level of those cargo-cult societies.
edit: adjusted cargo-cult sentence 🙂
When you process row by row you throw out of the door optimization aspects that can be applied to the process as a whole. Things that are not even under your direct control and thus cannot be dealt with by your own (situation dependent) static cleverness. Things like disk IO/patterns, parallelization and memory management.

This implies that if you ARE able to take into consideration things like IO patterns, etc, that you could THEN use RBAR with NO PENALTIES. In my experience this is not the case, there are real and severe penalties using RBAR in T-SQL even when disk IO, parallelization, etc are taken into consideration.

peter-757102 SSCertifiable Points: 6877 More actions · Answer 15

patrickmcginnis59 10839 (12/9/2013)
peter-757102 (12/9/2013)
When you process row by row you throw out of the door optimization aspects that can be applied to the process as a whole. Things that are not even under your direct control and thus cannot be dealt with by your own (situation dependent) static cleverness. Things like disk IO/patterns, parallelization and memory management.
This implies that if you ARE able to take into consideration things like IO patterns, etc, that you could THEN use RBAR with NO PENALTIES. In my experience this is not the case, there are real and severe penalties using RBAR in T-SQL even when disk IO, parallelization, etc are taken into consideration.

I do not think I implied that at all, certainly did not indent to do so. I thought I'd covered that by stating: (situation dependent) static cleverness.

Also under direct control does not mean taking into consideration...we have no control over those things, nor are they ever the same!

Under the hood it all ends with machine code executed in a procedural way. The meaningful differences are in what happens on top of all this. SQL is made to be used with the relational model and systems implementing both are optimized towards that sort of workload. Just the act of writing in a way to counter both the ideas behind relational databases and the server optimizations is asking for slowdowns.

The only times I found when taking over some control works well, is either when the optimizer makes really bad choices or for computational tasks like string splitting and other procedural functions by offloading those with a SQL CRI function call. Even then, keeping in mind how data flows and the consequences thereof has a big impact. You should at no stage loose track of the ideas behind SQL and relational data processing if you want to have a scalable solution in those cases.