Hidden RBAR: Triangular Joins

Question

Hidden RBAR: Triangular Joins

Viewing 15 posts - 61 through 75 (of 255 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 1

Christian Buettner (12/7/2007)
Hi Jeff, Chirag,
I am not sure if I misunderstood something, but the SQL Statement does NOT utilize a "triangular join":
Select orderid,orderdate,productid
from orders O INNER JOIN orderdetails OD
On O.orderid = OD.OrderId
WHERE OD.productid <> 30.
Let me know if I misunderstood anything.

You are correct... the query above is just a normal equi-join with an exclusion qualifier. The "=" sign and the fact that the inequality is not part of the join (it's a filter, in this case) makes it so.

That's one of the things I was trying to address... triangular joins look innocent because of code like the above which is good viable code. Make the inequality part of the join (relates columns) as in the respective example in the article, and you have the potential for astronomical internal rowcounts.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 2

Does a inner join (for that matter any join) with a inequality condition in the where clause make a triangular join.
Yes, but in the presence of other criteria, it can be quite useful and, well, speedy. In fact, one of the dupe checks I've done to change a 24 hour job that sometimes fails into a 22 minute job that hasn't failed yet, uses a triangular join. See the "conclusion" section of the article... the purpose of the article is not to say "don't use triangular joins" but, rather, to be aware of the catstropic damage they can do if you're caught unaware.
But, even i that light, the real fact is that all triangular joins, no matter how small, will have some perfomance impact compared to true set based operations that touch each row only once...
--Jeff Moden

Chirag (12/6/2007)
Thanks Jeff.
So u mean to say that there is a triangular join in my example. Since there are other conditons in the where clause (i did not put them in the example) this would be ok. but i need to be carefull..

My bad... I need to clarify a bit... the "inequality condition" that I'm speaking of does not apply to "filters" in the form of "columnname r value". It's not possible for that to be a part of a triangular join because, well, it's not joining code... it's a filter.

The only time you can create a triangular join is when the form of "columname r columname" exists and "r" is an inequality of some form.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 3

Ramesh (12/7/2007)
Before 2K5, many a times, we used the following methodology in our financial applications to get the running count for records. People do argued on this stating that the optimizer would not always correctly update the rows or will not correctly use the specified index. Though, it never had any problems in our applications till now, otherwise we would have been screwed up.;)

Just curious - I noticed you said prior to 2005, implying you don't use it anymore. What do you use within 2005?

(edit) - oh wait you said row COUNT... so you're using ROW_NUMBER()? I was in a running total mindframe....

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 4

I 'believe' the optimizer can choose an allocation unit scan instead of a clustered index scan - which would allow unordered rows returned from your table with a clustered index.

Another possible situation is on Enterprise Edition where the read thread(s) join up with a scan already in progress - which would also lead to unordered rows despite the clustered index.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 5

...how certain types of UDFs can also lead to "RBAR" executions, e.g., when applying a UDF to a column to extend the results of your SELECT.

That's a spot-on observation... and you're implication is spot-on as well... many forms of UDF's are a form of Hidden RBAR and many folks miss that fact.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

corey lawson Hall of Fame Points: 3773 More actions · Answer 6

Yes, the administration of Oracle is a PITA compared to SQL Server, but from a (my) developer's perspective, it's a much bigger, varied set of legos to play with, as it were. Oracle has had the ROWNUM() crutch for so long...

It's like watching MySQL fanboys proclaim how great MySQL is now that it finally has some feature or whatever (like subselects) that has been in every other RDBMS since forever, but they still can't seem to lock down ACID very well. (but I'm not trying to compare SQL Server to MySQL... )

SQLServer::Oracle <> MySQL::every other RDBMS, including Access or even...FileMakerPro.

I just like working with SQL, even Access.

cs_troyk SSCertifiable Points: 5383 More actions · Answer 7

Jeff Moden (12/6/2007)
I would disagree with your assertion that set based programming is code that touches rows once or very few times and that the query you specified is not set based. Set based simply means that you declare what you want to return without regard to how it is going to be returned. Otherwise, this procedural code submitted by Christian Buettner would be more set based than your query (assuming the appropriate index) since it only touches each row once:
Set based cannot be a WHILE loop (Procedureal code) as a single row does not necessarily constitute a set. Touching a row more than once violates the basic principle of set based. If you have to touch a row more than once, it is "multi-set based" and that is also RBAR.
The whole point is that what "you declare what you want to return without regard to how it is going to be be returned" is a classic error that frequently violates the first "rule" of set based programming... that of touching a row more than once.
You'll understand my meaning in the follow-on article which solves the "Running total" problem by only touching each row once in a true set based fashion.

Jeff and Matt;

It makes it difficult to have a conversation when people refuse to use well-established meanings for particular terms.

Your comments betray a lack of understanding of what "set-based programming" means, and sloppy thinking about what a "set" is. There is nothing precluding a set from having only one member, or even no members. "Set-based" programming is _not_ a synonym for "most efficient execution plan", nor does it mean "only touching a row once". It is a computational model that says nothing of the physical implementation.

Therefore, it _is_ the case that a set-based expression can result in a less computationally efficient execution than a procedural-based solution. I've never denied this. In fact, I've changed pure set-based solutions to programming loop based solutions in the past to fix performance problems. The only point I'm trying to make is that "set-based programming" has a particular definition that is different than the one you used in your article.

TroyK

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 8

cs_troyk (12/7/2007)
It is a computational model that says nothing of the physical implementation.

The one that doesn't say anything of the physical implementation is the Mathematical model. While the computational model doesn't SPECIFICALLY describe the inner workings of a set engine, there are a lot of constraints and assumptions built into that which do help to define what is and what isn't a set processor. Even Alan Turing (as I recall) used the concept of a set processor "black box", but it had the constraint that the atomic operation being performed could be done to set items within the set in any order, and is done to multiple items at the SAME TIME. so - no specifics, but definitions nonetheless. Call it a generic class definition if you wish.

You are right that we seem to be talking at cross purposes. For one I'm talking about "set-based processing", and you seem to be talking about the ivory tower call set-based PROGRAMMING. One's theory, the other one's applied theory otherwise known as practice. As another poster's tagline eloguently puts "In theory - practice and theory are the same thing. In practice - they're not".

Now I'll admit right here and now that I have at times used the terms interchangeably, because "set-based programming applied to the appropriate set-based engine to produce set-based processing" is just too much set-based. If that's what you're talking about, then 'tis true, and I'm sorry about that. But that's what we're talking about.

That being said - I like to be accurate. If you have a precise, complete definition of set-based programming that we're not following, please - produce it. I don't like to be inaccurate in any way, so I encourage you to set me straight (specifically). It just has to be something that can in some way be applied to practice, which in this case is to the specific set engine we call an RDBMS.

Really - don't give up on this conversation. This is why we're here.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Charles Kincaid SSChampion Points: 13593 More actions · Answer 9

Jeff Moden (12/6/2007)
...yes, to Oracle 😛
Heh... you're a sick man, Corey 🙂 ... I hate Oracle... 😉

Indeed. Could be worse. Could be DB-400 😀

On another tack, pronouncing it ree-bar got a good laugh in the staff meeting the other day. I was asked if I'm researching data or putting in concrete. I replied that proper data base techniques are a good foundation... At that point there was a shower of notepads coming in my direction. :w00t: I thought that R-Bar might be good.

ATBCharles Kincaid

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 10

It makes it difficult to have a conversation when people refuse to use well-established meanings for particular terms.

Heh... But that's my whole point... I believe that what you and a good number of other folks call "set based", is not set based at all and that is why you're having "difficulty" with the conversation. I believe that triangular joins are not set based because they create multiple sets of rows to get the job done instead of just touching each row once. If it's more convenient for you to think of them as multiple sets consisting of sub-sets of a complete set, then go ahead.

One of my criteria for set-based is to "touch" each row only once. Triangular joins touch each row many times and so that disqualifies triangular joins as being set based. It's ok for you to disagree... as it is for me to disagree with you. 😉 And I won't even think you're being "difficult". 😛

So far as my apparent refusal "to use well-established meanings for particular terms", so what? I'm allowed to disagree with the masses and the individual. That's how new things are invented. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 11

Charles Kincaid (12/7/2007)
On another tack, pronouncing it ree-bar got a good laugh in the staff meeting the other day. I was asked if I'm researching data or putting in concrete. I replied that proper data base techniques are a good foundation... At that point there was a shower of notepads coming in my direction. :w00t: I thought that R-Bar might be good.

Sounds like it might have been a fun meeting 🙂

I pronounce it "ree-bar" because, like the steel rods of a similar name, you can really get stuck in the "mud" with/like rebar.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

jtoxification SSC Journeyman Points: 97 More actions · Answer 12

Thank you, Jeff - from now on I will more-closely watch my use of inline queries.

Taking what Larry Taylor said a little bit further --

Jeff clearly states that not all triangular joins are bad. Kudoz for that, but I get the feeling that some statements are being overlooked by readers:

Not all Triangular Joins are bad. With some restraint and the right criteria, Triangular Joins can be used for some pretty remarkable things... make finite schedules... do high speed dupe checks... etc. But, you've really got to be careful. Improperly written Triangular Joins are worse than even Cursors or While Loops and can bring a CPU and Disk System right to it's knees.

And I have to agree with you there -- (When only grabbing small independent columns that join only to a HUGE primary table, and you have an index-strategy in place and don't have to think about it, you're okay, then inline subqueries are often the fastest way to go, regardless of aggregates.) This is like going back to theory vs research vs technique vs application. you need all, else you fail. end-of-story.

In this crazy software industry, we are often thrust in such positions where there's no time for such skill development, but you must take the time, or else you end up spending more. And then we're back into the discussion of runtime optimization for small and large inputs ...

Your best skill is the ability to stay flexible with what you know "right now" and to practice the underlying concepts so that you'll be ready for "what you'll need know in the future." This always gives you the best payout, as you can see from those individuals discussing Big-O notation, runtime, and join plans.

And the other skill is being able to hold back what you know: If someone wants to talk about how programming sub-linear algorithms isn't ever really set-based, be my guest, but "set-based" sounds catchy to me. Better than "tree-based" or "partially-ordered". Not to mention sidelining discussions of paging, partitions, multi-disk environments, pipelines, caching, 64-bit memory buckets ... bleh.

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 13

Sergey Kazachenko (12/7/2007)

Adam Machanic liked a cursor-based solution for running sums:

http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-redux.aspx%5B/quote%5D

I like Adam... so I won't hold that against him. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 14

Thank you, Jeff - from now on I will more-closely watch my use of inline queries.
Taking what Larry Taylor said a little bit further --
Jeff clearly states that not all triangular joins are bad. Kudoz for that, but I get the feeling that some statements are being overlooked by readers:

Exactly... thank you for the feedback.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 15

Ramesh (12/7/2007)
Christian Buettner (12/7/2007)
Hi Ramesh,
what makes you sure that the order of the rows is guaranteed in your example?
Index

An index on a column does not guarantee that data will be returned in that order. Unless an order by is specified, you cannot be sure that the data will always be returned in the order you want.

An index makes it likely, but not certain.

There's a few things that, even with an appropriate index, can make data return 'out of order'.

The storage engine selecting an allocation order scan to fetch the data, rather than an index order scan. If the scan on the exec plan shows Ordered false, then you've had one of these. Not likely except in read uncommitted isolation

Merry-go-round scan. Only on enterprise. The scan stared in the middle of the table, reads to the end, goes back to the beginning and reads the rest.

Parallelism.

In future editions, there may be even more.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass