Hidden RBAR: Triangular Joins

  • Matt Miller (12/6/2007)


    Well - set-based processing entails a bit more than either description. Jeff's taking issue with the fact that the external query has to be evaluated one single row at a time, since the sub-query is a CORRELATED sub-query. The fact that the inner aggregate operation will be run once for each record in the outer query.

    I'd tend to agree that the triangular join is at best a poor application of "set based", since it fully disregards cardinality. Set based entails that you use the smallest possible set, and I don't think triangle joins do that well at all.

    The actual definition of set-based processing right now anything BUT a foregone conclusion, especially when you deal with throw in optimal/best or any of those qualifier denoting best practices. I doubt you would get the SAME definition of what set-based processing means (what makes something set-based versus not) from anyone here.

    While it is certainly true that procedural code sometimes can have the edge, considering that SQL Server is a set engine, 95% or more of the time, something that is TRULY set-based will outperform something that isn't. By miles.

    Again, you're exhibiting the same confusion as the author. Set-based is a logical concept. It simply means that the computational model being employeed by the programmer uses sets (or their analogues in the given language). It has nothing to do with the physical implemenation that carries out the derivation of the results under the covers.

    And, to restate, a so-called "triangular join" is no more nor no less "set-based" than a cross join (i.e., set product), and is no more nor less "set-based" than an inner join. Certainly, the various types of joins can be more or less costly given certain characteristics of the sets (most obviously, cardinality), but that's a physical concern, not logical.

    Final food for thought... assuming we accept the first example query in the article as being somehow "non-set-based", if the T-SQL query optimizer were improved such that query were converted (automatically, under the covers) to a plan that didn't use the "triangle join", would it then suddenly become "set-based"?

    TroyK

  • cs_troyk (12/6/2007)


    Matt Miller (12/6/2007)


    Well - set-based processing entails a bit more than either description. Jeff's taking issue with the fact that the external query has to be evaluated one single row at a time, since the sub-query is a CORRELATED sub-query. The fact that the inner aggregate operation will be run once for each record in the outer query.

    I'd tend to agree that the triangular join is at best a poor application of "set based", since it fully disregards cardinality. Set based entails that you use the smallest possible set, and I don't think triangle joins do that well at all.

    The actual definition of set-based processing right now anything BUT a foregone conclusion, especially when you deal with throw in optimal/best or any of those qualifier denoting best practices. I doubt you would get the SAME definition of what set-based processing means (what makes something set-based versus not) from anyone here.

    While it is certainly true that procedural code sometimes can have the edge, considering that SQL Server is a set engine, 95% or more of the time, something that is TRULY set-based will outperform something that isn't. By miles.

    Again, you're exhibiting the same confusion as the author. Set-based is a logical concept. It simply means that the computational model being employeed by the programmer uses sets (or their analogues in the given language). It has nothing to do with the physical implemenation that carries out the derivation of the results under the covers.

    And, to restate, a so-called "triangular join" is no more nor no less "set-based" than a cross join (i.e., set product), and is no more nor less "set-based" than an inner join. Certainly, the various types of joins can be more or less costly given certain characteristics of the sets (most obviously, cardinality), but that's a physical concern, not logical.

    Final food for thought... assuming we accept the first example query in the article as being somehow "non-set-based", if the T-SQL query optimizer were improved such that query were converted (automatically, under the covers) to a plan that didn't use the "triangle join", would it then suddenly become "set-based"?

    TroyK

    While it's nice to fully divorce the logical from the physical - the logical layer is by far and large useless if you don't have a physical layer that can accomodate your processing in any other than a rote linear iterative process. Meaning - in our current imperfect world, you need to BOTH a set processor and a process written correctly to do thing in a set-based fashion in order to achieve set-based processing. By correctly - I mean in a way that said SPECIFIC set processor engine can handle. This is applied set processing after all.

    By the way - that's processing that grows a a slower rate than linearly proportional to the cardinalty of the universe you're operating in. It's one where the level of the parallelism of the hardware again has a larger than linear impact on speed of processing (the big the set the set-processing engine can handle, the faster things grow). That also entails atomic operation independence, and that cardinality is a HUGE factor.

    So - does it technically fall under the mathematical definition of a set? sure. Is it bad? no if your universe stays small. Are there other ways to do this? yes, and one of the fastest ones I know is an inner loop technique. Is the set-based version of a process always the fastest? no, of course not, but if the problem at hand can be done without relying on order, and the various other rules/constraints outlined in set-based processing principles, then it's got a shot.

    The mathematical theory's assumption is a great one, but not one we have (i.e. infinite parallelism at all levels). Until we have parallel-capable data storage, anything that requires revisiting the SAME data over and over and over and over and over, is not a great thing, and it not going to be a great thing when you deal with BIG sets. Total disregard for the physical layer implementing this is precisely why something like this is a bad example of set-based (in my mind not even deserving of the word).

    And no - Jeff and I don't necessarily see eye to eye on what set-based means. I suspect I will be sitting in your shoes when the answer comes out (i.e. "fast as hell, but not technically set-based").

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Good Explanation of the implication of the Cartesian product (in some communities called cross join ;))

    Predicates can limit the cardinality of the table(vector) inputs to the product but the complexity is O(n^2). (although the algorithm may be no less necessarily O(N^2). But hey why do N complexity if there is a N^2 algorithm out there, you are just doing yourself out of easy performance optimisations 😉

  • Matt Miller (12/6/2007)


    Total disregard for the physical layer implementing this is precisely why something like this is a bad example of set-based (in my mind not even deserving of the word).

    And no - Jeff and I don't necessarily see eye to eye on what set-based means. I suspect I will be sitting in your shoes when the answer comes out (i.e. "fast as hell, but not technically set-based").

    This was an awesome articles and it spawned a great discussion.

    But the question of whether or not you can disregard the physical layer depends on your goal. In most cases in the practical world you want the right answer quickly, regardless of how elegant the solution is. Also, often development time matters a great deal. A working but inelegant solution you can get quickly may often be better than an elegant solution which takes time to develop, especially when it is something that will likely only be run a few times. On the other hand, if your goal is for elegant well written code above all, especially in a more purely theoretical setting, then you can largely if not completely ignore the physical aspects. That situation is more academic than it is practical, but that does not mean it does not come up.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • timothyawiseman (12/6/2007)


    Matt Miller (12/6/2007)


    Total disregard for the physical layer implementing this is precisely why something like this is a bad example of set-based (in my mind not even deserving of the word).

    And no - Jeff and I don't necessarily see eye to eye on what set-based means. I suspect I will be sitting in your shoes when the answer comes out (i.e. "fast as hell, but not technically set-based").

    This was an awesome articles and it spawned a great discussion.

    But the question of whether or not you can disregard the physical layer depends on your goal. In most cases in the practical world you want the right answer quickly, regardless of how elegant the solution is. Also, often development time matters a great deal. A working but inelegant solution you can get quickly may often be better than an elegant solution which takes time to develop, especially when it is something that will likely only be run a few times. On the other hand, if your goal is for elegant well written code above all, especially in a more purely theoretical setting, then you can largely if not completely ignore the physical aspects. That situation is more academic than it is practical, but that does not mean it does not come up.

    Assuming it's a one-shot deal and your academic situation doesn't kill the server it's run on by either running it out of disk space, killing all connections, etc... then no - I suppose it wouldn't make a difference how you build it. But by not considering the physical aspect, or how the logical applies to it, you will eventually build something that left to its own devices will do just that (saturate a resource of some kind, kill the temp drive, "blow up" a database, etc...)

    It's kind of writing a bad loop: it can have all of the exit conditions you like, but if none of them fire, it ain't never going to end....

    Sloppy code (in whatever technology) is like playing soccer in a minefield: at some point someone is going to have a really bad day....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • As a developer I'm not limited to doing EVEYTHING in SQL. For SQL2K I just count the rows coming into the record set. It's the hammer and nail thing. SQL2005 has the nifty features that let you put the row count into the result set. A possible good reason for the upgrade.

    ...yes, to Oracle 😛

  • TheSQLGuru (12/6/2007)


    I must say that I am very happy that people can do things like the triangular join - and sad that Jeff may make less people do it. Lost work opportunities for me!! :hehe:

    Kevin, I know of no higher compliment. Thank you, my friend.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I would disagree with your assertion that set based programming is code that touches rows once or very few times and that the query you specified is not set based. Set based simply means that you declare what you want to return without regard to how it is going to be returned. Otherwise, this procedural code submitted by Christian Buettner would be more set based than your query (assuming the appropriate index) since it only touches each row once:

    Set based cannot be a WHILE loop (Procedureal code) as a single row does not necessarily constitute a set. Touching a row more than once violates the basic principle of set based. If you have to touch a row more than once, it is "multi-set based" and that is also RBAR.

    The whole point is that what "you declare what you want to return without regard to how it is going to be be returned" is a classic error that frequently violates the first "rule" of set based programming... that of touching a row more than once.

    You'll understand my meaning in the follow-on article which solves the "Running total" problem by only touching each row once in a true set based fashion.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This was an awesome articles and it spawned a great discussion.

    But the question of whether or not you can disregard the physical layer depends on your goal. In most cases in the practical world you want the right answer quickly, regardless of how elegant the solution is. Also, often development time matters a great deal. A working but inelegant solution you can get quickly may often be better than an elegant solution which takes time to develop, especially when it is something that will likely only be run a few times. On the other hand, if your goal is for elegant well written code above all, especially in a more purely theoretical setting, then you can largely if not completely ignore the physical aspects. That situation is more academic than it is practical, but that does not mean it does not come up.

    First, thank you, sir, for the great compliment.

    To continue, if you mean to be an effective programmer in the ways of T-Sql, you cannot disregard the physical layer because it will make you or break you. A succint understanding of the physical layer of code in general is necessary to achieve the desired performance of the code... if you think not, then why so many articles about how to balance TempDB against the number of CPU's and articles about reducing the number of physical reads, etc, to achieve such performance.

    Physical performance is not a purely theoretical setting... academia says that if you "know" how to develop with performance in mind, "elagant" solutions will be of second nature and not require the extraordinary and time consuming thought process that so many believe that it will be. The time to develop good set based code is no different than developing merely adequate code and the time savings realized by not having to repair performance limited code as opposed to spending the time to writing performance enabled code to begin with is huge on the side of doing it right the first time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi guys

    Just a doubt..

    Does a inner join (for that matter any join) with a inequality condition in the where clause make a triangular join.

    EX:- Select orderid,orderdate,productid

    from orders O INNER JOIN orderdetails OD

    On O.orderid = OD.OrderId

    WHERE OD.productid <> 30.

    "Keep Trying"

  • But by not considering the physical aspect, or how the logical applies to it, you will eventually build something that left to its own devices will do just that (saturate a resource of some kind, kill the temp drive, "blow up" a database, etc...)

    A perect thought, indeed... and that's exactly the extaordinary thinking that I hope all attempt to grasp... Well done, Matt.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • By the way - that's processing that grows a a slower rate than linearly proportional to the cardinalty of the universe you're operating in.

    Again, very well said... if the duration and/or resource consumption of a process grows at a rate of more than a linear fashion, then it is not set based. If it takes 4 times longer to process 20,000 rows than it does to do the same process to 10,00 rows, then it is not set based. In essence, "cardinalty" defines set based. Getting back to one of my original claims, set based means that each row is "touched" only once.

    I'll not soon tire of saying this... Well done, Matt.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ...yes, to Oracle 😛

    Heh... you're a sick man, Corey 🙂 ... I hate Oracle... 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Does a inner join (for that matter any join) with a inequality condition in the where clause make a triangular join.

    Yes, but in the presence of other criteria, it can be quite useful and, well, speedy. In fact, one of the dupe checks I've done to change a 24 hour job that sometimes fails into a 22 minute job that hasn't failed yet, uses a triangular join. See the "conclusion" section of the article... the purpose of the article is not to say "don't use triangular joins" but, rather, to be aware of the catstropic damage they can do if you're caught unaware.

    But, even i that light, the real fact is that all triangular joins, no matter how small, will have some perfomance impact compared to true set based operations that touch each row only once...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I can no longer contain myself. I must say that I do enjoy seeing the determination of each side in proving their correctness.

    Now if you would, pardon me whilst I sit back with my popcorn and watch the Jr. Heavyweight Championship of the SQL World unfold....

    LOL,

    FYI, My bet is on Jeff. 😛

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg

Viewing 15 posts - 31 through 45 (of 255 total)

You must be logged in to reply to this topic. Login to reply