Hidden RBAR: Triangular Joins

  • sing4you (12/6/2007)


    That was an excellent article. I look forward to seeing the workaround!

    Thanks.

    Thank you for the both the feedback, and the encouragement!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nicely done Jeff! The graphics will really help even neophyte SET-BASED devs to understand what is really going on.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Jeff:

    Great article. I'm looking forward to your next article that includes the solutions to avoiding triangular joins and RBAR!

    --Pete

  • Nice one Jeff - I can see it's been a little while in the making (given all of the handy graphics...)

    You might care to edit your first post in this thread (where the link to the article is supposed to appear). Looks like the link got "et" (southern for "eaten").

    Link is Hidden RBAR: Triangular Joins[/url]

    Not paying attention to the cardinality of a query is definitely an issue. Even the best of set processors will eventually choke on sets with polynomial growth factors.

    And - welcome to the SQL2005 family (at last). I'm curious to see if your view of it changes at all.....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (12/6/2007)


    .....And - welcome to the SQL2005 family (at last). I'm curious to see if your view of it changes at all.....

    Yes Jeff, come to the dark side! :alien:

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • The row number thing is interesting, butt ...

    As a developer I'm not limited to doing EVEYTHING in SQL. For SQL2K I just count the rows coming into the record set. It's the hammer and nail thing. SQL2005 has the nifty features that let you put the row count into the result set. A possible good reason for the upgrade.

    ATBCharles Kincaid

  • Good article Jeff and it's nice to raise awareness about the issues here.

  • Jeff;

    A good lead-in article... it will be nice to see the solutions you propose, both for SS2K and SS2K5 (I imagine they'll each look quite differently).

    One criticism I have is that you muddy the waters quite a bit between the logical and physical when you imply that "set-based" is orthogonal to "RBAR".

    "Set-based" refers to a logical construct--the programming model. "RBAR" refers to a physical implementation of the solution (at least as you use it in the article). I assert that the first query is, in fact, set-based, because the user is expressing the program as such. The way that the query engine processes it behind the scenes is irrelevant to how it's expressed by the user.

    Note that I'm not saying it's unimportant for the person writing the query to have a grasp of how it will be executed at the physical layer. I just think you need a different term than "set-based" to describe the preferred physical processing model, as the term already has a different, well-understood meaning.

    TroyK

  • "Set-based" refers to a logical construct--the programming model. "RBAR" refers to a physical implementation of the solution (at least as you use it in the article). I assert that the first query is, in fact, set-based, because the user is expressing the program as such. The way that the query engine processes it behind the scenes is irrelevant to how it's expressed by the user.

    Thanks for the feedback, Troy. That's exactly what I'm trying to combat, though... just because it's done without an apparent loop, doesn't mean it's set-based. Like I said in the intro, (to me anyway...), set based means touching each row in a "set" just once whether internally or externally... and without an expressed loop.

    But, I do understand the point you're trying to make.

    The real key to me is that whether or not you want to call triangular joins "set based", or not, doesn't really matter to me so long as folks understand the dangers of such code because of the number of internal rows it can spawn.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Charles Kincaid (12/6/2007)


    The row number thing is interesting, butt ...

    As a developer I'm not limited to doing EVEYTHING in SQL. For SQL2K I just count the rows coming into the record set.

    Heh... yeah... agreed and a GUI would be the better place to count such things for small result sets... if you have a GUI to count them in... There's lots of batch processing that is done to create files that don't come close to using a GUI.

    SQL2005 has the nifty features that let you put the row count into the result set. A possible good reason for the upgrade.

    ROWNUMBER is a great feature... so is the IDENTITY column of a reporting/working table 😉 Since I've not had the good fortune to do much performance testing using SQL Server 2k5, I'll be sure to include some in the forthcoming article... after I install 2k5 this weekend.

    Heh... So, how's that nice new "ComputeRunningBalance" function in SQL Server 2k5? 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the feedback, Matt... and thanks for the "missing link" info... I just fixed it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I must say that I am very happy that people can do things like the triangular join - and sad that Jeff may make less people do it. Lost work opportunities for me!! :hehe:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Jeff Moden (12/6/2007)


    "Set-based" refers to a logical construct--the programming model. "RBAR" refers to a physical implementation of the solution (at least as you use it in the article). I assert that the first query is, in fact, set-based, because the user is expressing the program as such. The way that the query engine processes it behind the scenes is irrelevant to how it's expressed by the user.

    Thanks for the feedback, Troy. That's exactly what I'm trying to combat, though... just because it's done without an apparent loop, doesn't mean it's set-based. Like I said in the intro, (to me anyway...), set based means touching each row in a "set" just once whether internally or externally... and without an expressed loop.

    But, I do understand the point you're trying to make.

    The real key to me is that whether or not you want to call triangular joins "set based", or not, doesn't really matter to me so long as folks understand the dangers of such code because of the number of internal rows it can spawn.

    Yes - it really gets down to the query plan (both execution plan and IO profile). To be pedantic, a cross join is also a set-based operation. True, an operation that you probably don't want to apply to two large tables, but a set-based operation nonetheless.

    Moving on... I know the topic has been done before, but it would be great if in one of your follow-up articles, you could point out how certain types of UDFs can also lead to "RBAR" executions, e.g., when applying a UDF to a column to extend the results of your SELECT.

    TroyK

  • I would disagree with your assertion that set based programming is code that touches rows once or very few times and that the query you specified is not set based. Set based simply means that you declare what you want to return without regard to how it is going to be returned. Otherwise, this procedural code submitted by Christian Buettner would be more set based than your query (assuming the appropriate index) since it only touches each row once:

    -- Declarations

    DECLARE @tblResult TABLE (asset_id char(9), sum_amt decimal(19,3), count_amt int)

    DECLARE @asset_id char(9), @sum_amt decimal(19,3), @count_amt int

    SELECT @asset_id = ''-- This is our iteration key

    SET @sum_amt = 0-- Will be incremented for each record

    SET @count_amt = 0-- Will be incremented for each record

    WHILE 1=1 BEGIN-- Loop over all asset_ids

    -- Load the next asset_id and add amount to running total var and increase count for the new record.

    SELECT TOP 1 @asset_id = asset_id, @sum_amt = @sum_amt + cap_issue_amt, @count_amt = @count_amt + 1

    FROM dbo.RRefasset WHERE asset_id > @asset_id ORDER BY asset_id ASC

    IF @@ROWCOUNT = 0 BREAK

    -- Insert new row into temporary results table

    INSERT @tblResult SELECT @asset_id, @sum_amt , @count_amt

    END

    SELECT asset_id, sum_amt, count_amt FROM @tblResult

    Also, by your logic, queries involving nested loop joins against poorly indexed tables would be not set based because a table scan would be involved in each iteration of the loop, whereas the same exact query on a properly indexed table would be set based since it could use a merge join algorithm or at least use the indexes to only get the rows needed by each loop iteration.

    I also disagree with the implication that set based code always equals efficient code and procedural code always equals inefficient code. While set based code has clear advantages in most situations, both because of efficiency and flexibility when the optimizer is determining how to run a query (procedural code "locks" a large portion of the query execution in stone), there are situations - such as for running aggregates - where procedural code has its uses. If procedural code didn't have its uses they would remove it from the language. In this case, with procedural code, such as that above, you can achieve linear algorithmic complexity in terms of the most expensive operation in the query - the page read. The best set based solution - at least that I have ever seen or come up with - that is applicable to all problems of this type, has exponential algorithmic complexity. For small data sets, the cursor and/or loop overhead is going to outweigh its advantages in algorithmic complexity, but this overhead quickly gets overshadowed by the gains as the data set grows.

  • Well - set-based processing entails a bit more than either description. Jeff's taking issue with the fact that the external query has to be evaluated one single row at a time, since the sub-query is a CORRELATED sub-query. The fact that the inner aggregate operation will be run once for each record in the outer query.

    I'd tend to agree that the triangular join is at best a poor application of "set based", since it fully disregards cardinality. Set based entails that you use the smallest possible set, and I don't think triangle joins do that well at all.

    The actual definition of set-based processing right now anything BUT a foregone conclusion, especially when you deal with throw in optimal/best or any of those qualifier denoting best practices. I doubt you would get the SAME definition of what set-based processing means (what makes something set-based versus not) from anyone here.

    While it is certainly true that procedural code sometimes can have the edge, considering that SQL Server is a set engine, 95% or more of the time, something that is TRULY set-based will outperform something that isn't. By miles.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 16 through 30 (of 255 total)

You must be logged in to reply to this topic. Login to reply