Multiple Join Behaviour

  • Hi guys,
    I've read 2 or 3 articles on Multiple Joins but I'm still failing to understand exactly what's happening under the hood.

    So, specifically, let's take the below code as an example.


    SELECT *
    FROM TableA A
    INNER JOIN TableB B
    ON A.id = B.id
    INNER JOIN TableC C
    ON C.id = A.id

    So the question here is, does TableC join onto TableA's initial state, or is TableC joining onto the result set of TableA and TableB?

    Hope that makes sense.
    Thanks in advance.

  • NikosV - Friday, September 14, 2018 12:04 PM

    Hi guys,
    I've read 2 or 3 articles on Multiple Joins but I'm still failing to understand exactly what's happening under the hood.

    So, specifically, let's take the below code as an example.


    SELECT *
    FROM TableA A
    INNER JOIN TableB B
    ON A.id = B.id
    INNER JOIN TableC C
    ON C.id = A.id

    So the question here is, does TableC join onto TableA's initial state, or is TableC joining onto the result set of TableA and TableB?

    Hope that makes sense.
    Thanks in advance.

    What happens 'under the hood' depends on the SQL Server optimiser. It will make the best decision it can, based on a set of complex algorithms.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Table C is joining to Table A's initial state.  But the final result will be affected by both, so I'm not sure that the question isn't semantics at one level. 

  • It's complicated by the fact that there are actually two different (but related) behaviors: logical and physical processing.  Think of it as the logical is how humans think about it and physical is how the computer actually implements it.  I'll only discuss the logical here, because the physical can be very different, although it is required to match the results of logical processing.

    That being said, TableC is joining to the result set of TableA and TableB, but the two will be indistinguishable in most cases

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Friday, September 14, 2018 12:22 PM

    It's complicated by the fact that there are actually two different (but related) behaviors: logical and physical processing.  Think of it as the logical is how humans think about it and physical is how the computer actually implements it.  I'll only discuss the logical here, because the physical can be very different, although it is required to match the results of logical processing.

    That being said, TableC is joining to the result set of TableA and TableB, but the two will be indistinguishable in most cases

    Drew

    Hi Drew, 
    Thanks. 
    One last clarification, you say "but the two will be indistinguishable in most cases". 

    What two and what exactly do you mean?

  • RonKyle - Friday, September 14, 2018 12:14 PM

    Table C is joining to Table A's initial state.  But the final result will be affected by both, so I'm not sure that the question isn't semantics at one level. 

    That's not true at all.  Here is an easy demonstration of that fact.

    DECLARE @TableA TABLE (a INT)


    INSERT @TableA(a)
    VALUES(1), (2), (3)

    DECLARE @TableB TABLE (b INT)


    INSERT @Tableb(b)
    VALUES(2), (3), (4)


    DECLARE @TableC TABLE (c INT)


    INSERT @TableC(C)
    VALUES(1), (2), (3), (4)

    SELECT *
    FROM @TableA a
    FULL OUTER JOIN @TableB
        ON a = b
    FULL OUTER JOIN @TableC
        ON a = c


    SELECT *
    FROM @TableA a
    INNER JOIN @TableB
        ON a = b
    FULL OUTER JOIN @TableC
        ON a = c

    We see from the FULL OUTER JOIN that C will match to A on everything but value 4.  If the results were determined by the initial state of A, then the row for value 1 should not appear for table C when we change the FULL OUTER JOIN to an INNER JOIN, but we see that it does indeed appear, meaning that the join is on the results of the join between tables a and b rather than the initial state of a.

    Drew

    I had to edit a couple of times, because the software really messed with my code formatting.

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • That's not true at all. Here is an easy demonstration of that fact.

    You demonstration doesn't prove anything.  Changing one of the INNER JOINS to an outer join changes the original issue.  The original query is logically the same regardless of the order of the table and joins.  The result would be the same.  How the database engine goes about getting to that result would depend on a lot of things that aren't in the statement as SQL is a declarative language.

  • NikosV - Friday, September 14, 2018 12:35 PM

    drew.allen - Friday, September 14, 2018 12:22 PM

    It's complicated by the fact that there are actually two different (but related) behaviors: logical and physical processing.  Think of it as the logical is how humans think about it and physical is how the computer actually implements it.  I'll only discuss the logical here, because the physical can be very different, although it is required to match the results of logical processing.

    That being said, TableC is joining to the result set of TableA and TableB, but the two will be indistinguishable in most cases

    Drew

    Hi Drew, 
    Thanks. 
    One last clarification, you say "but the two will be indistinguishable in most cases". 

    What two and what exactly do you mean?

    I mean that it will usually be impossible to tell whether Table C is joining to the initial state of A or joining to the results of joining A and B.  The reason for this is the associative law.  Most queries are going to be set up in such a way that the associative law applies, so it doesn't matter what order you evaluate them in.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • I mean that it will usually be impossible to tell whether Table C is joining to the initial state of A or joining to the results of joining A and B. The reason for this is the associative law. Most queries are going to be set up in such a way that the associative law applies, so it doesn't matter what order you evaluate them in.

    That is originally what I said when I said I'm not sure it's semantics as the result is the same.  Declarative languages only care about the result, not how to go about the result.  The original query joins table C to table A.  There is no way to know if the engine has knocked A's rows down based on B's or not.  But the query writer cannot assume that it is automatically a subset.  They have to assume it's joining to the entire table.

  • RonKyle - Friday, September 14, 2018 12:50 PM

    That's not true at all. Here is an easy demonstration of that fact.

    You demonstration doesn't prove anything.  Changing one of the INNER JOINS to an outer join changes the original issue.  The original query is logically the same regardless of the order of the table and joins.  The result would be the same.  How the database engine goes about getting to that result would depend on a lot of things that aren't in the statement as SQL is a declarative language.

    Actually, it does prove something.  The value 1 exists in the initial state of A.  If the C matches on the initial state of A, it will match to the value 1 that exists there and then be filtered out when that value from A is filtered out.  The fact that it's not filtered out proves that it doesn't match to the initial state of A.

    Changing the INNER JOINs to FULL OUTER JOINs does not change the issue.  It only shows what the result set would be like before any filtering.

    The original query is the same, because it obeys the laws of commutivity and associativity.  Mixing INNER and OUTER joins means that the laws of commutivity and associativity no longer apply, which allows us to understand what the actual logical processing is.

    As I said earlier, I was only going to discuss the logical processing, because (a) it's easier to understand, and (b) the physical processing is required to match it.  The logical processing is clear.  How the "database engine goes about getting to that result" is entirely irrelevant to this discussion.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • I should have sad doesn't prove anything relevant to the original post.

  • RonKyle - Friday, September 14, 2018 1:04 PM

    I mean that it will usually be impossible to tell whether Table C is joining to the initial state of A or joining to the results of joining A and B. The reason for this is the associative law. Most queries are going to be set up in such a way that the associative law applies, so it doesn't matter what order you evaluate them in.

    That is originally what I said when I said I'm not sure it's semantics as the result is the same.  Declarative languages only care about the result, not how to go about the result.  The original query joins table C to table A.  There is no way to know if the engine has knocked A's rows down based on B's or not.  But the query writer cannot assume that it is automatically a subset.  They have to assume it's joining to the entire table.

    If you don't believe me, perhaps you'll believe Itzik Ben-Gan. In Logical Query Processing: The FROM Clause and Joins, he writes:

    If you do have table operators involved, they are logically evaluated in written order—from left to right—with the output of one operator becoming the left input of the next. This logical order defines which elements are or aren’t available to any given step.

      In other words, the output of the join between Table A and Table B (and not the entire table as you state) becomes the left input for the join to Table C.  This is why in my example the value 1 is available to match to table C on a FULL OUTER JOIN, but is not available to match on the INNER JOIN.

    You also state that declarative languages only care about the result, not how to go about the result.  This makes it seem that the logical processing is "how to go about the result" when it's not.  Logical processing is how we determine whether the result is correct.  In other words, logical processing DETERMINES the result that the declarative languages care about.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Monday, September 17, 2018 11:50 AM

    RonKyle - Friday, September 14, 2018 1:04 PM

    I mean that it will usually be impossible to tell whether Table C is joining to the initial state of A or joining to the results of joining A and B. The reason for this is the associative law. Most queries are going to be set up in such a way that the associative law applies, so it doesn't matter what order you evaluate them in.

    That is originally what I said when I said I'm not sure it's semantics as the result is the same.  Declarative languages only care about the result, not how to go about the result.  The original query joins table C to table A.  There is no way to know if the engine has knocked A's rows down based on B's or not.  But the query writer cannot assume that it is automatically a subset.  They have to assume it's joining to the entire table.

    If you don't believe me, perhaps you'll believe Itzik Ben-Gan. In Logical Query Processing: The FROM Clause and Joins, he writes:

    If you do have table operators involved, they are logically evaluated in written order—from left to right—with the output of one operator becoming the left input of the next. This logical order defines which elements are or aren’t available to any given step.

      In other words, the output of the join between Table A and Table B (and not the entire table as you state) becomes the left input for the join to Table C.  This is why in my example the value 1 is available to match to table C on a FULL OUTER JOIN, but is not available to match on the INNER JOIN.

    You also state that declarative languages only care about the result, not how to go about the result.  This makes it seem that the logical processing is "how to go about the result" when it's not.  Logical processing is how we determine whether the result is correct.  In other words, logical processing DETERMINES the result that the declarative languages care about.

    Drew

    The optimizer could easily start with TableC.  As long as it can guarantee that its chosen order of comparisons will yield the correct result, the optimizer could use any order that will minimize I/O (typically).  The query plan SQL shows to us humans is incredibly simplified compared to the actual access plan SQL uses internally.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Monday, September 17, 2018 2:56 PM

    drew.allen - Monday, September 17, 2018 11:50 AM

    RonKyle - Friday, September 14, 2018 1:04 PM

    I mean that it will usually be impossible to tell whether Table C is joining to the initial state of A or joining to the results of joining A and B. The reason for this is the associative law. Most queries are going to be set up in such a way that the associative law applies, so it doesn't matter what order you evaluate them in.

    That is originally what I said when I said I'm not sure it's semantics as the result is the same.  Declarative languages only care about the result, not how to go about the result.  The original query joins table C to table A.  There is no way to know if the engine has knocked A's rows down based on B's or not.  But the query writer cannot assume that it is automatically a subset.  They have to assume it's joining to the entire table.

    If you don't believe me, perhaps you'll believe Itzik Ben-Gan. In Logical Query Processing: The FROM Clause and Joins, he writes:

    If you do have table operators involved, they are logically evaluated in written order—from left to right—with the output of one operator becoming the left input of the next. This logical order defines which elements are or aren’t available to any given step.

      In other words, the output of the join between Table A and Table B (and not the entire table as you state) becomes the left input for the join to Table C.  This is why in my example the value 1 is available to match to table C on a FULL OUTER JOIN, but is not available to match on the INNER JOIN.

    You also state that declarative languages only care about the result, not how to go about the result.  This makes it seem that the logical processing is "how to go about the result" when it's not.  Logical processing is how we determine whether the result is correct.  In other words, logical processing DETERMINES the result that the declarative languages care about.

    Drew

    The optimizer could easily start with TableC.  As long as it can guarantee that its chosen order of comparisons will yield the correct result, the optimizer could use any order that will minimize I/O (typically).  The query plan SQL shows to us humans is incredibly simplified compared to the actual access plan SQL uses internally.

    I've said repeatedly that I am talking about the LOGICAL PROCESSING ORDER.  The optimizer can (and usually does) use a order different from the LOGICAL PROCESSING ORDER, so what the optimizer does has no bearing on the discussion of the LOGICAL PROCESSING ORDER.

    Drew

    PS: The reason that I am concentrating on the LOGICAL PROCESSING ORDER for this discussion, is that what the optimizer does is a complex, "chaotic" system.  By chaotic, I mean that seemingly small changes can have a significant impact on the processing, so it is much too difficult to predict how a particular query will be executed beforehand.  Even afterward, it can be difficult for a noobie, which the OP obviously is, to understand the query plan that was used.  The LOGICAL PROCESSING ORDER, on the other hand, has a clearly delineated path, so it is much easier for noobies to understand.

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • NikosV - Friday, September 14, 2018 12:04 PM

    Hi guys,
    I've read 2 or 3 articles on Multiple Joins but I'm still failing to understand exactly what's happening under the hood.

    So, specifically, let's take the below code as an example.


    SELECT *
    FROM TableA A
    INNER JOIN TableB B
    ON A.id = B.id
    INNER JOIN TableC C
    ON C.id = A.id

    So the question here is, does TableC join onto TableA's initial state, or is TableC joining onto the result set of TableA and TableB?

    Hope that makes sense.
    Thanks in advance.

    Let me give you a little history about the ANSI standards committee. When we added the infixed JOIN notation to the language, the rules we set up were:

    1) JOINs are evaluated left to right, with the expected rules of parentheses changing that ordering.
    2) the ON clauses apply to the JOIN that is nearest to it. The standard actually specifies other options, such as USING and CORRESPONDING. Not many SQL products actually implement the more exotic ones, but they're in the standards
    3) some infixed operations do not require an ON clause, such as the CROSS JOIN.

    Our goal was to be sure that any statement had a defining execution pattern. Otherwise, the standards are full of a wonderful piece of standard speak "effectively", which says the optimizer is free to rearrange the code anyway it wishes, as long as it produces the same result as the standard definition.

    Your example is effectively:

    SELECT *
    FROM ((TableA AS A
         INNER JOIN
        TableB AS B
         ON A.id = B.id)
          INNER JOIN
          TableC AS C
          ON C.id = A.id);

    This is usually not a problem since INNER JOINs associate and commute and generally behave nicely. But if you take such a join and put parentheses around it, then you can use the AS operator and give it a new name. You are now into all kinds of scoping rules that are pretty much like what you had in block structured languages.

    That is, if I make an inner reference, the table name is resolved by working my way out of the nesting. This is why it is always a good practice in SQL to attach an alias or table name to a column name. Without it, you might draw jump up more levels of nesting than you think the column names of the same at different levels.

    I've got a detailed discussion of the full ANSI/ISO standard rules in one of my books. But my advice is always to keep things as simple as possible. Few decades ago, when I was doing software engineering columns in the trade press, I coined the term "lasagna code" for control flow structures that were nested so deep that they were difficult to understand. This was in contrast to our previous design errors called "spaghetti code", which were confusing because of all the GOTOs.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply