Outer join with condition

  • Comments posted to this topic are about the item Outer join with condition

  • Nice QotD and nice explanation..:-)

    Thanks..

  • Very good question. Thanks!

    ~ Lokesh Vij


    Guidelines for quicker answers on T-SQL question[/url]
    Guidelines for answers on Performance questions

    Link to my Blog Post --> www.SQLPathy.com[/url]

    Follow me @Twitter

  • Nice question, thanks.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Known concept. But took some time understand the full qstn.. nice one..

    --
    Dineshbabu
    Desire to learn new things..

  • This was removed by the editor as SPAM

  • Yeah, I'm gonna need someone to draw me a picture for this one. 😉

    ron

    -----
    a haiku...

    NULL is not zero
    NULL is not an empty string
    NULL is the unknown

  • ronmoses (12/27/2012)


    Yeah, I'm gonna need someone to draw me a picture for this one. 😉

    ron

    I'm bad at pictures, so how about a step-by-step description of the query evaluation?

    (Note that this describes the LOGICAL queury evaluation - the physicial processing order may be completely different, as long as the same end result is achieved)

    Query evaluation always starts with the FROM clause, and unless there are parentheses or nested joins, always works left to right. So the first step is the join of service and doc_serv. That's a standard inner join, with this internal table as intermediate result:

    service_id name price doctor_id service_id

    1 serv1 1000 2 1

    2 serv2 500 3 2

    2 serv2 500 1 2

    3 serv3 700 1 3

    3 serv3 700 3 3

    4 serv4 1200 2 4

    The next step is the join with doctors. This is a right outer join, so the right table (doctors) is preserved. If any row from the doctors table would not survive the result of an inner join on the same condition, it will still be there, but with a NULL in all the columns from the left table (the intermediate result above). Let's first look at how an inner join with the same conditions would look:

    service_id name price doctor_id service_id doctor_id name

    2 serv2 500 3 2 3 doc3

    2 serv2 500 1 2 1 doc1

    3 serv3 700 1 3 1 doc1

    3 serv3 700 3 3 3 doc3

    All rows in the intermediate table have a matching row in the doctors table, so no rows are eliminated because of that requirement. But rows in the intermediate table with a price of 1000 or more are lost. And as a result, we only have rows for two of the four doctors. The outer join says that all doctors have to be retained, so rows for the missing doctors are added back in:

    service_id name price doctor_id service_id doctor_id name

    2 serv2 500 3 2 3 doc3

    2 serv2 500 1 2 1 doc1

    3 serv3 700 1 3 1 doc1

    3 serv3 700 3 3 3 doc3

    NULL NULL NULL NULL NULL 2 doc2

    NULL NULL NULL NULL NULL 4 doc4

    This would be the result if the query would ask for SELECT *. There are no WHERE, GROUP BY, or HAVING clauses, so the next step is the SELECT list. This asks for an aggregate. Without a GROUP BY, that means aggregate over the whole result set. The rows are counted and the result (6) is returned.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (12/27/2012)


    I'm bad at pictures, so how about a step-by-step description of the query evaluation?

    Ahh... you know, I actually had the mechanics straight, it was the fine details I screwed up. Somehow in my head I managed to lump the one excluded left record in with the duplicate 1s and 3s, instead of the 2s and 4s where it belonged, if that makes sense. So I subtracted one of the duplicates rather than NULLing the appropriate record. At least I understood the concept. Forest for the trees, etc.

    thanks!

    ron

    -----
    a haiku...

    NULL is not zero
    NULL is not an empty string
    NULL is the unknown

  • A good one -- thanks!

  • Thanks for the question.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • This example shows that the query isn't logically the same as if it would contain filtering in "where" which is executed after joins are made.

    This example shows how difficult it is to work out a join on a piece of paper, instead of firing up SSMS and using copy and paste.

  • A very good question.

    Personally I would avoid writing a condition on one of the joined table expressions in the ON clause, but instead write it as an explicit subquery (ie modify the table-expression concerned). That way there is no scope for anyone mistaking the meaning. I'm tempted to say that it would have been better language design to preclude such conditions in the ON clause, but the time for that argument is long past. I'm sure it would have been better language design to exclude such conditions from the ON clause of the MERGE statement; BoL has a nice big caveat that makes it pretty clear that they should be avoided there.

    Tom

  • I'm with you on that, Tom. I get confused enough by RIGHT joins, never mind a join with conditions in the ON clause that don't directly define the relationship. Here's the query re-structured, first to use a LEFT join, and then the simple move of the price<1000 condition to the sub-query's WHERE clause.--All queries return * instead of just count(*) so we can see what's counted.

    -- Original join layout

    select *--quantity = count(*)

    from @service s

    inner join @doc_serv ds on s.service_id = ds.service_id

    right outer join @doctor d on ds.doctor_id = d.doctor_id and s.price < 1000

    order by d.doctor_id,ds.service_id

    -- Turn it over to use LEFT join to a subquery

    select *--quantity = count(*)

    from @doctor d

    left outer join

    (Select s.service_id, s.price, ds.doctor_id

    from @doc_serv ds

    inner join @service s

    on s.service_id = ds.service_id

    and s.price < 1000

    ) x

    on x.doctor_id = d.doctor_id

    order by d.doctor_id,x.service_id

    -- Move the condition (price < 1000) to WHERE clause of subquerey

    select *--quantity = count(*)

    from @doctor d

    left outer join

    (Select s.service_id, s.price, ds.doctor_id

    from @doc_serv ds

    inner join @service s

    on s.service_id = ds.service_id

    where s.price < 1000

    ) x

    on x.doctor_id = d.doctor_id

    order by d.doctor_id,x.service_id

    I'd prefer the longer code as shown here to the compact right join as it's much easier (at least for me) to understand.

  • Hugo Kornelis (12/27/2012)


    I'm bad at pictures, so how about a step-by-step description of the query evaluation?

    Thanks for the explanation, I should have gone through it step by step just like that instead of trying to work it out in my head.

    john.arnott (12/27/2012)


    I'm with you on that, Tom. I get confused enough by RIGHT joins, never mind a join with conditions in the ON clause that don't directly define the relationship. Here's the query re-structured, first to use a LEFT join, and then the simple move of the price<1000 condition to the sub-query's WHERE clause.

    Much prefer the version with the price condition in a WHERE clause, seems much clearer to me.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply