Outer Join Trouble

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 715401

  • David.Poole

    SSC Guru

    Points: 75182

    The behaviour of the outer join is very useful if you get an question such as

    "Give me all the customers who ordered in December who didn't order in November"

    You can do an INNER JOIN on the orders table to get the December figures and a LEFT JOIN on the ORDERS table and include the condition order_date BETWEEN '2005-11-01' AND '2005-11-30' in the join.

    Although this way of doing things makes for a very easy to read query it isn't necessarily the most efficient way of achieving the desired result.

    I found that in most situations it is actually more efficient to perform the LEFT JOIN on a derived table containing customers for November orders

  • Lutz Albers

    SSC Rookie

    Points: 46

    The behaviour posted in completly explainable. SQL-Server is working according to the SQL-Standards.

    What most people don't know is that SQL "boolean" logic is actually a tripartite logic (a logic with three states) governed (simplified) by these additional rules:

    • state: true, false, undefined
    • a comparsion (==,<>,>,< ) with NULL is undefined (this is the reason for the IS NULL operator)
    • A and undefined is undefined
    • A or undefined is A
    • not undefined is undefined

    A SELECT will deliver all rows where the WHERE clause evaluates to true.

    A LEFT OUTER JOIN will match all rows from table A with all rows from table B for which the ON clause evaluatues to true. If there is no such match then it will deliver row A once with all SELECTed fields from B set to NULL.

    If you apply this to the last example you can explain it:

    select *

    from a

    left outer join b

    on a.id = b.id

    where b.id <> 8

    This will deliver the following result before applying the WHERE clause:

    id          id         

    ----------- ----------- 

    1           NULL

    2           2

    3           NULL

    4           4

    5           NULL

    6           6

    7           NULL

    8           8

    The only rows where the WHERE clause evaluates to true are 2,4,6. Row 8 evaluates to false, all other to undefined.

    select *

    from a

    left outer join b

    on a.id = b.id

      and b.id <> 8

    id          id         

    ----------- -----------

    1           NULL

    2           2

    3           NULL

    4           4

    5           NULL

    6           6

    7           NULL

    8           NULL

    This examples select all rows from A with rows from B which matching ID unequal to 8 (2,4,6) and selects one row from A with a NULL row from B.

    One must be very careful when doing logic with NULL values.

  • Scott D. Smith

    SSC Veteran

    Points: 278

    I confess that I find the note by Lutz Albers much more comprehensible than the article about outer joins itself. I have found that doing everything inside of a stored procedure that begins like this ALWAYS solves these kinds of problems:

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_NULLS ON

    GO

    The reason it works? Because setting ANSI NULLS ON gets rid of the three-way logic (true, false, and null). I have read numerous articles telling me that setting ANSI NULLS ON is totaly wrong. To this, I respectfully disagree: I program in the real world and don't have time for debates about set theory logic. Turning them off gets rid of these mysterious problems and that's good enough for me.

  • Bob Monahon

    Ten Centuries

    Points: 1282

    Hello,

    Both the article and the comment by Lutz Albers were enlightening.

    When I ran into this problem, I would solve it like this:

    SELECT *

    FROM a

    LEFT OUTER JOIN

      (SELECT * FROM b WHERE b.id <> 8) AS bx

    ON a.id = bx.id

    The above code clearly specifies (to me, anyway!) that excluding the value 8 should be applied to table b first, before the outer join is performed.

    Thanks for the additional insight into this problem.


    Regards,

    Bob Monahon

  • jeff451

    SSC-Addicted

    Points: 496

    I agree with the previous posts. The article and the comment from Lutz Albers certainly helped explain an issue I have dealt with many times.

    Here is my quick and easy way of remembering how the where clause affects an outer join -

    If the where clause contains a reference to the table from the outer join, then the join is in effect an inner join.

     

    That may not be a great technical explanation but it works for me.


    Jeff451
    SQL Guru wannabe

  • BobAtDBS

    SSCarpal Tunnel

    Points: 4582

    Steve made one very good point in his article, which you don't want to lose as this thread drifts away from joins to nulls.  The article was not about how boolean logic works with nulls, (which WAS explained quite nicely by Lutz).

    A Where clause in an outer join is a filter, applied after the join takes place.

    After that sinks in, all the other logic makes complete sense.  It also explains why you sometimes see SQL Server not use an index where is "seems like it should" due to a field being mentioned in a where clause.

     


    Student of SQL and Golf, Master of Neither

  • Canute Magalhaes

    SSC Rookie

    Points: 34

    Hi Steve,

    to your question on "how can you explain that those rows are removed? One other thing to notice is that this query matches up using an inner join (the first plan). " in your second example  where b.id <> 8. I found a good article http://www.databasejournal.com/features/mssql/article.php/3399931 that says that " if your database or connection ANSI_NULL setting is set ON. An ANSI_NULL setting of ON is normally the default settings for databases and connections. When ANSI_NULL setting is set to ON, the ANSI SQL-92 standard states that any equal (=) or non equal (<&gt statement must equate to FALSE when compared against a NULL."

    that could possible explain the reason for why the null <> 8 returns false.

    Regards Canute.

  • Martin Vrieze

    SSCrazy

    Points: 2760

    Steve and all, nicely done!!!

    I find the use of the additional "filter" in the ON clause facinating...

    "...left outer join orders o

    on c.customerid = o.customerid

    and datepart(m, o.orderplacedate) = 10..."

    I use A LOT of left outer joins in my work and this is the first time I've seen this type of qualifier embedded in the table join qualifier.

     

  • Antares686

    SSC Guru

    Points: 125444

    Keep in mind the query is run thru the interpreter before it actually executes and many times the query is reformulated to the overall condition.

    In you last examples the interpreter sees the fact the right table is referenced in the where clause despite it being a left join. Since the net condition is an inner join, that also would provide the fastest execution plan for the overall query, the query is optimized by the interpreter and actually executed as an inner join.

    If you run

    select *

     from a

     inner join b

      on a.id = b.id

     where b.id != 8

    you will find you get exactly the same execution plan as

    select *

     from a

     left outer join b

      on a.id = b.id

     where b.id != 8

  • Antares686

    SSC Guru

    Points: 125444

    Actually it is inacturate to state a Where clause in an outer join is a filter applied after the join takes place. Try this on the last example:

    select *

     from a

     left outer join b

      on a.id = b.id

        and b.id <> 8

    where a.[id] in (1,2,3)

    You will find the condition is applied to the A before the join actually takes place. The query optimizer ways multiple factors to determine best access method and will reformulate the query in the background (conceptually) when it determines this based on the costing factors. In the above example it more or less runs like this.

    SELECT

     *

    FROM

     (

      SELECT

       *

      FROM

       a

      WHERE

       a.[id] = 1 OR

       a.[id] = 2 OR

       a.[id] = 3

    &nbsp A

    LEFT OUTER JOIN

     (

      SELECT

       *

      FROM

       b

      WHERE

       b.[id] <> 8

    &nbsp B

    ON

     A.[ID] = B.[ID]

    Which if you look at will give exactly the same execution plan as my first query, the first is just far simpler to read.

    The query optimizer in the interpreter does a lot of work trying to provide fastest access to your data.

  • Stephen Hirsch

    SSCommitted

    Points: 1822

    I thought that null = null always evaluates to false. It's news to me that null in any boolean expression would ever evaluate to a value other than false. Who knew?

    I come from the Oracle world, I'm peeking in on SS. Can you write an outer join like this in SS?

    select a.field, b.field

    from table1 a, table2 b

    where a.id = b.id (+)

    That will return all of the a.field values, with either the b.field value if a match can be found for a.id in b, or null otherwise.

    Thanks,

    Steve

  • john kelly-275143

    SSC Enthusiast

    Points: 113

    This really is a non-issue and should not be passed off as an article or issue relavent to professional DBA, developers, etc.  It is not appropriate for this forum.  If you search the BOL you will see that this topic is covered tersely when it states that qualifiers in the join sytax are applied before the tables are joined, qualifiers in the WHERE clause are applied after the join to the result set.  Cant be more clear than that.  To suggest that this might confuse experienced DBAs is a joke.  Its very second nature.  Perhaps your target audience is newbies?

  • Yelena Varshal

    SSC-Dedicated

    Points: 34207

    Talking about NULLs: this something I do EVERY time I am in a hurry and then wonder where are my rows for 30 seconds before realizing that I probably should use a correct sysntax.

    Suppose you need to select only customers who did not place orders and you do it by adding a line to the query  in Steve's example:

    where  o.orderid = null

    Then the query will return different results based on ANSI_NULLS settings. SET ANSI_NULLS ON and then running the query will return zero (0) rows.  SET ANSI_NULLS OFF and then running the query will return all customers who did not place orders.

    Now, let's re-write the condition correctly from the point of view of using NULLs:

    where o.orderid IS NULL

    In this case the query will correctly return all customers who did not place orders.

    Moral: do not write code when you have to run and use IS NULL instead of =NULL

    Regards,Yelena Varsha

  • David Jackson

    SSCertifiable

    Points: 6404

    Good article, I always enjoy arguing with oracle dbas on this, and use the example in the BOL as evidence on why joining in the where clause is just wrong.

    My rule of thumb on this is never refer to a left outer joined table in the where clause.  Unless of course you are doing an anti-join ...

    Dave J

     


    http://glossopian.co.uk/
    "I don't know what I don't know."

Viewing 15 posts - 1 through 15 (of 28 total)

You must be logged in to reply to this topic. Login to reply