Fun with Outer Joins

  • Oh I understand what the author is doing and it's a good article.

    But when it takes a less time, is probably more efficient and in my opinion is easier to visually understand, then I'll go for my suggestion. That would be my driving force. I'd change it if I found the author's method was more efficient.

  • First, it is actually not true that an expression containing NULL evaluates to FALSE.

    It evaluates to UNKNOWN.

    A WHERE clause filters out all UKNOWN values, and that is why you get this behaviour in your first query.

    But not all SQL constructs filter out unknown values so it will give confusion to people if you tell them that expressions with NULL in them evaluate to FALSE.

    If I had to teach your students, I would also tell them to never add any condition into the ON clause which is not a direct join relation between the tables (i.e. table1.column1 = table2.column2). This helps them understand the proper purposes of the ON versus the WHERE clauses. (In your final "correct" query you added such a condition).

    Rather have them understand why the WHERE clause does this (filtering UNKNOWN values).

    Otherwise I recommend the book from Itzik Ben-Gan "T-SQL Fundamentals" which covers the above.)

    (Interestingly, he writes in that book that it is the most difficult task to write about the "simple" basics - and it is ...)

    Second, I would add that there is another confusion which specifically arises with LEFT OUTER joins, which I believe is even more common with SQL beginners:

    If you, in your example, simply wanted to get a list of professors, and also list if they had any class at all, you "naturally" would think "we'll use a left outer join, this will return a list of all professors, and it will list the class he has and if he has no class it will be null on the right side".

    And then they are confused as to why it does not return a single line with each professors, but there are several rows for a single professor (if he has more than one class).

    So the description of a left outer join as "it returns all rows of the left-hand table, even if there is no row in the right-hand table" is very misleading - it can be understood to imply it would return exactly the rows of the left-hand table, plus data from a matching row if there is one.

    But that is not what it does.

    In fact, what it does is: join all matching rows from both tables (inner join), resulting in potentially many "duplicate" rows from the left-hand table, and then add a single row for all left-hand table rows which have no matching equivalent in the right-hand table.

  • Mr. fisher you are correct. You can't eliminate the WHERE all the time. If I gave that impression then I apologize for a misstatement.

    I'm not against any clause in SQL, so avoiding any of them is not a thing for me. Des the query return the correct data and use resources as wisely as possible? Then it's good. If part of that turns out that there is no WHERE then fine.

    I find that pre-filtering using things like CTEs to be helpful but there are still folks, including me, that have SQL 2000 instances.

    ATBCharles Kincaid

  • Itzik Ben-Gan saves the day once again!

    I was bothered that the join clause for the right table of a left outer join was being ignored. Little did I know...

    There are three logical steps to a join:

    1. cartesian product

    2. ON filter

    3. Add Outer rows

    The physical steps must produce the output that the logical steps would produce. Specifically, the right table filter on the left outer join was not being ignored in this article...the outer rows were being added back in on step 3.

Viewing 4 posts - 61 through 63 (of 63 total)

You must be logged in to reply to this topic. Login to reply