SQLServerCentral Article

Outer Join Trouble

,

Recently someone sent me a note about an issue with an outer join. I'd run into a similar situation before and thought that I'm sure others have as well, especially as more and more people move to outer ANSI style joins.

For those of you still writing queries like this:

select c.customerid, o.orderid
 from orders o, customers c
 where o.customerid = c.customerid

you probably won't have this same type of problem, but those styles of queries are difficult to read, especially as you get 4 or 5 tables in there and implement any type of outer join.

The recommended way of writing a query like this is as follows:

select c.customerid, o.orderid
 from orders o
inner join customers c
on o.customerid = c.customerid

Note that the type of join is specified and there is a new clause, the ON clause, that defines how the two tables are joined together. This clause, however, isn't a WHERE clause, although it is a qualifier. In other words, it qualifies, or limits which rows will be returned. It is a separate clause, however, and the its impact on the query is very apparent when an outer join is involved.

Let's look at a simple example. Here's some DDL and DML to build a few small tables for this sample.

create table Customers
(CustomerID int identity(1,1)
, CustomerName varchar(50)
)
go
create table Orders
(OrderID int identity(1,1)
, CustomerID int
, OrderPlaceDate datetime
, OrderShipDate datetime
)
go
insert Customers select 'Brian'
insert Customers select 'Andy'
insert Customers select 'Steve'
insert Customers select 'Delaney'
insert Orders select 1, '9-1-05', '9-4-05'
insert Orders select 1, '10-1-05', '10-7-05'
insert Orders select 1, '11-1-05', '11-24-05'
insert Orders select 2, '9-11-05', '9-15-05'
insert Orders select 2, '9-3-05', '9-24-05'
insert Orders select 2, '9-30-05', '10-9-05'
insert Orders select 3, '10-1-05', '10-7-05'

This sets up a small series of customers and orders. Three of the four customers have orders across a few months of this past year. Now suppose that the sales manager wants a report of the customers and which ones did and did not place orders from October of last year. There were only 2 orders places in October and getting a list of customers along with orders is easy, right?

Let's try this one:

select c.customerid
, c.customername
, o.orderid
, o.orderplacedate
 from customers c
inner join orders o
on c.customerid = o.customerid
 where datepart(m, o.orderplacedate) = 10

This obviously gets you the two orders from October, one from Steve and one from Brian as shown below.

customerid  customername               orderid     orderplacedate                   
----------- -------------------------- ----------- ---------------------------- 
1           Brian                      2           2005-10-01 00:00:00.000
3           Steve                      7           2005-10-01 00:00:00.000
(2 row(s) affected)

This does show the two orders, but what about all the other customers? The sales manager wanted to know who did and did not place orders. In other words, it's a perfect fit for an outer join, with customers being the outer table. So we modify our join to look like this:

select c.customerid
, c.customername
, o.orderid
, o.orderplacedate
 from customers c
left outer join orders o
on c.customerid = o.customerid
 where datepart(m, o.orderplacedate) = 10

and we run this, all ready to save the results into an Excel file and email it along. Our results look like this:

customerid  customername               orderid     orderplacedate                   
----------- -------------------------- ----------- ---------------------------- 
1           Brian                      2           2005-10-01 00:00:00.000
3           Steve                      7           2005-10-01 00:00:00.000
(2 row(s) affected)

Huh?

We did specify an outer join, right? Let's double check. Yep, sure enough, look up the page and it's definitely an outer join listed there. And no, I didn't cut and paste the wrong data. So what has happened?

Before we examine this in detail, let's change the query slightly and run it. What if we run this query instead? We've moved the qualification for October from the WHERE clause to the ON clause in the join syntax.

select c.customerid
, c.customername
, o.orderid
, o.orderplacedate
 from customers c
left outer join orders o
on c.customerid = o.customerid
and datepart(m, o.orderplacedate) = 10

This will return the following data:

customerid  customername             orderid     orderplacedate                                         
----------- ------------------------ ----------- ----------------------------- 
1           Brian                    2           2005-10-01 00:00:00.000
2           Andy                     NULL        NULL
3           Steve                    7           2005-10-01 00:00:00.000
4           Delaney                  NULL        NULL

Now notice that we actually see all 4 customers, which is what we expected. The two customers, Andy and Delaney, that did not place an order in October, now show up with a NULL value for the order table information. In a real report, we could easily use ISNULL() in the SELECT list to better format the information.

In an outer join, the WHERE clause actually acts as a filter to the result set. We can see that in the execution plan below. Notice that there is a filter condition applied to the result set before it is returned, limiting results to those rows that have a match with October.

This query plan was from the first query. Now the second query, using the ON clause to limit results to October, shows up a little different.

In this second query plan, again, there is a filter for the OrderPlaceDate month to be limited to October. However in this case, it's applied as the rows are pulled from the Orders table. This means that only two rows are used in the Nested loop join. However, since this is an outer join, all the Customers rows pass through along with the 2 matches from Orders.

In the first query, the outer join is performed and if we could see the intermediate results, we'd see the same 4 rows: 2 with matches, 2 without. However, after the result set is built, the filter is applied last, which removes the two rows that have NULL in the OrderPlaceDate column.

The results are even more strange when a not equals (<>) clause is used. Here is a short example that was actually sent to me by David Poole. It shows two basic tables and an outer join between them:

create table A
( id int
)
go
create table B
( id int
)
go
insert A select 1
insert A select 2
insert A select 3
insert A select 4
insert A select 5
insert A select 6
insert A select 7
insert A select 8
insert b select 2
insert b select 4
insert b select 6
insert b select 8
select *
 from a
left outer join b
on a.id = b.id
 where b.id <> 8
id          id          
----------- ----------- 
2           2
4           4
6           6
(3 row(s) affected)
select *
 from a
left outer join b
on a.id = b.id
  and b.id <> 8
id          id          
----------- ----------- 
1           NULL
2           2
3           NULL
4           4
5           NULL
6           6
7           NULL
8           NULL
(8 row(s) affected)

The same thing is happening here, although the execution plans look incredibly similar. I am guessing here as to what happens because the execution plans do not look any different.

My theory here, which seems to bear out in practice, is that again in the first query (using the WHERE clause), the outer join takes place correctly, but then the WHERE clause is applied. Since the NULLs are not known to be unequal to 8, they are removed. I'm not sure I like that logic, but it seems to be what happens. Otherwise, how can you explain that those rows are removed? One other thing to notice is that this query matches up using an inner join (the first plan).

In the case of the second query (the ON clause), the outer join is applied with all rows that are not 8 from table B. That is rows 2, 4, and 6 and these are correctly outer joined to Table A, resulting in the complete list of items from Table A.

Conclusions

I have seen this behavior stump more than a few DBAs, myself included at various times. Especially when you're in a hurry and trying to answer some strange question that lends itself to an outer join.

It doesn't seem to be that hard to catch in this example, but remember that this is a very small data set. If we were dealing with hundreds of customers, then we might not notice that we'd missed some data. Especially if there were few rows that did not have matches. Which might mean that the end user would make a decision based on incomplete information.

Writing outer joins is a skill in itself. They are not quite the same as an inner join and the logic needed to build the query is different enough that it is worth practicing a bit. I hope this article makes sense and helps you to be wary of how you build those outer joins.

Rate

5 (6)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (6)

You rated this post out of 5. Change rating