Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Outer Join Trouble

By Steve Jones,

Recently someone sent me a note about an issue with an outer join. I'd run into a similar situation before and thought that I'm sure others have as well, especially as more and more people move to outer ANSI style joins.

For those of you still writing queries like this:

select c.customerid, o.orderid
 from orders o, customers c
 where o.customerid = c.customerid
you probably won't have this same type of problem, but those styles of queries are difficult to read, especially as you get 4 or 5 tables in there and implement any type of outer join.

The recommended way of writing a query like this is as follows:

select c.customerid, o.orderid
 from orders o
	inner join customers c
		on o.customerid = c.customerid
Note that the type of join is specified and there is a new clause, the ON clause, that defines how the two tables are joined together. This clause, however, isn't a WHERE clause, although it is a qualifier. In other words, it qualifies, or limits which rows will be returned. It is a separate clause, however, and the its impact on the query is very apparent when an outer join is involved.

Let's look at a simple example. Here's some DDL and DML to build a few small tables for this sample.

create table Customers
(	CustomerID int identity(1,1)
	, CustomerName varchar(50)
)
go
create table Orders
(	OrderID int identity(1,1)
	, CustomerID int
	, OrderPlaceDate datetime
	, OrderShipDate datetime
)
go
insert Customers select 'Brian'
insert Customers select 'Andy'
insert Customers select 'Steve'
insert Customers select 'Delaney'

insert Orders select 1, '9-1-05', '9-4-05'
insert Orders select 1, '10-1-05', '10-7-05'
insert Orders select 1, '11-1-05', '11-24-05'

insert Orders select 2, '9-11-05', '9-15-05'
insert Orders select 2, '9-3-05', '9-24-05'
insert Orders select 2, '9-30-05', '10-9-05'

insert Orders select 3, '10-1-05', '10-7-05'
This sets up a small series of customers and orders. Three of the four customers have orders across a few months of this past year. Now suppose that the sales manager wants a report of the customers and which ones did and did not place orders from October of last year. There were only 2 orders places in October and getting a list of customers along with orders is easy, right?

Let's try this one:

select c.customerid
	, c.customername
	, o.orderid
	, o.orderplacedate
 from customers c
	inner join orders o
		on c.customerid = o.customerid
 where datepart(m, o.orderplacedate) = 10
This obviously gets you the two orders from October, one from Steve and one from Brian as shown below.
customerid  customername               orderid     orderplacedate                   
----------- -------------------------- ----------- ---------------------------- 
1           Brian                      2           2005-10-01 00:00:00.000
3           Steve                      7           2005-10-01 00:00:00.000

(2 row(s) affected)
This does show the two orders, but what about all the other customers? The sales manager wanted to know who did and did not place orders. In other words, it's a perfect fit for an outer join, with customers being the outer table. So we modify our join to look like this:
select c.customerid
	, c.customername
	, o.orderid
	, o.orderplacedate
 from customers c
	left outer join orders o
		on c.customerid = o.customerid
 where datepart(m, o.orderplacedate) = 10
and we run this, all ready to save the results into an Excel file and email it along. Our results look like this:
customerid  customername               orderid     orderplacedate                   
----------- -------------------------- ----------- ---------------------------- 
1           Brian                      2           2005-10-01 00:00:00.000
3           Steve                      7           2005-10-01 00:00:00.000

(2 row(s) affected)
Huh?

We did specify an outer join, right? Let's double check. Yep, sure enough, look up the page and it's definitely an outer join listed there. And no, I didn't cut and paste the wrong data. So what has happened?

Before we examine this in detail, let's change the query slightly and run it. What if we run this query instead? We've moved the qualification for October from the WHERE clause to the ON clause in the join syntax.

select c.customerid
		, c.customername
		, o.orderid
		, o.orderplacedate
	 from customers c
		left outer join orders o
			on c.customerid = o.customerid
			and datepart(m, o.orderplacedate) = 10
This will return the following data:
customerid  customername             orderid     orderplacedate                                         
----------- ------------------------ ----------- ----------------------------- 
1           Brian                    2           2005-10-01 00:00:00.000
2           Andy                     NULL        NULL
3           Steve                    7           2005-10-01 00:00:00.000
4           Delaney                  NULL        NULL

Now notice that we actually see all 4 customers, which is what we expected. The two customers, Andy and Delaney, that did not place an order in October, now show up with a NULL value for the order table information. In a real report, we could easily use ISNULL() in the SELECT list to better format the information.

In an outer join, the WHERE clause actually acts as a filter to the result set. We can see that in the execution plan below. Notice that there is a filter condition applied to the result set before it is returned, limiting results to those rows that have a match with October.

This query plan was from the first query. Now the second query, using the ON clause to limit results to October, shows up a little different.

In this second query plan, again, there is a filter for the OrderPlaceDate month to be limited to October. However in this case, it's applied as the rows are pulled from the Orders table. This means that only two rows are used in the Nested loop join. However, since this is an outer join, all the Customers rows pass through along with the 2 matches from Orders.

In the first query, the outer join is performed and if we could see the intermediate results, we'd see the same 4 rows: 2 with matches, 2 without. However, after the result set is built, the filter is applied last, which removes the two rows that have NULL in the OrderPlaceDate column.

The results are even more strange when a not equals (<>) clause is used. Here is a short example that was actually sent to me by David Poole. It shows two basic tables and an outer join between them:

create table A
( id int
)
go
create table B
( id int
)
go
insert A select 1
insert A select 2
insert A select 3
insert A select 4
insert A select 5
insert A select 6
insert A select 7
insert A select 8
insert b select 2
insert b select 4
insert b select 6
insert b select 8

select *
 from a
	left outer join b
		on a.id = b.id
 where b.id <> 8

id          id          
----------- ----------- 
2           2
4           4
6           6

(3 row(s) affected)

select *
 from a
	left outer join b
		on a.id = b.id
  		and b.id <> 8

id          id          
----------- ----------- 
1           NULL
2           2
3           NULL
4           4
5           NULL
6           6
7           NULL
8           NULL

(8 row(s) affected)

The same thing is happening here, although the execution plans look incredibly similar. I am guessing here as to what happens because the execution plans do not look any different.

My theory here, which seems to bear out in practice, is that again in the first query (using the WHERE clause), the outer join takes place correctly, but then the WHERE clause is applied. Since the NULLs are not known to be unequal to 8, they are removed. I'm not sure I like that logic, but it seems to be what happens. Otherwise, how can you explain that those rows are removed? One other thing to notice is that this query matches up using an inner join (the first plan).

In the case of the second query (the ON clause), the outer join is applied with all rows that are not 8 from table B. That is rows 2, 4, and 6 and these are correctly outer joined to Table A, resulting in the complete list of items from Table A.

Conclusions

I have seen this behavior stump more than a few DBAs, myself included at various times. Especially when you're in a hurry and trying to answer some strange question that lends itself to an outer join.

It doesn't seem to be that hard to catch in this example, but remember that this is a very small data set. If we were dealing with hundreds of customers, then we might not notice that we'd missed some data. Especially if there were few rows that did not have matches. Which might mean that the end user would make a decision based on incomplete information.

Writing outer joins is a skill in itself. They are not quite the same as an inner join and the logic needed to build the query is different enough that it is worth practicing a bit. I hope this article makes sense and helps you to be wary of how you build those outer joins.

Total article views: 18421 | Views in the last 30 days: 10
 
Related Articles
FORUM

Customize SQL Query

Customize

FORUM

Outer Join

Not all records selected from a left outer join

FORUM

Custom order in SELECT

Hi, How could we change the order in SELECT command based on a custom semicolon delimited phrase....

FORUM

Adding a trigger for new Customer Order

I need a trigger for when a new Customer Order Header (table) is added, then add a new record onto a...

FORUM

using outer query column name in subquery while doing order by

using outer query column name in subquery while doing order by

Tags
sql puzzles    
t-sql    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones