Understanding Outer Joins in SQL

,

Most of the time developers that need to get data will use a SELECT statement to retrieve the data. The query that a developer will use often will use a join between tables. In this article, we will look at how an outer join is different from an inner join and what this means for your queries.

The Difference Between Inner and Outer Joins

If your data is in multiple tables, you will often will use a join between the tables in a form like this:

SELECT
 u.DisplayName
 , b.Name AS BadgeName
 FROM dbo.Users AS u
 INNER JOIN dbo.Badges AS b 
 ON u.Id = b.Id;
 GO

In this case, we are looking at the StackOverflow database (which I downloaded from links that Brent Ozar has on his site), and getting a list of users and their badges. This means that for each user that has earned a badge, I get a row back with the user DisplayName and the BadgeName. This is an inner join, and is an intersection of the data in the tables. This is shown in the image below.

Venn diagram of intersection

In this diagram, we see the area where the circles overlap, with the values of John, Mary, Teacher, and Stellar Question, is an intersection. This is the inner join, where items in both sets are included in the results.

An outer join will include all data from one set. For example, if I recolor the diagram slightly, we see that a left outer join would be the area below in yellow. All the values from the Users circle, plus the values from Badges circle the overlap, are included.

left outer join venn diagram

The left in this case refers to the actual item on the left. A right outer join would include the circle on the right, with the overlapped area in the middle. In T-SQL code, we would see the left outer join expressed in our code like this:

SELECT
 u.DisplayName
 , b.Name AS BadgeName
 FROM dbo.Users AS u
 LEFT OUTER JOIN dbo.Badges AS b 
 ON u.Id = b.Id;
 GO

In this case, we get all the users with their matching badges. However, if there is a user without a badge, we get their user DisplayName and a NULL for the badge.

How To Write These Joins

One of the problems many developers have with this construct is they confuse the way to build these joins. They often confuse the structure and order of execution of the outer join. This isn't that different from an inner join, but we often think about it differently.

A normal, or inner join, is the default. If you just include the JOIN keyword, the default is an inner join. For example, this produces just the same results as the query at the top of this article:

SELECT 
u.Id,
 u.DisplayName
 , b.Name AS BadgeName
 FROM dbo.Users AS u
 JOIN dbo.Badges AS b 
 ON u.Id = b.Id
 GO

When we add the OUTER keyword, we need to add a LEFT, RIGHT, or FULL before it. This determines the type of join. In the case of a RIGHT or LEFT OUTER JOIN, the table on the right or left (literally) of the clause will return all its rows. For example, this query returns all rows from the Users table.

SELECT
 u.DisplayName
 , b.Name AS BadgeName
 FROM dbo.Users AS u
 LEFT OUTER JOIN dbo.Badges AS b 
 ON u.Id = b.Id;
 GO

This is because the Users table is on the left. If I write the code differently, this is easy to see.

SELECT
 u.DisplayName
 , b.Name AS BadgeName
 FROM dbo.Users AS u LEFT OUTER JOIN dbo.Badges AS b 
 ON u.Id = b.Id;
 GO

If we change to a RIGHT OUTER JOIN, then we would write this, and get all rows in the Badges table.

SELECT
 u.DisplayName
 , b.Name AS BadgeName
 FROM dbo.Users AS u RIGHT OUTER JOIN dbo.Badges AS b 
 ON u.Id = b.Id;
 GO

There isn't much of a difference between these two, other than you must look to the right or left of the clause to determine which table is returning all its rows. For the values that don't have a match in the ON clause, NULL is returned for the other table. An example is shown here, where some of the users do not have a badge.

NULL returned for Badges table

If we scroll further down, you will see there are rows with matching values between the tables that include a BadgeName, and rows that don't, with NULL.

Results of inner and outer joins

A FULL OUTER JOIN includes all rows from both tables, but with the NULL Values for each side when the rows don't match. A quick repro of this is shown below with this code:

CREATE TABLE dbo.LeftTable
(Id INT NOT NULL, UserName VARCHAR(100) NOT NULL)
GO
INSERT dbo.LeftTable (Id, UserName) 
 VALUES (1, 'Bob'), (2, 'Bill'), (3, 'Susan')
GO
CREATE TABLE dbo.RightTable
(Id INT NOT NULL, BadgeName VARCHAR(100) NOT NULL)
GO
INSERT dbo.RightTable (Id, BadgeName) VALUES (2, 'Teacher'), (3, 'Scholar'), (4, 'Master')
GO
SELECT *
 FROM dbo.LeftTable AS lt
 FULL OUTER JOIN dbo.RightTable AS rt
 ON lt.Id = rt.Id

And the results of the query are shown here:

Full outer join results with nulls

There are NULL Values from both tables where there are not matching rows.

A Gotcha

One of the common problems people have with outer joins is that they expect to qualify or filter rows in the ON clause. This isn't the place for filtering. Instead, this is the place where we determine the matches and whether or not to show nulls for columns being selected. I have often found junior developers writing a query like this, with the expectation that they will find values that have nulls.

SELECT *
 FROM dbo.LeftTable AS lt
 LEFT OUTER JOIN dbo.RightTable AS rt
 ON lt.Id = rt.Id
 AND rt.Id IS NULL

I have used my smaller example, as the data is easier to understand. What results would you expect here? Just row 1? Instead, here is what is returned:

outer join with all results from left table

This is what we get with the regular left join. What most people expect with the above query is just the row with Id = 1.

Left join with on clause just on ID column

Why is this? It's a quirk that we get all the rows from the left table, but no matches on the right table. After all, no rows in the Badges table have NULL. As a result, we still get all the rows from the left table, with NULLs for the non-matching rows on the right, which is all the rows.

What we want to do is filter our results, which is done in the WHERE clause. We should write the code like this:

SELECT *
 FROM dbo.LeftTable AS lt
 LEFT OUTER JOIN dbo.RightTable AS rt
 ON lt.Id = rt.Id
 WHERE rt.Id IS NULL

This gives me the results I expect, which is the Users that do not have any badges.

Results of users without badges

Summary

Outer joins are a way of returning all data from a table along with matching rows from another table. This is often used to see all customers, along with their orders, but ensuring I see customers without orders. This is also common when we want to see all classes with students, but also the classes without any students.

A common mistake is putting filtering in the ON clause instead of the WHERE clause. These are not interchangeable, and execute at different times, so be sure that if you are trying to filter rows from a result set, you do so in the proper place, the WHERE clause.

One final note on performance. While outer joins can be very helpful and handy, they do come with a performance penalty of sorts. Since all rows must be read from one table, this means a complete scan of the table. While reducing the columns returned to those in a non-clustered index can help, this is often an expensive operation, especially on large tables. Use these queries judiciously in your application.

Rate

3.85 (13)

Share

Share

Rate

3.85 (13)