SQLServerCentral Article

A Refresher on Joins

,

In this article we will look at JOINS. We will primarily focus towards beginners, but this may be a refresher for the experienced. We will see how each type of JOIN works. Specifically we will discuss these: INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN , SELF JOIN and then wind up with CROSS JOINS.

Note there are so many other terminologies used, like equi join, non-equi join, inner join etc, but there are only the 6 types of joins as mentioned above.

Let's start by creating the tables/data required to understand JOINS.

Step - 1

Use the following script to create the table and data. You can just cut and paste this code into Query Analyzer and execute it.

drop table t1,t2,t3
create table t1(f1 int)
create table t2(f1 int)
create table t3(f1 int)
--===========================
set nocount on
declare @j int
set @j = 1
while @j <= 10
begin
insert into t1 values(@j)
set @j = @j + 1
end
--===========================
set nocount on
declare @k int
set @k = 6
while @k <= 15
begin
insert into t2 values(@k)
set @k = @k + 1
end
--===========================
insert into t3 values(6)
insert into t3 values(2)
insert into t3 values(3)
insert into t3 values(12)
insert into t3 values(13)
--===========================

You will now have data like this in the three tables. Please note that the empty spaces in rows are only to show a pictorial mapping. There are only 10 records in t1, 10 records in t2 and 5 records in t3 (no empty rows or NULL values in any field)

t1t2t3
f1f1f1
------
1
22
33
4
5
666
77
88
99
1010
11
1212
1313
14
15

Step - 2

Try these queries

select * from
t1 JOIN t2
on t1.f1 = t2.f1
select * from
t1 INNER JOIN t2
on t1.f1 = t2.f1
select * from
t1,t2
where
t1.f1 = t2.f1

All the three queries above are essentially the same. The last one is referred to as the equi-join. What if you want all rows in t1 that are not in t2 (non-equi joins ?). Will this query work?

select * from
t1,t2
where
t1.f1 <> t2.f1

It did not quite do what you expected, huh?

Here is one way to achieve this:

select * from t1
where t1.f1 not in (select t2.f1 from t2)

Step - 3 - LEFT OUTER JOIN

Now suppose you want all the records from t1, whether they have a corresponding entry in t2 or not. Why will we ever want to do this? A simple situation could be that, you want a list of all the departments (t1) whether there are employees or not in that department (t2). For this example assume t1.f2 holds department ids, and t2.f1 as department ids fk'd into t1.f1. Here is the query

select t1.f1, t2.f1
from t1 LEFT OUTER JOIN t2
on (t1.f1 = t2.f1)

This is exactly similar in functionality to the above.

select t1.f1, t2.f1
from t1 LEFT JOIN t2
on (t1.f1 = t2.f1)
t1.f1   t2.f1
=====   =====
1       NULL
2       NULL
3       NULL
4       NULL
5       NULL
6       6
7       7
8       8
9       9
10      10

Step - 4 - RIGHT OUTER JOIN

Ok, now the reverse, assume we want all the employees, whether or not they have been assigned to any departments. Both the queries below are similar again

select t1.f1, t2.f1
from t1 RIGHT OUTER JOIN t2
on (t1.f1 = t2.f1)
select t1.f1, t2.f1
from t1 RIGHT JOIN t2
on (t1.f1 = t2.f1)
t1.f1 t2.f1
===== =====
6     6
7     7
8     8
9     9
10    10
NULL  11
NULL  12
NULL  13
NULL  14
NULL  15

Step - 5 - FULL OUTER JOIN

Ah, what if you now want a list of all rows in t1 (whether or not there are rows in t2) and all the rows in t2 (whether or not there are rows in t1)

select t1.f1, t2.f1

from t1 FULL OUTER JOIN t2

on (t1.f1 = t2.f1)

order by t1.f1

select t1.f1, t2.f1

from t1 FULL JOIN t2

on (t1.f1 = t2.f1)

order by t1.f1

t1.f1 t2.f1

===== =====

NULL 11

NULL 12

NULL 13

NULL 14

NULL 15

1 NULL

2 NULL

3 NULL

4 NULL

5 NULL

6 6

7 7

8 8

9 9

10 10

Step - 6

JOINS make much sense between two tables, but can obviously be extended to

more than two tables. However, just to demonstrate, as to how

complicated/confusing they can become, here is a FULL OUTER JOIN example

between t1, t2 and t3.

select a.f1, b.f1, c.f1
from t1 a
FULL OUTER JOIN t2 b on (a.f1 = b.f1)
FULL OUTER JOIN t3 c on (a.f1 = c.f1)
t1.f1 t2.f1 t3.f1
===== ===== =====
6     6     6
7     7     NULL
8     8     NULL
9     9     NULL
10    10    NULL
NULL  11    NULL
NULL  12    NULL
NULL  13    NULL
NULL  14    NULL
NULL  15    NULL
5     NULL  NULL
4     NULL  NULL
3     NULL  3
2     NULL  2
1     NULL  NULL
NULL  NULL  13
NULL  NULL  12

Observe the query and output carefully...and then see if you can get the following output, which you will agree makes more sense.

t1.f1 t2.f1 t3.f1
===== ===== =====
6     6     6
7     7     NULL
8     8     NULL
9     9     NULL
10    10    NULL
NULL  11    NULL
NULL  12    12
NULL  13    13
NULL  14    NULL
NULL  15    NULL
5     NULL  NULL
4     NULL  NULL
3     NULL  3
2     NULL  2
1     NULL  NULL

That's it for OUTER JOINS for now.

Step - 7 - SELF JOINS

When the same table is used in a query with two different aliases, such a join is referred to as a self-join. Let us see this case with an example. Create the following table and data.

create table jk_Workers(Worker int, Manager int)
insert into jk_Workers values(111,NULL)
insert into jk_Workers values(222,111)
insert into jk_Workers values(333,222)
insert into jk_Workers values(444,222)
insert into jk_Workers values(555,222)
insert into jk_Workers values(666,111)
insert into jk_Workers values(777,111)
insert into jk_Workers values(888,333)
insert into jk_Workers values(999,222)

All the workers with their respective managers are stored in this table. And if we wish to list out the managers and the workers working under them, we could use a query similar to the following.

select b.Manager,b.Worker
from jk_Workers a, jk_Workers b
where a.Worker = b.Manager
or b.manager is null and a.Manager is null
order by b.Manager
Manager Worker
NULL    111
111     222
111     666
111     777
222     999
222     333
222     444
222     555
333     888

Step - 8 - CROSS JOINS

A cross join is referred to as a Cartesian product, which means, for each element in set-A pick all the values from set-B

select t1.f1, t2.f1
from t1 cross join t2
select t1.f1, t2.f1
from t1,t2

Both the above queries are same. The output of this query will be 100 rows. This is because, for each row in t1, all the rows in t2 would be matched. Unless you know what you are doing, you got to be careful in building your queries, to ensure you are not working on Cartesian products.

I am not sure of a real business need for a Cartesian product (I can be wrong as ever), but I know mathematically it is valid to have the same y values for different values of x (m=0), basically a horizontal line on the Cartesian plane.

Joins could be as complicated/confusing in real world scenarios, but if your fundamental understanding of how they behave is clear, you should be able to handle any situation.

Thankz for reading!

Rate

4.4 (5)

You rated this post out of 5. Change rating

Share

Share

Rate

4.4 (5)

You rated this post out of 5. Change rating