Using Exotic Joins in SQL – Part 2
Chris Cubley, MCSD
www.queryplan.com
In the previous article, you saw how the BETWEEN operator
could be used in joins to solve problems dealing with range-based data. In this article, I will show you how to take
joins even further by using multiple criteria in joins as well as using the
greater than, less than, and not equals operators in joins.
Compound Joins
Compound joins are joins which use multiple criteria
combined with a logical operator such as AND.
This is a relatively simple concept and is commonly
used in database systems that employ compound primary keys.
For a simple example of a database schema in which compound
joins are necessary, consider a school management system where one of the
features is tracking which classes are taught in which
classrooms. The system must match up the
features of the classrooms to the needs of the classes. In order to perform these functions, the
following two tables are defined:
CREATE TABLE tb_Classroom(
BuildingName char(10) NOT
NULL,
RoomNumber int NOT
NULL,
RoomCapacity int NOT
NULL,
HasLabEquip smallint NOT NULL,
CONSTRAINT
PK_Classroom PRIMARY KEY(BuildingName, RoomNumber)
)
CREATE TABLE tb_ClassSection(
CourseID char(5) NOT NULL,
SectionNumber smallint NOT
NULL,
BuildingName char(10) NOT
NULL,
RoomNumber int NOT
NULL,
InstructorID int NOT
NULL,
ScheduleID int NOT
NULL,
SectionCapacity int NOT
NULL,
RequiresLabEquip smallint NOT
NULL,
CONSTRAINT
PK_ClassSection PRIMARY KEY(CourseID, SectionNumber),
CONSTRAINT
FK_ClassSection_Classroom
FOREIGN
KEY(BuildingName, RoomNumber)
REFERENCES
tb_Classroom(BuildingName, RoomNumber)
)
In this example, the tb_Classroom table defines a list of
classrooms in which classes are taught. The tb_ClassSection table contains instances
of various courses taught at the school.
A class section is taught in a particular classroom by
an instructor according to a standard class schedule. Both the tb_Classroom and tb_ClassSection
tables use natural compound primary keys.
One of the reports in the school management system lists the
class sections being taught along with the capacity of
their respective classrooms. In order to
construct this report, the tb_ClassSection table must be joined the tb_Classroom table based upon the compound primary key of
the tb_Classroom table. This can be accomplished by using a compound join to return rows
where both the BuildingName AND RoomNumber columns match.
SELECT
s.CourseID,
s.SectionNumber,
c.RoomCapacity
FROM
tb_ClassSection s
INNER JOIN
tb_Classroom c
ON(
s.BuildingName
= c.BuildingName
AND
s.RoomNumber
= s.RoomNumber
)
This query is relatively straightforward. If you’ve been using
SQL for a while, chances are you’ve seen queries like it. The query is a simple equijoin
that uses the AND logical operator to include multiple criteria. Despite its simplicity, this example provides
the basis for a much more powerful query construction tool.
Joins Using Inequality Comparison Operators
The school management system from the first example also
contains a report listing all class sections in which the classroom is not
large enough to accommodate the maximum number of students for the class
section. To determine which class
sections meet these criteria, the system must compare the class section’s
capacity to the capacity of the classroom in which it is
being taught. If the classroom’s
capacity is less than the class section’s capacity, then the class section
should be included in the result set.
With this query, the trick is to first join each class
section to the classroom in which is being taught and then add the additional
criterion that the classroom’s capacity is less than that of the class
section. To do this, simply take the
query from the last example and add the additional criterion.
SELECT
s.CourseID,
s.SectionNumber,
c.RoomCapacity,
s.SectionCapacity
FROM
tb_ClassSection s
INNER JOIN
tb_Classroom c
ON(
s.BuildingName
= c.BuildingName
AND
s.RoomNumber
= s.RoomNumber
AND
c.RoomCapacity
< s.SectionCapacity
)
A common mistake when constructing queries such as this is
not including the equijoin criteria necessary to
match up the rows to be compared by the inequality
operator. If only the inequality
comparison is included in the criteria, the query returns all the rows where a
classroom’s capacity is less than that of any class section, regardless of
whether or not the class section was taught in that classroom.
Not Equals Joins
You may be wondering how in the world it could be useful to
use the not equals operator in a join.
For an example, consider another report in the school management system
in which you must the misallocation of laboratory-equipped classrooms. This report must list all of the class
sections that require laboratory equipment, but are scheduled
to be taught in classrooms that do not have it. The report should also include all non-lab
class sections being taught in laboratory
classrooms. In the tables, the class
sections that require laboratory equipment are indicated with a
RequiresLabEquip value of 1, and the classrooms equipped with laboratory
equipment are indicated with a HasLabEquip value of 1.
This problem follows a similar pattern to that of the
capacity problem. The only difference is
the use of the not equals operator in place of the less than operator. After matching the class section with the
classroom in which it is being taught, the value of the RequiresLabEquip column
must be compared with the HasLabEquip column.
If these values are not equal, there is a laboratory equipment
allocation problem and the class section should be included on the report. Applying these criteria result in the
following query:
SELECT
s.CourseID,
s.SectionNumber,
c.HasLabEquip,
s.RequiresLabEquip
FROM
tb_ClassSection s
INNER JOIN
tb_Classroom c
ON(
s.BuildingName
= c.BuildingName
AND
s.RoomNumber
= s.RoomNumber
AND
c.HasLabEquip
<> s.RequiresLabEquip
)
When using the not equals operator in joins, it is even more
vital to remember to use additional join criteria than it is when using the greater
than and less than operators. In this case, if only the not equals
criterion was specified, the query would perform a cross join
and then exclude only the class section-classroom pairs where the laboratory
indicator was not equal. If there were
100 classrooms and 500 class sections, this could possibly return a result set
of 25,000 - 50,000 rows – definitely not what was intended.
Beyond the Basics
Compound joins, especially those employing the inequality
and not equals operators, can be used with other advanced SQL techniques to
construct queries that are even more powerful.
The key to leveraging these advanced joins is to spell out the
requirements of the problem clearly in terms of relating the available sets of
data. Once the relationships between the
sets of data are understood, these relationships can
be encoded into join criteria. This
technique, along with testing and validation of the output, will enable you to
solve complicated problems effectively.