Stairway to T-SQL DML

Stairway to T-SQL DML Level 5: The Mathematics of SQL: Part 2

,

In my last stairway article (Level 4: The Mathematics of SQL – Part 1) I introduced you to the concept of a Venn diagram, and discussed using the INTERSECT, UNION and EXCEPT operators. This stairway will continue exploring the mathematics of SQL by discussing the two different join operators, INNER JOIN and OUTER JOIN.

Joining tables is a crucial concept to understanding data relationships in a relational database. When you are working with your SQL Server data, you will often need to join tables to produce the results your application requires. Having a good understanding of set theory, and the mathematical operators available and how they are used to join tables will make it easier for you to retrieve the data you need from SQL Server.

Joining Two Sets

A set is a well-defined, unordered list of members. When two sets are joined together the results of the joined sets contains three different parts, as represented by the Venn diagram in Figure 1. The INNER JOIN and OUTER JOIN operations can be used to return the different parts of the Venn diagram in Figure 1. The INNER JOIN operator can be used to return the members or rows represented by part 2 of Venn diagram in Figure 1. The OUTER JOIN operator can be used to return the members from the other two parts (part 1 and 3) in Figure 1.

Figure 1: Different Parts of two-joined sets

To show what is returned when using these two different join operators, I am going to use two sets (implemented as SQL Server tables) that are represented in Figure 2.

Figure 2: Sample data for Set A and Set B

In Figure 2, Set A contains 9 members represented by the numbers ranging from 1 to 6, where some members are unique and others are not. Set B has 6 members ranging from 5 to 8, and also has some unique members and some that are not. I am going join these two sets on the members in Set A that are equal to the members in Set B, which is represented by the Venn diagram in Figure 3.

Figure3: Set A and Set B joined on members that are equal.

By looking at the Venn diagram of the joined sets in Figure 3, you can visually see that it is joined based on the member values that are same between SET A and SET B. This is represented by the center section, or part 2 of the Venn diagram. The members in Set A that are not included in Set B have member values 1 through 4 and are represented by the left part of the Venn diagram, or what I am calling part 1. The members in Set B that are not in Set A are those members in the right (or red) section of the Venn diagram in Figure 3. These members are the values 7 and 8.. I will be showing you how to use the INNER JOIN and OUTER JOIN operators to return the members in each of these different parts.

Creating Sample Data for Set A and Set B

To demonstrate how to return these different parts of joined sets using the different join operators I will need to build a couple of SQL Server tables that contain the members of the different sets described in Figure 2 and Figure 3.To do that I will use the code in Listing 1.

USE tempdb;
GO
SET NOCOUNT ON;
-- Create Set A
CREATE TABLE [SET A] (Member TINYINT);
INSERT INTO [SET A] VALUES (1);
INSERT INTO [SET A] VALUES (1);
INSERT INTO [SET A] VALUES (2);
INSERT INTO [SET A] VALUES (3);
INSERT INTO [SET A] VALUES (4);
INSERT INTO [SET A] VALUES (4);
INSERT INTO [SET A] VALUES (5);
INSERT INTO [SET A] VALUES (6);
INSERT INTO [SET A] VALUES (6);
-- Create Set B
CREATE TABLE [SET B] (Member TINYINT);
INSERT INTO [SET B] VALUES (5);
INSERT INTO [SET B] VALUES (5);
INSERT INTO [SET B] VALUES (6);
INSERT INTO [SET B] VALUES (7);
INSERT INTO [SET B] VALUES (7);
INSERT INTO [SET B] VALUES (8);

Listing 1: Creating sample data for Set A and Set B

INNER JOIN Operator

The INNER JOIN operator is used to return part 2 of the Venn diagram in Figure 3. By using the INNER JOIN operator to join [SET A] and [SET B] I can return the member values of 5 and 6. Listing 2 contains to T-SQL code to perform this INNER JOIN operation.

-- INNER JOIN
SELECT [SET A].Member AS [SET A Member]
      ,[SET B].Member AS [SET B Member]
FROM [SET A] INNER JOIN [SET B] 
ON [SET A].Member = [SET B].Member;

Listing 2: Demonstration of INNER JOIN

When the code in listing 2 is run the output in Report 1 is returned.

SET A Member SET B Member
------------ ------------
5            5
5            5
6            6
6            6

Report 1: Output when running code in Listing 2

The INNER JOIN operator returns the member values of 5 and 6 and doesn’t remove duplicates. This INNER JOIN operation does not return 6 different rows as shown in the Venn diagram in Figure 3. The reason for this is the Venn diagram is only the logical representation of the INNER JOIN operation, whereas the output in Report 1 is the result of the actual INNER JOIN operation. When the SQL Server engine performs an INNER JOIN operation it takes each member of [SET A] and matches it with members from [SET B] that have the same value. The only members that have matching values between [Set A] and [SET B] are the member with a value of 5 or 6. The INNER JOIN operation works like this, it first takes the single member with a value of 5 from [SET A] and matches members in [SET B] that also have a value of 5. For each match between two sets one row is returned. This yields the two rows that have member values of 5 in Report 1. . The engine also performs the same INNER JOIN operations for each member in [SET A] that has a matching member in [SET B] with value of 6. Since there are two members in [SET A] with a value of 6 and only one member in [SET B] with a value of 6 only two rows are returned and displayed in Report 1.

OUTER JOIN Operator

To return part 1 and part 3 of the Venn diagram in figure 3 we need to use two different variations of the OUTER JOIN operator. To return part 1 we need to use a LEFT OUTER JOIN operation, but to return part 3 we use a RIGHT OUTER JOIN operation. To demonstrate what is returned using the LEFT OUTER JOIN operations I run the code in Listing 3.

SELECT [SET A].Member AS [SET A Member]
      ,[SET B].Member AS [SET B Member]
FROM [SET A] LEFT OUTER JOIN [SET B] 
ON [SET A].Member = [SET B].Member;

Listing 3: Using a LEFT OUTER JOIN operation

When the code in Listing 3 is run the output in Report 2 is returned.

SET A Member SET B Member
------------ ------------
1            NULL
1            NULL
2            NULL
3            NULL
4            NULL
4            NULL
5            5
5            5
6            6
6            6

Report 2: Output from basic LEFT OUTER JOIN operation

When the LEFT OUTER JOIN operation of Listing 3 is performed, the members in [SET A] are matched to the members in [SET B] by using the column identified in the ‘ON’ clause. If a match is not found then the value of the ‘SET B Member’ column in the output is set to NULL. In my example, this happens when trying to match member values of 1 through 4 in [Set A] with [SET B]. When a match is found between the members of [SET A] and [SET B] then the member value of [SET B] is returned, which is why the output for members values of 5 and 6 in column ‘SET A Members’ have matching values for the ‘SET B Member’ column. In order to return only the members in Part 1 of the Venn diagram in Figure 3 I have to add a WHERE predicate to the code in Listing 3, as can be found in Listing 4.

SELECT [SET A].Member AS [SET A Member]
      ,[SET B].Member AS [SET B Member]
FROM [SET A] LEFT OUTER JOIN [SET B] 
ON [SET A].Member = [SET B].Member  
WHERE [SET B].Member IS NULL;

Listing 4: Returning Part 1 using a LEFT OUTER JOIN operator

When the code in Listing 4 is run the output in Report 3 is displayed. By looking at the ‘SET A Member’ column in this output you can see that only the members from [SET A] that are in part 1 of the joined set are returned.

SET A Member SET B Member
------------ ------------
1            NULL
1            NULL
2            NULL
3            NULL
4            NULL
4            NULL

Report 3: Part 1 output from LEFT OUTER JOIN operation with WHERE predicate

To get the results of part 3 of figure 3 I just need to specify a RIGHT OUTER JOIN operation with a WHERE predicate, as shown in the code in listing 5.

SELECT [SET A].Member AS [SET A Member]
      ,[SET B].Member AS [SET B Member]
FROM [SET A] RIGHT OUTER JOIN [SET B] 
ON [SET A].Member = [SET B].Member  
WHERE [SET A].Member IS NULL;

Listing 5: Returning Part 3 using a RIGHT OUTER JOIN operator

If you run this code, you will see that the member values of 7 and 8 are returned under the ‘SET B Member’ column heading, but the values under ‘SET A Member’ column heading contains the NULL value.

Joining two sets with Different Number of Columns

My examples above showed joining two sets that only have a single column, but the join operators can handle joining sets with multiple columns. It can even handle joining two sets that have different numbers of columns. To demonstrate this I will create a couple of new tables ([SET C] and [SET D]) that have multiple columns. The Venn diagram shown in Figure 4 represents the new tables I will be creating and illustrates how the two sets will be joined.

Figure 4: Sets with multiple columns

To create these two tables I will be using the code in Listing 6.

SET NOCOUNT ON;
-- Create Set C
CREATE TABLE [SET C] (Id_C TINYINT, Code CHAR(1));
INSERT INTO [SET C] VALUES (1,'A');
INSERT INTO [SET C] VALUES (1,'B');
INSERT INTO [SET C] VALUES (2,'C');
INSERT INTO [SET C] VALUES (3,'D');
INSERT INTO [SET C] VALUES (4,'E');
INSERT INTO [SET C] VALUES (4,'F');
INSERT INTO [SET C] VALUES (5,'G');
INSERT INTO [SET C] VALUES (6,'H');
INSERT INTO [SET C] VALUES (6,'I');
-- Create Set D
CREATE TABLE [SET D] (Id_D TINYINT, CodeX CHAR, CodeY TINYINT);
INSERT INTO [SET D] VALUES (5,'Z',1);
INSERT INTO [SET D] VALUES (5,'Y',2);
INSERT INTO [SET D] VALUES (6,'X',3);
INSERT INTO [SET D] VALUES (7,'W',4);
INSERT INTO [SET D] VALUES (7,'U',5);
INSERT INTO [SET D] VALUES (8,'T',6);

Listing 6: Creating sample tables with multiple columns

To demonstrate an INNER JOIN operation to return part 2 of Figure 4 I will run the code in Listing 7.

SELECT * FROM [SET C] INNER JOIN [SET D] 
        ON [SET C].Id_C = [SET D].Id_D;

Listing 7: INNER JOIN operation on sets with different number of columns

When the code in Listing 7 is run the output in Report 4 is returned. By reviewing the output in Report 4, you can see rows matched between [SET C] and [SET D] are based on member values of 5 and 6 which is similar to my prior example that produced Report 1. However, this time the other columns contained in [Set C] and [Set D] tables are also returned

Id_C Code Id_D CodeX CodeY
---- ---- ---- ----- -----
5    G    5    Z     1
5    G    5    Y     2
6    H    6    X     3
6    I    6    X     3

Report 4: Output of INNER JOIN operation of two sets with different number of columns

If I were to modify the code in Listing 7 to perform the outer join operations you would see that these additional columns are returned for the outer join queries as well. I will leave it up to you to build and run these outer join queries.

Set Theory and the Mathematics of Joining SQL Server Tables

When writing application code you will be joining sets to return the data your applications need. Understanding set theory and the mathematics of joining SQL Server tables is a key part to writing good application code. You will find that the INNER JOIN and OUTER JOIN operators are probably the most frequently used operators you will be using to join two set when writing T-SQL code. Therefore understanding how both JOIN operators work, along with the INTERSECT, UNION and EXCEPT operators will allow you to quickly and easily write the code to return the data you need from your SQL Server databases.

This article is part of the parent stairway Stairway to T-SQL DML

Rate

5 (1)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (1)

You rated this post out of 5. Change rating