21 April 2022

14051 views

12 0

The ins and outs of joining tables in SQL Server

Most database queries require joining tables. In this article, Greg Larsen explains the ins and outs of joining tables in SQL Server.

There are times when you need to join multiple tables to retrieve the result set needed. SQL allows for multiple tables to be included in the FROM clause along with the criteria for how to join tables. In SQL Server there are a number of different ways to join tables. In this article, I will be discussing the ins and outs of joining two tables in SQL Server using the inner and outer join operators.

The Three Parts of a Join Operations

When two tables are joined using a join operation, the records of the two sets are logically separated into three different buckets. Those different buckets can be represented using a Venn diagram like the one in Figure 1.

An images showing a Venn diagram. The left is Yellow holding rows from Set A, the right is blue holding rows from Set b. The intersection is brown.

Figure 1: Three parts of the JOIN Operation

Each section of the Venn diagram is represented by a different color: yellow, brown, and blue. The members of Set A that don’t match with SET B will be contained in the yellow bucket. Members of SET A and B that match end up in the brown bucket. Lastly, the members in the blue bucket are those members in SET B that don’t match with SET A. The Venn diagram can visually represent the resulting sets created when an inner or outer join operation is performed.

Test Data

To demonstrate how the different inner and outer join operations work, the examples use two test data tables created in the tempdb database. These two tables are SetA and SetB. If you want to follow along and execute the examples in this article, you can create the two test tables using the code in Listing 1.

Listing 1: Creating SetA and SetB

USE tempdb;

CREATE TABLE SetA (Col1 tinyint, Col2 char(1));

INSERT INTO SetA VALUES

(1, 'A'), (2, 'B');

CREATE TABLE SetB (ColA tinyint, ColB char(1), ColC char(1));

INSERT INTO SetB VALUES

(1, 'A', 'B'), (2, 'C', 'D'), (3, 'E', 'F');

The two tables created when Listing 1 is run can be represented by the Venn diagram in Figure 2.

An image showing a Venn diagram. Set A is a yellow circle. Set B is a blue circle. They do not intersect.

Figure 2: Tables SetA and SetB

Table SetA consists of a set with two columns (Col1 and Col2) and two different rows. Table SetB has three columns (ColA, ColB, and ColC) and contains three different rows.

INNER JOIN operator

The INNER JOIN operator is probably the most used join operation found in T-SQL code. This operator returns rows that match when two sets are joined based on one or more search criteria. The brown section in Figure 3 represents the result set of an INNER JOIN operation between the two test data tables, SetA and SetB, based on the search criterion SetA.Col2 = SetB.ColB.

An images showing a Venn diagram. The yellow left section has rows that do not match from SetA 2,B. The middle brown section has values that match, the right blue section has rows that do not match
Figure 3: Venn Diagram representing an INNER JOIN

The T-SQL code in Listing 2 demonstrates one way to write an INNER JOIN operation to return the matching rows found in the brown section of the Venn diagram in Figure 3.

Listing 2: Performing INNER JOIN operation

USE tempdb;

SELECT * FROM SetA INNER JOIN SetB

ON SetA.Col2 = SetB.ColB;

Report 1 shows the results when the code in Listing 2 is executed.

Report 1: Output produced when Listing 2 is run

The code in Listing 2 identifies the two tables, SetA and SetB, to be joined in the FROM clause, with the INNER JOIN operator between them, followed by an ON clause. The ON clause identifies the search criteria to be used when joining the two tables. In Listing 2, the ON clause specifies SetA.Col2 = SetB.ColB as the search criterion. This criterion specifies that the INNER JOIN operation is to find all the rows in SetA that have corresponding matching rows in SetB, where Col2 in SetA is equal to ColB in SetB. Let’s walk through how this join operation works.

While the technical details of how the engine joins the tables together are beyond the scope of this article, logically, SQL Server has to compare each row in SetA with each row in SetB to determine which rows match the join condition. To perform this comparison, SQL Server first finds the value for Col2 in the first row of SetA, which has a value of A. Next SQL Server scans the rows in SetB to determine if any ColB values match the value of A. SQL Server only finds a single row in SetB with a value of A in ColB. SQL server then repeats this compare process between SetA and SetB for each row in SetA, trying to find matches. The second row in SetA doesn’t match with any rows in SetB. Once all the comparisons between SetA and SetB are completed, the column values from matching rows between SetA and SetB are merged, and the final results of the INNER JOIN are returned to the calling program.

The matching search criterion used in Listing 2 uses SQL-92 join syntax. This is not the only join syntax supported by SQL Server.

Comparing join syntax between versions of SQL

In the early days of SQL, the various database vendors started implementing SQL code differently. This led to inconsistencies in how SQL code was written. In the 1980s, the American National Standards Institute (ANSI) started working on a SQL language standard. Over time the SQL standards changed the way joins could be written.

The code in Listing 2 above uses SQL-92 join syntax, which requires a join operator to be specified between the joined sets and an ON clause to identify the join criteria. Prior to SQL-92, a different syntax for joining data was used which is shown in Listing 3.

Listing 3: Performing INNER JOIN using SQL-86 join syntax

USE tempdb;

SELECT * FROM SetA, SetB

WHERE SetA.Col2 = SetB.ColB;

The pre-SQL-92 syntax required a comma to be placed between tables being joined and required the join criteria be specified in the WHERE clause. By moving the join conditions to the ON clause, the SQL-92 standards separated the join conditions from other search conditions a query might have in the WHERE clause. SQL Server still supports the older method of doing the INNER JOIN operation using pre-SQL-92 syntax, but it no longer supports the outer join method using the pre-SQL-92 syntax. SQL-92 standard is now the recommended standard for doing join operations. For the rest of the join examples in this article, only the ANSI SQL-92 syntax will be shown.

The INNER JOIN operation returns rows with matching members in both the left and right sides of joined sets. However, there are times when rows might need to be included in result sets even when they don’t match. This is where the outer join operations can help.

LEFT OUTER JOIN

The LEFT OUTER JOIN operation returns all the rows from the table on the left, even when they don’t match any of the rows on the right based on the search criteria. The Venn diagram in Figure 4 logically shows the results of a LEFT OUTER JOIN operation between SetA and SetB based the search criterion of SetA.Col2 = SetB.ColB.

An image showing a Venn diagram. The left circle contains all rows from SetA matching SetB and including rows from SetA that don't match. The right section is blue representing SetB rows that don't match

Figure 4: Venn diagram representing LEFT OUTER JOIN

By comparing Figure 4 to the Venn diagram for the INNER JOIN found in Figure 3, the matched row of SetA, in the brown section, and the unmatched members in the yellow section of SetA have been combined into a single yellow circle. The members of this yellow circle are the members of SetA and SetB that will be returned from the LEFT OUTER JOIN operations. Listing 4 contains the T-SQL code to produce the LEFT OUTER JOIN results represented by Figure 4.

Listing 4: LEFT OUTER JOIN

USE tempdb;

SELECT * FROM SetA LEFT OUTER JOIN SetB

ON SetA.Col2 = SetB.ColB;

Report 2 shows the output when Listing 4 is executed.

Report 2: Output from running Listing 4

The results in Report 2 have one row that matched the search criterium and one that didn’t match. The row that didn’t match the column values for the table on the right side of the LEFT OUTER JOIN operation is set to NULL. It is easy to identify those rows that did not match the search criterium by checking to see if any of the columns from the right table is null, as done in Listing 5.

Listing 5: Finding rows on the left that don’t match rows on the right

USE tempdb;

SELECT * FROM SetA LEFT OUTER JOIN SetB

ON SetA.Col2 = SetB.ColB

WHERE SetB.ColA is null;

When the code in Listing 5 is run, only the last row in Report 2 will be returned.

RIGHT OUTER JOIN

The RIGHT OUTER JOIN operation is similar to the LEFT OUTER JOIN operation. The difference is that all the rows in the right table are returned, whether or not they match the rows in the left table. To show this consider the code in Listing 6.

Listing 6: Right outer join example

USE tempdb;

SELECT * FROM SetA

RIGHT OUTER JOIN SetB

ON SetA.Col2 = SetB.ColB;

Report 3 shows the results when Listing 6 is executed.

Report 3: Results when Listing 6 is run

Report 3 shows that all the rows from the right table, SetB, are returned. For those rows in SetB that don’t match SetA, the NULL value was returned for the SetA columns in the result set.

If you need to find all the rows in SetB that don’t match the search criterion of SetA.Col2 = SetB.ColB, then the code in Listing 7 can be run.

Listing 7: Finding rows in SetB that don’t match SetA

USE tempdb;

SELECT * FROM SetA

RIGHT OUTER JOIN SetB

ON SetA.Col2 = SetB.ColB

WHERE SetA.Col1 is null;

Report 4 shows the results produced when Listing 7 is executed.

Report 4: Results from running Listing 7

FULL OUTER JOIN

The FULL OUTER JOIN operation combines both the LEFT and RIGHT OUTER JOIN and the INNER JOIN in a single operation. Listing 8 uses the FULL OUTER JOIN operator to find all the rows that match and don’t match the search criterion SetA.Col2 = SetB.ColB.

Listing 8: Finding all matching and unmatched rows

USE tempdb;

SELECT * FROM SetA FULL OUTER JOIN SetB

ON SetA.Col2 = SetB.ColB;

Report 5 shows the results when Listing 7 is run.

Report 5: Output from Listing 8

The first row in Report 5 is the only row that meets the search criterion identified in Listing 8, hence why each column has a value. The second row is the left row that doesn’t match any rows on the right; it could be found using the LEFT OUTER JOIN. The third and fourth rows are the rows from the right that don’t match the rows from the left; they could be found using a RIGHT OUTER JOIN operation.

The code in Listing 9 shows how to find only those rows in either the left or right table that don’t match the search criterion.

Listing 9: Using the FULL OUTER JOIN

USE tempdb;

SELECT * FROM SetA FULL OUTER JOIN SetB

ON SetA.Col2 = SetB.ColB

WHERE SetA.Col1 is null or SetB.ColA is null;

Report 6 shows the output when Listing 9 is executed.

Report 6: Output when Listing 9 is run

The rows from the left table that don’t match rows in the right table will have nulls in the columns from the right table. Inversely, rows from the right table that don’t match rows in the left table will have nulls identified for all the columns from the left table.

Using Multiple comparison operations

The comparison between two sets in a join operation can also have multiple comparison operations, like the code in Listing 10.

Listing 10: Using multiple comparison operators in a join operation

USE tempdb;

SELECT * FROM SetA INNER JOIN SetB

ON SetA.Col1 = SetB.ColA AND SetA.Col2 = SetB.ColB;

When Listing 10 is executed the results in Report 7 is produced.

Report 7: Output when Listing 10 is run

In Listing 10 only the rows where both Col1 from Set1 matches ColA from SetB and Col2 from SetA matches ColB from SetB.