A relational database contains tables that relate to each other by key values. When querying data from these related tables you may choose to select data from a single table or many tables. If you select data from many tables, you normally join those tables together using specified join criteria. The concepts of selecting data from tables and joining tables together is all about managing and manipulating sets of data. In Level 4 of this Stairway I will be exploring the concepts of set theory and mathematical operators to join, merge, and return data from multiple SQL Server tables.
What is a table?
What is a table, really? A table is just a bunch of rows that contain values in one or more columns. A table can be thought of as a set containing zero, one, or many records. Each row in a table is considered a member of the table set. The concept of sets and members of sets are the basic elements of set theory. By performing set operations, such as intersection and union against different sets (or against different tables in a SQL Server database) we are able to retrieve the data we need to support our applications.
Basic Set Theory
In the late 1800’s Georg Cantor formally defined a set. He defined a set as “any collection into a whole M of definite and separate objects m.”, where the separate objects m are elements of the set M. . Cantor’s work eventually grow into what we now call set theory. Edgar Codd used the concept of set theory to explain the relationships between data in database tables, which formed the basis of our all familiar relational database. The relational database concept that Edgar Codd developed was just a projection of the mathematical concepts of set theory for retrieving data stored in a database. By better understanding set theory, we can better understand and manipulate our data that is stored in our relational databases.
In addition to a set, there is also the concept of a subset. A subset is just a collection of members of a set where the number of members in the subset is less than the number members of the whole set. A subset can be thought of as a partial list of members of a set. In SQL Server, a subset of a table would be a set of rows, where the total number of rows in the subset has fewer rows than the original table.
To retrieve data for your application you are typically doing more than just retrieving a set of records from a single SQL Server table. You will commonly need to join one set with another set, and then finally use some parts of that joined set to support the data that your application might need. To explore the idea of joining sets together a little more I will be looking at a concept known as Venn diagrams.
A Venn diagram is a graphical representation of how two or more sets intersect each other. A Venn diagram makes it easy to visualize and describe the different parts of intersecting sets. The Venn diagramming concept was invented by John Venn, although he never called it a Venn diagram. Figure 1 shows a simple Venn diagram that represents the intersection of two sets, ‘Set A’ and ‘Set B’.
Set A is represented by the light blue circle, and Set B is represented by the red circle. I have labeled the different parts of the joined set with 1, 2, and 3, as well as shaded each part with a different color, so I can discuss each of these parts. The light blue part of Figure 1, that is labeled ‘1’, represent the members in Set A that do not match, or intersect with any members in Set B. The purple part, labeled ‘2’, contains the members of Set A that intersect, or have the matching members in Set B. The red part labeled ‘3’ is the part of Set B that has no matching, or intersecting members with Set A.
A Venn diagram can be used to visually represent how all kinds of different sets might intersect. Figure 2 shows a Venn diagram that represents where Set B is contained within Set A, so B is a subset of A.
Venn diagrams can even be more complex, they can represent the intersection of multiple sets. Figure 3 shows the intersection of three different sets.
Understanding of the different parts of intersecting sets as represented using Venn diagrams can give you a better sense for how joined sets can be used to support retrieving data from relational databases.
You might remember INTERSECT, and UNION operators from your mathematical studies of algebra. These mathematical operators have been around since set theory was introduced, and they not only apply to joining sets, but they also apply to joining SQL Server tables. There is also another operator that is available in SQL Server that I will be discussing that was not part of your algebraic mathematical studies, and that is the EXCEPT operator. This operator is useful in returning a part of an intersecting set and therefore is worthy of discussion along with INTERSECT and UNION. There are also a couple of additional operators, such as INNER JOIN and OUTER JOIN that can be used to join two sets. I will be discussing these operators in my next level.
To help demonstrate how each of these operators work I will walk you through various SELECT statements that use the INTERSECT, UNION and EXCEPT operators. I will also use Venn diagrams and a few simple SQL Server tables to explore how these operators can be used to return a set of rows when joining SQL Server tables.
For the first few examples, I will be using two different sets: Set A and Set B. These two sets and their members are shown in Figure 4.
In Figure 4, Set A contains 9 members consisting of numbers ranging in value from 1 to 6. Some members are unique and others are not. Set B has 6 members, ranging in value from 5 to 8, and also has some unique members and some that are not unique. The Venn diagram in Figure 5 shows the intersection of these two set based on those members in Set A that have matching values to the members in set B.
In Figure 5, you can visually see the different parts of the two intersecting sets. Part 1 consists of members with a values of 1 through 4, part 2 members have values of 5 or 6, and part 3 contains members with the values of 7 and 8. I will demonstrate how the INTERSECT, UNION and EXCEPT operators can be used to return these different parts using rows in SQL Server tables. To do this, I will first build SQL Server tables that contain the two sets and their members identified in Figure 4. The code is shown in Listing 1.
USE tempdb; GO SET NOCOUNT ON; -- Create Set A CREATE TABLE [SET A] (Member TINYINT); INSERT INTO [SET A] VALUES (1); INSERT INTO [SET A] VALUES (1); INSERT INTO [SET A] VALUES (2); INSERT INTO [SET A] VALUES (3); INSERT INTO [SET A] VALUES (4); INSERT INTO [SET A] VALUES (4); INSERT INTO [SET A] VALUES (5); INSERT INTO [SET A] VALUES (6); INSERT INTO [SET A] VALUES (6); -- Create Set B CREATE TABLE [SET B] (Member TINYINT); INSERT INTO [SET B] VALUES (5); INSERT INTO [SET B] VALUES (5); INSERT INTO [SET B] VALUES (6); INSERT INTO [SET B] VALUES (7); INSERT INTO [SET B] VALUES (7); INSERT INTO [SET B] VALUES (8);
The INTERSECT operator combines two sets together based on the common members between the two sets, just like the Venn diagram did in Figure 5. The result of the INTERSECT operation is a list of the common members between the two intersecting sets. In my example I will use the INTERSECT operator to returns the members in part 2 of the Venn diagram in Figure 5, except the members will be distinct (duplicates will be removed), so the result set only returns a single row for every unique member value. Removing the duplicates is something that the INTERSECT operator does automatically. To demonstrate how the INTERSECT operator works; I run will execute the code in Listing 2.
USE tempdb; GO SELECT Member FROM [SET A] INTERSECT SELECT Member FROM [SET B];
The results of running listing 2 can be found in Report 1. By reviewing Report 1 you can see how only two rows are returned, and the values returned are those distinct values that are common between tables [SET A ]and [SET B].
Member ------ 5 6
The UNION operator brings together all three different parts (1, 2, and 3) from the Venn diagram in Figure 5. It returns every member value in both [SET A] and [SET B] and de-duplicates the members while the union operation is performed. To demonstrate how the UNION operator works I will run the code in Listing 3.
SELECT Member FROM [SET A] UNION SELECT Member FROM [SET B];
When this code is run, the results in Report 2 are produced. If you look at the results in this report, you can see all the member values are returned, but all the duplicate member values in the two different sets have been removed.
Member ------ 1 2 3 4 5 6 7 8
It is possible to retain the duplicate values while the union operation is performed against two sets. This can be done by using the UNION ALL operator. The ‘ALL’ part of the operator tell SQL Server to not remove the duplicates. To demonstrate the UNION ALL operator I will run the code in Listing 4.
SELECT Member FROM [SET A] UNION ALL SELECT Member FROM [SET B];
When I run the code in Listing 4, the results in Report 3 are produced. You can see that all the members of both tables ([SET A] and [SET B are returned, including the duplicates.
Member ------ 1 1 2 3 4 4 5 6 6 5 5 6 7 7 8
All of my examples so far have only shown INTERSECT and UNION operations being performed against two sets. It is possible to use INTERSECT or UNION to combine more than just two sets. To demonstrate this let me first add another table called [SET C] by running the code in Listing 5.
CREATE TABLE [SET C] (Member TINYINT); INSERT INTO [SET C] VALUES (5); INSERT INTO [SET C] VALUES (5); INSERT INTO [SET C] VALUES (9); INSERT INTO [SET C] VALUES (8);
By using the tables [SET A], [SET B] and [SET C] and joining these tables together using the INTERSECT operator in affect I am producing the result set represented by the center section in the Venn diagram shown in Figure 3. To demonstrate intersecting my three different sets I will run the code in Listing 6.
SELECT Member FROM [SET A] INTERSECT SELECT Member FROM [SET B] INTERSECT SELECT Member FROM [SET C];
The results of running the code in Listing 6 can be found in Report 4. You can see that only one member with a value of 5 is returned. The member value of 5 is the only member that is common across all three sets.
Member ------ 5
If you are looking to test what you have learned so far you can try to build the code to UNION together [SET A], [SET B] and [SET C]. This will be relatively easy to do, by just adding another UNION operator and a SELECT statement to the code in Listing 3.
The EXCEPT operator is quite different then either INTERSECT or UNION, in that it brings back only the left portion of two joined sets, or part 1 of the Venn diagram in Figure 1. However, there is one similarity between the EXCEPT operator and the INTERSECT and UNION operators, and that is the EXCEPT operator de-duplicates the members in its result set. To demonstrate this I can run the code in Listing 7.
SELECT Member FROM [SET A] EXCEPT SELECT Member FROM [SET B];
When I run the code in Listing 7, we get the results in Report 5. The results in the report only show the unique member values from [SET A] that are not in [SET B], or the members in part 1 of the Venn diagram in Figure 5.
Member ------ 1 2 3 4
If you want to bring back part 3 of the Venn diagram in Figure 5 you can do that, but the SELECT from table [SET B] needs to be listed first. and the SELECT from table [SET A] needs to be listed second, with the EXCEPT operator between the two SELECTs. The order of the SELECTs needs to be reversed since the EXCEPT operator only brings back members from the first SELECT, that are not contained in the second SELECT. I will leave it up to you to build and execute the code to return part 3 of the Venn diagram in Figure 5.
Joining Tables that Contain More than a Single Column
The INTERSECT and UNION statements can be used join tables that have more than a single column, as my prior examples have shown. The only criteria to make this work are that each SELECT list contains the same number of columns. To demonstrate this I will first create two tables ([SET D] and [SET E]) that contain multiple columns by using the code found in Listing 8.
SET NOCOUNT ON; -- Create Set D CREATE TABLE [SET D] (Col1 TINYINT, Col2 CHAR(1)); INSERT INTO [SET D] VALUES (1,'A'); INSERT INTO [SET D] VALUES (1,'B'); INSERT INTO [SET D] VALUES (2,'A'); INSERT INTO [SET D] VALUES (3,'A'); INSERT INTO [SET D] VALUES (4,'A'); INSERT INTO [SET D] VALUES (4,'A'); INSERT INTO [SET D] VALUES (5,'A'); INSERT INTO [SET D] VALUES (6,'A'); INSERT INTO [SET D] VALUES (6,'A'); -- Create Set E CREATE TABLE [SET E] (Col1 TINYINT, Col2 CHAR(1), Col3 CHAR(1)); INSERT INTO [SET E] VALUES (5,'Z','A'); INSERT INTO [SET E] VALUES (5,'Y','C'); INSERT INTO [SET E] VALUES (6,'X','B'); INSERT INTO [SET E] VALUES (7,'W','A'); INSERT INTO [SET E] VALUES (7,'V','A'); INSERT INTO [SET E] VALUES (8,'U','A');
After creating these two tables, I will then use the code in Listing 9 to intersect tables [SET D] and [SET E]. Note that I have not used ‘Col2’ of [SET E] but instead used ‘Col3’. I did this to demonstrate that the INTERSECT operation will still work, even without both SELECTs returning the same column names. The UNION and INTERSECT will work as long as both SELECTs used return the same number of columns.
SELECT Col1, Col2 FROM [SET D] INTERSECT SELECT Col1, Col3 FROM [SET E];
When I run the code in Listing 9, I get the results found in Report 6.
Col1 Col2 ---- ---- 5 A
The de-duplication process removes duplicates by looking at all the columns returned for a row. That is, two result rows are only considered to be duplicates if all the columns in each of the rows have matching values. By reviewing the output in Report 6, you can see that only the unique rows between [SET D] and [SET E] were returned when I ran the code in Listing 9. In addition, something worth noting is that the column headers displayed take the names associated with the columns identified in the first SELECT.
Set Theory and SQL Server
Set theory, which has been since the late 1800’s, laid the groundwork for selecting information out of SQL Server databases. With the use of Venn diagrams, we are able to show visually how two or more sets intersect. Knowing how our 19th-century mathematicians defined and joined sets is useful in helping us understand how to retrieving data from SQL Server tables using the INTERSECT, UNION, UNION ALL and EXCEPT operators.
Level 5 of this T-SQL Stairway will be Part 2 of “The Mathematics of SQL”, in which we. In Part 2, I will be exploring the INNER JOIN and OUTER JOIN operators, and how these operators can be used to bring back different parts of a Venn diagram as a result set.