Introduction
You have been asked for a report of those employees who have not completed the required company courses. You have a table with all the employees, a table with the required courses and another table with the courses that have been completed. It is easy to see what courses has been completed with a simple inner join, but how do you see what courses have not been completed. That is where the outer join comes into play.
Creating the Objects
Let's start by creating the needed tables. Here is the script to create the three tables.
IF OBJECT_ID('tempdb..#Employee','U') IS NOT NULL DROP TABLE #Employee; GO CREATE TABLE #Employee ([EmployeeId] [int] NOT NULL ,[FirstName] [nvarchar](25) NOT NULL ,[LastName] [nvarchar](25) NOT NULL ,CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED ([EmployeeId] ASC) ); GO IF OBJECT_ID('tempdb..#Course','U') IS NOT NULL DROP TABLE #Course; GO CREATE TABLE #Course ([CourseId] [int] NOT NULL ,[CourseName] [varchar](25) NOT NULL ,CONSTRAINT [PK_Course] PRIMARY KEY CLUSTERED ([CourseId] ASC) ); GO IF OBJECT_ID('tempdb..#CourseCompleted','U') IS NOT NULL DROP TABLE #CourseCompleted; GO CREATE TABLE #CourseCompleted ([EmployeeId] [int] NOT NULL ,[CourseId] [int] NOT NULL ,[DateComplete] [datetime] NOT NULL ,CONSTRAINT [PK_CourseCompleted] PRIMARY KEY CLUSTERED ([EmployeeId] ASC,[CourseId] ASC) ); GO
Populate Test Data
Next we will need to populate the tables with data. I have two variables that you can change to determine how many employees and courses to create. The last table to get populated shows what courses have been completed. It is randomly populating data based off the employee and course tables.
/*Set number of courses and number of Employees to create*/DECLARE @NumberOfCourse INT = 30; DECLARE @NumberOfEmployees INT = 50; DECLARE @Number INT; SET @Number = CASE WHEN @NumberOfCourse >= @NumberOfEmployees THEN @NumberOfCourse ELSE @NumberOfEmployees END; IF OBJECT_ID('tempdb..#Number','U') IS NOT NULL DROP TABLE #Number; SELECT TOP (@Number) IDENTITY( INT,1,1 ) AS Number INTO #Number FROM sys.all_columns AS ac CROSS JOIN sys.all_columns AS ac1; INSERT INTO #Course (CourseId ,CourseName ) SELECT TOP (@NumberOfCourse) Number ,'Course ' + CONVERT(VARCHAR(10),Number) FROM #Number; INSERT INTO #Employee (EmployeeId ,FirstName ,LastName ) SELECT TOP (@NumberOfEmployees) Number ,LEFT(NEWID(),20) ,LEFT(NEWID(),20) FROM #Number; INSERT INTO #CourseCompleted (EmployeeId ,CourseId ,DateComplete ) SELECT DISTINCT ABS(CHECKSUM(NEWID())) % @NumberOfEmployees + 1 AS EmployeeId ,ABS(CHECKSUM(NEWID())) % @NumberOfCourse + 1 AS CourseId ,CURRENT_TIMESTAMP FROM #Course CROSS JOIN #Employee;
Solution
There are a couple of possible solutions. We will look at outer joinsand cross joins.
Outer Join
We will start by looking at what courses a single employee has not completed. If we pass in an EmployeeId, a list of courses that have not been completed comes up. This query will use an outer join and a where clause where the EmployeeId is null.
/*List of courses not completed by one Employee*/DECLARE @EmployeeId INT = 1; SELECT Course.CourseId , Course.CourseName FROM #Course AS Course LEFT OUTER JOIN #CourseCompleted AS CourseCompleted ON CourseCompleted.CourseId = Course.CourseId AND CourseCompleted.EmployeeId = @EmployeeId WHERE CourseCompleted.EmployeeId IS NULL;
The results show all the courses that employee 1 has not taken.
Cross Join
A cross join with two tables without a where clause will produce a Cartesian product. If you have 20 records in one table and 30 records in the second table you will return 600 records. So you need to be very careful when using a cross join on large tables. Below is an example of using a cross join
SELECT * FROM #Employee CROSS JOIN #Course WHERE #Employee.EmployeeId = 1
This returns a row for each course for employee 1. If there was no Where clause it would return a row for each employee for each class.
Now we will combine the outer join with the cross join. The first query will show all the employees and all of the courses regardless if the course has been completed. The courses that have been completed have a date for when they were completed. The second query only shows the courses that have not been completed. The only difference between the two queries is the where clause.
In these queries I am cross joining the course table to the employee table. This way I can get a list of every employee for every course. Then when I only want to see the courses that have not been completed I use an outer join on the course completed table and a where clause where the CourseId is null. This way I only get a list of the courses that have not been completed.
/*List all employees and all courses*/SELECT Employee.EmployeeId ,Employee.FirstName ,Employee.LastName ,Course.CourseId ,Course.CourseName ,CourseCompleted.DateComplete FROM #Employee AS Employee CROSS JOIN #Course AS Course LEFT OUTER JOIN #CourseCompleted AS CourseCompleted ON CourseCompleted.EmployeeId = Employee.EmployeeId AND CourseCompleted.CourseId = Course.CourseId ORDER BY Employee.EmployeeId ,Course.CourseId; /*List of employees and courses they have not completed*/SELECT Employee.EmployeeId ,Employee.FirstName ,Employee.LastName ,Course.CourseId ,Course.CourseName FROM #Employee AS Employee CROSS JOIN #Course AS Course LEFT OUTER JOIN #CourseCompleted AS CourseCompleted ON CourseCompleted.EmployeeId = Employee.EmployeeId AND CourseCompleted.CourseId = Course.CourseId WHERE CourseCompleted.CourseId IS NULL ORDER BY Employee.EmployeeId
Make sure you understand the results you will return when using a cross join. If both tables are very large you might return more data that you want.
By using an outer join and a simple where clause you can easily see what data is missing from the second table.