SQLServerCentral Article

Outer Join – What am I missing (SQL Spackle)

,

Introduction

You have been asked for a report of those employees who have not completed the required company courses. You have a table with all the employees, a table with the required courses and another table with the courses that have been completed. It is easy to see what courses has been completed with a simple inner join, but how do you see what courses have not been completed. That is where the outer join comes into play. 

Creating the Objects

Let's start by creating the needed tables. Here is the script to create the three tables.

IF OBJECT_ID('tempdb..#Employee','U') IS NOT NULL
  DROP TABLE #Employee;
GO
CREATE TABLE #Employee
  ([EmployeeId] [int] NOT NULL
  ,[FirstName] [nvarchar](25) NOT NULL
  ,[LastName] [nvarchar](25) NOT NULL
  ,CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED ([EmployeeId] ASC)
  );
GO
IF OBJECT_ID('tempdb..#Course','U') IS NOT NULL
  DROP TABLE #Course;
GO
CREATE TABLE #Course
  ([CourseId] [int] NOT NULL
  ,[CourseName] [varchar](25) NOT NULL
  ,CONSTRAINT [PK_Course] PRIMARY KEY CLUSTERED ([CourseId] ASC)
  ); 
GO
IF OBJECT_ID('tempdb..#CourseCompleted','U') IS NOT NULL
  DROP TABLE #CourseCompleted;
GO
CREATE TABLE #CourseCompleted
  ([EmployeeId] [int] NOT NULL
  ,[CourseId] [int] NOT NULL
  ,[DateComplete] [datetime] NOT NULL
  ,CONSTRAINT [PK_CourseCompleted] PRIMARY KEY CLUSTERED ([EmployeeId] ASC,[CourseId] ASC)
  ); 
GO

Populate Test Data

Next we will need to populate the tables with data. I have two variables that you can change to determine how many employees and courses to create. The last table to get populated shows what courses have been completed. It is randomly populating data based off the employee and course tables.

/*Set number of courses and number of Employees to create*/DECLARE @NumberOfCourse INT = 30;
DECLARE @NumberOfEmployees INT = 50;
DECLARE @Number INT;
SET @Number = CASE WHEN @NumberOfCourse >= @NumberOfEmployees THEN @NumberOfCourse
                   ELSE @NumberOfEmployees
              END;
IF OBJECT_ID('tempdb..#Number','U') IS NOT NULL
  DROP TABLE #Number;
SELECT TOP (@Number)
        IDENTITY( INT,1,1 ) AS Number
INTO    #Number
FROM    sys.all_columns AS ac
        CROSS JOIN sys.all_columns AS ac1;
INSERT  INTO #Course
        (CourseId
        ,CourseName
        )
        SELECT TOP (@NumberOfCourse)
                Number
               ,'Course ' + CONVERT(VARCHAR(10),Number)
        FROM    #Number;
INSERT  INTO #Employee
        (EmployeeId
        ,FirstName
        ,LastName
        )
        SELECT TOP (@NumberOfEmployees)
                Number
               ,LEFT(NEWID(),20)
               ,LEFT(NEWID(),20)
        FROM    #Number;
INSERT  INTO #CourseCompleted
        (EmployeeId
        ,CourseId
        ,DateComplete
        )
        SELECT DISTINCT
                ABS(CHECKSUM(NEWID())) % @NumberOfEmployees + 1 AS EmployeeId
               ,ABS(CHECKSUM(NEWID())) % @NumberOfCourse + 1 AS CourseId
               ,CURRENT_TIMESTAMP
        FROM    #Course
                CROSS JOIN #Employee;

Solution

There are a couple of possible solutions. We will look at outer joinsand cross joins. 

Outer Join

We will start by looking at what courses a single employee has not completed. If we pass in an EmployeeId, a list of courses that have not been completed comes up. This query will use an outer join and a where clause where the EmployeeId is null.

/*List of courses not completed by one Employee*/DECLARE @EmployeeId INT = 1;
SELECT Course.CourseId
      , Course.CourseName
    FROM #Course AS Course
        LEFT OUTER JOIN #CourseCompleted AS CourseCompleted
        ON CourseCompleted.CourseId = Course.CourseId
           AND CourseCompleted.EmployeeId = @EmployeeId
    WHERE CourseCompleted.EmployeeId IS NULL; 

The results show all the courses that employee 1 has not taken. 

Cross Join

A cross join with two tables without a where clause will produce a Cartesian product. If you have 20 records in one table and 30 records in the second table you will return 600 records. So you need to be very careful when using a cross join on large tables. Below is an example of using a cross join 

  SELECT  *
  FROM    #Employee
          CROSS JOIN #Course
  WHERE   #Employee.EmployeeId = 1

This returns a row for each course for employee 1. If there was no Where clause it would return a row for each employee for each class.

Now we will combine the outer join with the cross join. The first query will show all the employees and all of the courses regardless if the course has been completed. The courses that have been completed have a date for when they were completed. The second query only shows the courses that have not been completed. The only difference between the two queries is the where clause.

In these queries I am cross joining the course table to the employee table. This way I can get a list of every employee for every course. Then when I only want to see the courses that have not been completed I use an outer join on the course completed table and a where clause where the CourseId is null. This way I only get a list of the courses that have not been completed.

/*List all employees and all courses*/SELECT  Employee.EmployeeId
       ,Employee.FirstName
       ,Employee.LastName
       ,Course.CourseId
       ,Course.CourseName
       ,CourseCompleted.DateComplete
FROM    #Employee AS Employee
        CROSS JOIN #Course AS Course
        LEFT OUTER JOIN #CourseCompleted AS CourseCompleted 
         ON CourseCompleted.EmployeeId = Employee.EmployeeId
         AND CourseCompleted.CourseId = Course.CourseId
ORDER BY Employee.EmployeeId
       ,Course.CourseId;
/*List of employees and courses they have not completed*/SELECT  Employee.EmployeeId
       ,Employee.FirstName
       ,Employee.LastName
       ,Course.CourseId
       ,Course.CourseName
FROM    #Employee AS Employee
        CROSS JOIN #Course AS Course
        LEFT OUTER JOIN #CourseCompleted AS CourseCompleted 
         ON CourseCompleted.EmployeeId = Employee.EmployeeId
         AND CourseCompleted.CourseId = Course.CourseId
WHERE   CourseCompleted.CourseId IS NULL
ORDER BY Employee.EmployeeId

Make sure you understand the results you will return when using a cross join. If both tables are very large you might return more data that you want. 

By using an outer join and a simple where clause you can easily see what data is missing from the second table.

Rate

3.46 (37)

You rated this post out of 5. Change rating

Share

Share

Rate

3.46 (37)

You rated this post out of 5. Change rating