• @deroby,

    Don't feel bad. In fact I think it's very important to explain to newbies that while this is an example of where LEFT OUTER JOIN can be used to accomplish the task at hand and it may be easy to understand it is generally not a good practice to rely on it since it is notoriously inefficient and should be avoided especially on large data sets. In todays world data sets are becoming very large very fast and it is important to be conscious of inefficient queries from the very beginning. Since data generally grows over time a query that works just fine when written often becomes a performance killer months or even weeks later.

    Another (more efficient) way to solve this problem without using a LEFT OUTER JOIN or even WHERE NOT EXISTS (which btw is even less efficient) is to use the power of indexes. This method while similar to the NOT EXIST clause explicitly collects the cartesian set of indexes of employees and courses EXCEPT those where courses have been completed joining on the #CourseCompleted table. Since we are only collecting the indexes and not the related data we avoid returning unnecessary data and the looping inherent in the NOT EXIST clause.

    To illustrate my point I simply used the original data creation script to increase the record count to 300 courses and 50000 employees.

    DECLARE @NumberOfCourse INT = 300;

    DECLARE @NumberOfEmployees INT = 50000;

    Using the Left Outer Join query:

    SELECT Employee.EmployeeId

    ,Employee.FirstName

    ,Employee.LastName

    ,Course.CourseId

    ,Course.CourseName

    FROM #Employee AS Employee

    CROSS JOIN #Course AS Course

    LEFT OUTER JOIN #CourseCompleted AS CourseCompleted

    ON CourseCompleted.EmployeeId = Employee.EmployeeId

    AND CourseCompleted.CourseId = Course.CourseId

    WHERE CourseCompleted.CourseId IS NULL

    ORDER BY Employee.EmployeeId

    -- Returns 5516909 Records in 01:07

    -- --------------------------------------------------------------

    -- Using the NOT EXIST QUERY

    SELECT Employee.EmployeeId

    ,Employee.FirstName

    ,Employee.LastName

    ,Course.CourseId

    ,Course.CourseName

    FROM #Employee AS Employee

    CROSS JOIN #Course AS Course

    WHERE NOT EXISTS ( SELECT *

    FROM #CourseCompleted AS CourseCompleted

    WHERE CourseCompleted.EmployeeId = Employee.EmployeeId

    AND CourseCompleted.CourseId = Course.CourseId )

    ORDER BY Employee.EmployeeId

    -- Returns 5516909 Records in 03:33

    -- --------------------------------------------------------------

    While the EXCEPT clause is a great way to simplify things it can be a real killer when returning a lot of data since the EXCEPT data set has to match the top SELECT clause and the query engine is matching everything as if it were a comjpound key to see what matches exactly. When I ran the EXCEPT query posted by Bill Tadada above I had to kill it after 7 minutes. But using it on just the indexes improves the query substantially.

    -- --------------------------------------------------------------

    Using the following script:

    IF OBJECT_ID('tempdb..#NotComplete','U') IS NOT NULL

    DROP TABLE #NotComplete;

    -- get all of the keys of a cartesian set into a temp table {15000000)

    SELECT Employee.employeeid, Course.courseId into #NotComplete

    from #Employee AS Employee

    CROSS JOIN #Course AS Course

    EXCEPT

    SELECT CourseCompleted.EmployeeId, CourseCompleted.CourseId

    FROM #CourseCompleted AS CourseCompleted

    -- the remaining keys can be used to get the data for the report (5516909 )

    SELECT Employee.EmployeeId

    ,Employee.FirstName

    ,Employee.LastName

    ,Course.CourseId

    ,Course.CourseName

    FROM #NotComplete AS NotComplete

    INNER JOIN #Employee AS Employee ON Employee.EmployeeId = NotComplete.EmployeeId

    INNER JOIN #Course AS Course ON Course.CourseId = NotComplete.CourseId

    -- Returns 5516909 Records 00:36 (-00:31, a 50% improvement)

    -- --------------------------------------------------------------

    ...and to your point deroby I did use the FREEPROCCACHE and DROPCLEANBUFFERS statements prior to execution of each. I'm using SS2014 on a laptop w/16GB ram and an SSD drive. As you say these are only indicators at best but it is consistent with my experience. I'm not sure why your results should be so different .:-)

    Ron Cicotte
    ron.cicotte@gmail.com
    Data Transformations, LLC