How to get the students that have taken all classes in a list.

  • The basic structure of the significant tables is:  Student ---(1,M)-- Takes ---(M,1)---Course
    How would I answer a question like this:
    Show me all the students who have taken all classes in this list (Course1, Course2, Course3, Course4). I can use DelimitedSplit8K to break the list into a table no problem... 

    The only way I've been able to come up with the answer is along the lines of 
    SELECT s.StudentID, c.CourseID
    FROM Student s CROSS JOIN Course c
    WHERE c.CourseID IN ('Course1', 'Course2', 'Course3')

    Then do I need to use Takes MINUS the above to find missing?
    Here's how I tried to do it... seems like a really ugly way!

    use tempdb;
    go

    CREATE TABLE Person(
        PersonID INT IDENTITY,
        FirstName VARCHAR(10),
        LastName VARCHAR(10)
    CONSTRAINT pkPerson PRIMARY KEY (PersonID));

    CREATE TABLE Resources(
        ResourceID CHAR(3) PRIMARY KEY
    );

    CREATE TABLE PR (
        PersonID INT,
        ResourceID CHAR(3)
    CONSTRAINT pkHas PRIMARY KEY (PersonID, ResourceID)
    CONSTRAINT fkPersonID FOREIGN KEY (PersonID) REFERENCES Person(PersonID));
    GO

    INSERT INTO Person(FirstName, LastName) VALUES ('Bill', 'Smith'),('John','Jones');
    INSERT INTO Resources (ResourceID) VALUES ('ABC'),('DEF'),('HIJ');
    INSERT INTO PR (PersonID, ResourceID) VALUES (1,'ABC'),(1,'DEF'),(2,'ABC'), (2,'HIJ'), (1,'HIJ');
    -- both Bill and John have (ABC, HIJ).

    DECLARE    @ResourceList VARCHAR(15) = 'ABC,DEF'

    -- show all people with all skills in list @ResourceList

    SELECT *
    FROM Person px
    WHERE px.PersonID NOT IN (  -- missing a "course"
                                SELECT PersonID
                                FROM (
                                        SELECT p.PersonID
                                            , split.Item
                                            , pr.ResourceID
                                        FROM Person p
                                            CROSS APPLY Scratchpad.dbo.DelimitedSplit8K(@ResourceList,',') split 
                                LEFT JOIN dbo.PR ON pr.PersonID = p.PersonID AND pr.ResourceID = split.Item
    ) x
    WHERE ResourceID IS NULL );

  • Here are a couple other query forms that can be used for this sort of problem.

    Which one performs best is, well, the usual "It depends..." 🙂

    SELECT p.PersonID, p.FirstName, p.LastName
    FROM Person P
    CROSS APPLY
    DelimitedSplit8K(@ResourceList,',') split
    LEFT JOIN
    dbo.PR ON PR.PersonID=P.PersonID AND pr.ResourceID=split.Item
    GROUP BY p.PersonID, p.FirstName, p.LastName
    HAVING COUNT(pr.ResourceID)=COUNT(split.Item)
    OPTION (RECOMPILE);

    SELECT p.*
    FROM Person P
    WHERE NOT EXISTS (SELECT P.PersonID, Resource_ID=Item FROM DelimitedSplit8K(@ResourceList,',') split
      EXCEPT
      SELECT P.PersonID, PR.ResourceID FROM PR WHERE PR.PersonID=P.PersonID);

    Cheers!

  • Oh, cool!  I knew there had to be a less messy way of doing it! Thanks!

  • Here's an ITVF that does the job.  It uses the number of classes in the list that the person has taken and compares it to the number of classes in the list.  Where there's a match, the person has taken all of them in the list.  Yes, I tried it with a GROUP BY in cteCoursesTaken, but it consistently performed slower because of a double-dip on dbo.PR.

    IF OBJECT_ID('dbo.StudentsTookAllClasses', 'if') IS NOT NULL DROP FUNCTION dbo.StudentsTookAllClasses;
    go

    CREATE FUNCTION dbo.StudentsTookAllClasses(@CourseList Varchar(200)) RETURNS TABLE
    AS

    RETURN (
      WITH cteCourses AS (
        SELECT s.Item
          FROM util.dbo.DelimitedSplit8K(@CourseList, ',') s
      ),
      cteCoursesTaken AS (
      SELECT PersonID, ResourceID, RN = ROW_NUMBER() OVER(PARTITION BY PersonID ORDER BY (SELECT NULL))
       FROM dbo.PR
          CROSS APPLY cteCourses c
       WHERE dbo.PR.ResourceID = c.Item
      )
      SELECT PersonID
        FROM cteCoursesTaken ct
        GROUP BY PersonID
        HAVING MAX(RN) = (SELECT COUNT(*) FROM cteCourses)
    );

    Using it is pretty simple:

    SELECT *
      FROM dbo.StudentsTookAllClasses('ABC,DEF,HIJ')
      ORDER BY PersonID;

    SELECT *
      FROM dbo.StudentsTookAllClasses('ABC,HIJ')
      ORDER BY PersonID;

    Will this work for you?

  • Jacob Wilkins - Friday, July 14, 2017 8:39 PM

    Here are a couple other query forms that can be used for this sort of problem.

    Which one performs best is, well, the usual "It depends..." 🙂

    SELECT p.PersonID, p.FirstName, p.LastName
    FROM Person P
    CROSS APPLY
    DelimitedSplit8K(@ResourceList,',') split
    LEFT JOIN
    dbo.PR ON PR.PersonID=P.PersonID AND pr.ResourceID=split.Item
    GROUP BY p.PersonID, p.FirstName, p.LastName
    HAVING COUNT(pr.ResourceID)=COUNT(split.Item)
    OPTION (RECOMPILE);

    SELECT p.*
    FROM Person P
    WHERE NOT EXISTS (SELECT P.PersonID, Resource_ID=Item FROM DelimitedSplit8K(@ResourceList,',') split
      EXCEPT
      SELECT P.PersonID, PR.ResourceID FROM PR WHERE PR.PersonID=P.PersonID);

    Cheers!

    I'd be careful joining to DS8K - there's usually a performance price to pay for it.

  • Ed Wagner - Friday, July 14, 2017 10:03 PM

    Jacob Wilkins - Friday, July 14, 2017 8:39 PM

    Here are a couple other query forms that can be used for this sort of problem.

    Which one performs best is, well, the usual "It depends..." 🙂

    SELECT p.PersonID, p.FirstName, p.LastName
    FROM Person P
    CROSS APPLY
    DelimitedSplit8K(@ResourceList,',') split
    LEFT JOIN
    dbo.PR ON PR.PersonID=P.PersonID AND pr.ResourceID=split.Item
    GROUP BY p.PersonID, p.FirstName, p.LastName
    HAVING COUNT(pr.ResourceID)=COUNT(split.Item)
    OPTION (RECOMPILE);

    SELECT p.*
    FROM Person P
    WHERE NOT EXISTS (SELECT P.PersonID, Resource_ID=Item FROM DelimitedSplit8K(@ResourceList,',') split
      EXCEPT
      SELECT P.PersonID, PR.ResourceID FROM PR WHERE PR.PersonID=P.PersonID);

    Cheers!

    I'd be careful joining to DS8K - there's usually a performance price to pay for it.

    I guess I can't disagree with "Be careful"...there are always such considerations as "be careful", and there are plenty of reasons I didn't say "Use all of these query forms indiscriminately!" 🙂

    Personally, I'd be careful no matter what I'm doing, but that's just me 😛

    Cheers!

  • Jacob Wilkins - Friday, July 14, 2017 10:35 PM

    Ed Wagner - Friday, July 14, 2017 10:03 PM

    Jacob Wilkins - Friday, July 14, 2017 8:39 PM

    Here are a couple other query forms that can be used for this sort of problem.

    Which one performs best is, well, the usual "It depends..." 🙂

    SELECT p.PersonID, p.FirstName, p.LastName
    FROM Person P
    CROSS APPLY
    DelimitedSplit8K(@ResourceList,',') split
    LEFT JOIN
    dbo.PR ON PR.PersonID=P.PersonID AND pr.ResourceID=split.Item
    GROUP BY p.PersonID, p.FirstName, p.LastName
    HAVING COUNT(pr.ResourceID)=COUNT(split.Item)
    OPTION (RECOMPILE);

    SELECT p.*
    FROM Person P
    WHERE NOT EXISTS (SELECT P.PersonID, Resource_ID=Item FROM DelimitedSplit8K(@ResourceList,',') split
      EXCEPT
      SELECT P.PersonID, PR.ResourceID FROM PR WHERE PR.PersonID=P.PersonID);

    Cheers!

    I'd be careful joining to DS8K - there's usually a performance price to pay for it.

    I guess I can't disagree with "Be careful"...there are always such considerations as "be careful", and there are plenty of reasons I didn't say "Use all of these query forms indiscriminately!" 🙂

    Personally, I'd be careful no matter what I'm doing, but that's just me 😛

    Cheers!

    I'm saying the different query forms have a drastic difference the number of reads and in performance.

    I know you know this, Pietlinden, but be sure to test each approach over a high number of rows.  Run it up to 1M and really test them.

  • Ed Wagner - Saturday, July 15, 2017 8:36 AM

    Jacob Wilkins - Friday, July 14, 2017 10:35 PM

    Ed Wagner - Friday, July 14, 2017 10:03 PM

    Jacob Wilkins - Friday, July 14, 2017 8:39 PM

    Here are a couple other query forms that can be used for this sort of problem.

    Which one performs best is, well, the usual "It depends..." 🙂

    SELECT p.PersonID, p.FirstName, p.LastName
    FROM Person P
    CROSS APPLY
    DelimitedSplit8K(@ResourceList,',') split
    LEFT JOIN
    dbo.PR ON PR.PersonID=P.PersonID AND pr.ResourceID=split.Item
    GROUP BY p.PersonID, p.FirstName, p.LastName
    HAVING COUNT(pr.ResourceID)=COUNT(split.Item)
    OPTION (RECOMPILE);

    SELECT p.*
    FROM Person P
    WHERE NOT EXISTS (SELECT P.PersonID, Resource_ID=Item FROM DelimitedSplit8K(@ResourceList,',') split
      EXCEPT
      SELECT P.PersonID, PR.ResourceID FROM PR WHERE PR.PersonID=P.PersonID);

    Cheers!

    I'd be careful joining to DS8K - there's usually a performance price to pay for it.

    I guess I can't disagree with "Be careful"...there are always such considerations as "be careful", and there are plenty of reasons I didn't say "Use all of these query forms indiscriminately!" 🙂

    Personally, I'd be careful no matter what I'm doing, but that's just me 😛

    Cheers!

    I'm saying the different query forms have a drastic difference the number of reads and in performance.

    I know you know this, Pietlinden, but be sure to test each approach over a high number of rows.  Run it up to 1M and really test them.

    Of course, and yours performs very nicely in many cases, so it's a welcome addition to the list of possible approaches.

    Now if only I had said that the ones I posted were just a couple other forms and that what particular query form would perform best would depend...oh wait 😛

    Cheers!

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply