Hierarchical AND/OR Path Sets

  • I am THIS [---] close to figuring out the last step for an ETL project moving data from an old application into a new one. The current schema was dictated to me (but there is flexibility I hope) so feel free to fire away!

    The data represents university Program data for available Course combinations that meets the Requirement list for a specific degree. The three source tables with data are Program, CourseGroup, and Course. The two target tables are Requirement and RequirementCourse. For each Program there is a root CourseGroup record which in turn has 0:N Course and 0:N CourseGroup descendant records. Each CourseGroup descendant record also has its own 0:N Course and 0:N CourseGroup descendant records repeat.

    A CourseGroup with GroupType 'AND' means all descendants are required, and GroupType 'OR' means only one descendant is application for that level. The root CourseGroup record GroupType is always 'AND', meaning all direct descendant Course and CourseGroup records will determine the number of Requirement records to insert per Program. Each Requirement record is the starting point for the data that needs to be parsed and grouped. This is the part where I am stuck, mainly because the 'OR' logic dictates multiple paths or groups are needed, and so everything must be duplicated.

    *Attaching a plain text XML file for easier visual representation too.


    --Source Tables in DB
    DECLARE @Program TABLE (
                    [ProgramId] INT
                 ,[Description] NVARCHAR(100)
                )

    DECLARE @CourseGroup TABLE (
                    [CourseGroupId] INT
                 ,[ParentCourseGroupId] INT
                 ,[GroupType] NVARCHAR(3)
                 ,[ProgramId] INT
                )

    DECLARE @Course TABLE (
                    [CourseId] INT
                 ,[CourseNumber] NVARCHAR(8)
                 ,[CourseGroupId] INT
                )

    --Target Tables in DB
    DECLARE @Requirement TABLE (
                    [RequirementId] INT PRIMARY KEY IDENTITY
                 ,[ProgramId] INT
                 ,[Sequence] INT
                )

    DECLARE @RequirementCourse TABLE (
                    [RequirementCourseId] INT PRIMARY KEY IDENTITY
                 ,[RequirementId] INT
                 ,[Group] INT
                 ,[CourseNumber] NVARCHAR(8)
                )

    --Helper Tables in PROC
    DECLARE @CourseGroupHierarchy TABLE (
                    [CourseGroupId] INT
                 ,[ParentCourseGroupId] INT
                 ,[GroupType] NVARCHAR(3)
                 ,[TreeLevel] HIERARCHYID
                 ,[ProgramId] INT
                )

    DECLARE @RequirementMapping TABLE (
                    [RequirementId] INT
                 ,[ProgramId] INT
                 ,[Sequence] INT
                 ,[CourseNumber] NVARCHAR(8)
                 ,[TreeLevel] HIERARCHYID
                 ,[CourseGroupId] INT
                 ,[GroupType] NVARCHAR(3)
                 ,[ActionTaken] NVARCHAR(10)
                )

    --Source Data in DB
    INSERT INTO @Program (
                [ProgramId]
             ,[Description])
        VALUES (1,N'Learning SQL')
    ;

    INSERT INTO @CourseGroup (
                [CourseGroupId]
             ,[ParentCourseGroupId]
             ,[GroupType]
             ,[ProgramId])
        VALUES (1,NULL,N'AND',1)
             ,(2,1,N'OR',NULL)
             ,(3,1,N'OR',NULL)
             ,(4,1,N'OR',NULL)
             ,(5,1,N'OR',NULL)
             ,(6,1,N'OR',NULL)
             ,(7,1,N'OR',NULL)
             ,(8,2,N'AND',NULL)
             ,(9,6,N'AND',NULL)
             ,(10,7,N'AND',NULL)
             ,(11,7,N'AND',NULL)
             ,(12,7,N'AND',NULL)
             ,(13,8,N'AND',NULL)
             ,(14,8,N'AND',NULL)
             ,(15,14,N'AND',NULL)
             ,(16,14,N'AND',NULL)
    ;

    INSERT INTO @Course (
                [CourseId]
             ,[CourseNumber]
             ,[CourseGroupId])
        VALUES (1,N'ABC1001',1)
             ,(2,N'ABC2001',3)
             ,(3,N'ABC2011',3)
             ,(4,N'ABC3001',4)
             ,(5,N'ABC3011',4)
             ,(6,N'ABC4001',5)
             ,(7,N'ABC4011',5)
             ,(8,N'LMN1001C',6)
             ,(9,N'LMN1001',9)
             ,(10,N'LMN1001L',9)
             ,(11,N'LMN2001C',10)
             ,(12,N'LMN2001',11)
             ,(13,N'LMN2001L',11)
             ,(14,N'LMN3001',12)
             ,(15,N'LMN3001L',12)
             ,(16,N'XYZ1001C',13)
             ,(17,N'XYZ1002C',13)
             ,(18,N'XYZ1001',15)
             ,(19,N'XYZ1001L',15)
             ,(20,N'XYZ1002',16)
             ,(21,N'XYZ1002L',16)
    ;

    --Hierarchy Recursive CTE into helper table
    WITH CGCte AS
        (SELECT
                [CGAnchor].[CourseGroupId]
             ,[CGAnchor].[ParentCourseGroupId]
             ,[CGAnchor].[GroupType]
             ,'/' + CONVERT(NVARCHAR(MAX),ROW_NUMBER() OVER (ORDER BY [CGAnchor].[CourseGroupId])) + '/' [TreeLevel]
             ,[CGAnchor].[ProgramId]
            FROM @CourseGroup [CGAnchor]
            WHERE [CGAnchor].[ParentCourseGroupId] IS NULL
            UNION ALL
            SELECT
                [CGDesc].[CourseGroupId]
             ,[CGDesc].[ParentCourseGroupId]
             ,[CGDesc].[GroupType]
             ,[CGCte].[TreeLevel] + CONVERT(NVARCHAR(MAX),ROW_NUMBER() OVER (PARTITION BY [CGDesc].[ParentCourseGroupId] ORDER BY [CGDesc].[CourseGroupId])) + '/' [TreeLevel]
             ,[CGCte].[ProgramId]
            FROM @CourseGroup [CGDesc]
                INNER JOIN [CGCte] ON [CGDesc].[ParentCourseGroupId] = [CGCte].[CourseGroupId])
    INSERT INTO @CourseGroupHierarchy (
                [CourseGroupId]
             ,[ParentCourseGroupId]
             ,[GroupType]
             ,[TreeLevel]
             ,[ProgramId])
        SELECT
            [CourseGroupId]
         ,[ParentCourseGroupId]
         ,[GroupType]
         ,[TreeLevel]
         ,[ProgramId]
        FROM [CGCte]
    ;

    --Merge into Requirement target table
    /*NOTE:
        I know this can simply be an INSERT statement as it appears, but the
        MERGE functionality is necessary for deleting requirements later
        with future business logic.
    */
    MERGE @Requirement AS [Target]
    USING (
        SELECT
            [SourceQuery].[ProgramId]
         ,ROW_NUMBER() OVER (PARTITION BY [SourceQuery].[ProgramId] ORDER BY [SourceQuery].[TreeLevel].GetLevel(),[SourceQuery].[TreeLevel]) [Sequence]
         ,[SourceQuery].[CourseNumber]
         ,[SourceQuery].[TreeLevel]
         ,[SourceQuery].[CourseGroupId]
         ,[SourceQuery].[GroupType]
        FROM (
                --Course requirements
                SELECT
                    [CGH].[CourseGroupId]
                 ,[CGH].[GroupType]
                 ,[CGH].[TreeLevel]
                 ,[CGH].[ProgramId]
                 ,[C].[CourseId]
                 ,[C].[CourseNumber]
                FROM @CourseGroupHierarchy [CGH]
                    INNER JOIN @Course [C] ON [CGH].[CourseGroupId] = [C].[CourseGroupId]
                WHERE [CGH].[TreeLevel].GetLevel() = 1
                UNION ALL
                --CourseGroup requirements
                SELECT
                    [CGH].[CourseGroupId]
                 ,[CGH].[GroupType]
                 ,[CGH].[TreeLevel]
                 ,[CGH].[ProgramId]
                 ,NULL
                 ,NULL
                FROM @CourseGroupHierarchy [CGH]
                WHERE [CGH].[TreeLevel].GetLevel() = 2) [SourceQuery]) AS [Source] ([ProgramId],[Sequence],[CourseNumber],[TreeLevel],[CourseGroupId],[GroupType])
    ON ([Target].[ProgramId] = [Source].[ProgramId]
        AND [Target].[Sequence] = [Source].[Sequence])
    WHEN NOT MATCHED BY Target
        THEN INSERT (
                [ProgramId]
             ,[Sequence])
                VALUES ([Source].[ProgramId],[Source].[Sequence])
    OUTPUT [INSERTED].[RequirementId]
         ,[INSERTED].[ProgramId]
         ,[INSERTED].[Sequence]
         ,[Source].[CourseNumber]
         ,[Source].[TreeLevel]
         ,[Source].[CourseGroupId]
         ,[Source].[GroupType]
         ,$ACTION INTO @RequirementMapping
    ;

    --View results of where I am thus far.
    /*The target table with data*/
    SELECT
        [RequirementId]
     ,[ProgramId]
     ,[Sequence]
     ,[CourseNumber]
     ,[TreeLevel]
     ,[CourseGroupId]
     ,[GroupType]
     ,[ActionTaken]
    FROM @RequirementMapping

    /*The data that needs to be parsed for the second target table*/
    SELECT
        [CGH].[CourseGroupId]
     ,[CGH].[ParentCourseGroupId]
     ,[CGH].[GroupType]
     ,[CGH].[TreeLevel]
     ,[CGH].[TreeLevel].ToString()
     ,[C].[CourseId]
     ,[C].[CourseNumber]
     ,[P].[ProgramId]
     ,[P].[Description]
    FROM @Program [P]
        INNER JOIN @CourseGroupHierarchy [CGH] ON [P].[ProgramId] = [CGH].[ProgramId]
        LEFT OUTER JOIN @Course [C] ON [CGH].[CourseGroupId] = [C].[CourseGroupId]

    --View results of where I need to get still.
    SELECT
        [T].[RequirementCourseId]
     ,[T].[RequirementId]
     ,[T].[Group]
     ,[T].[CourseNumber]
    FROM (VALUES(1,1,1,'ABC1001')
             ,(2,2,1,'XYZ1001C')
             ,(3,2,1,'XYZ1002C')
             ,(4,2,2,'XYZ1001')
             ,(5,2,2,'XYZ1001L')
             ,(6,2,2,'XYZ1002')
             ,(7,2,2,'XYZ1002L')
             ,(8,3,1,'ABC2001')
             ,(9,3,2,'ABC2011')
             ,(10,4,1,'ABC3001')
             ,(11,4,2,'ABC3011')
             ,(12,5,1,'ABC4001')
             ,(13,5,2,'ABC4011')
             ,(14,6,1,'LMN1001C')
             ,(15,6,2,'LMN1001')
             ,(16,6,2,'LMN1001L')
             ,(17,7,1,'LMN2001C')
             ,(18,7,2,'LMN2001')
             ,(19,7,2,'LMN2001L')
             ,(20,7,3,'LMN3001')
             ,(21,7,3,'LMN3001L')
        ) [T] ([RequirementCourseId],[RequirementId],[Group],[CourseNumber])

    *EDIT. Old and New images added for desired data differences.

    Hope this helps clarify a little. The old application displayed all course groups with courses in a single treeview, and the new application will display each requirement in its on section with each course combination meeting the requirement criteria. 

    In the old treeview, the blue circled node would be turned into a requirement in the new system, and the new photo shows the breakdown.

    *EDIT. The very bottom SELECT of the current query displays the data the RequirementCourse table could contain, if I could figure out how to move the data into the data.

  • I think this would be a lot easier with several specific examples rather general hand-waving.  That being said, I think that if you stop thinking about AND/OR and instead think of counts it will help.  Specifically where you have AND, you need a count of all prerequisites and a person has to match that count; and where you have an OR hard-code the count to 1 and the person has to match or exceed that count.

    In pseudo-code that would be

    COUNT( <taken prerequisites> ) >= CASE WHEN 'AND' THEN COUNT( <all prerequisites> ) ELSE 1 END

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • What would be a better specific example? I ask because I changed the course numbers, but aside from that, this is literally the data I am working with right now haha. Well, this is one of a few hundred Programs.

    I like the concept of thinking about the prerequisites with the COUNT applied though, I have been trying to use ranking functions with no success past the first 2 levels. Also, there is no input currently for taken prerequisites. Thank you for the feedback.

  • RBA - Tuesday, August 21, 2018 3:36 PM

    What would be a better specific example? I ask because I changed the course numbers, but aside from that, this is literally the data I am working with right now haha. Well, this is one of a few hundred Programs.

    I like the concept of thinking about the prerequisites with the COUNT applied though, I have been trying to use ranking functions with no success past the first 2 levels. Also, there is no input currently for taken prerequisites. Thank you for the feedback.

    Your example makes sense to you, because you are familiar with the topic.  I have no idea which courses are required by which other courses.  I want something like
    Course C requires Course A and B.
    Course D requires Course A or B.
    Course E requires Course C or D.
    You know, in English rather than in computerese.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Wednesday, August 22, 2018 8:53 AM

    Your example makes sense to you, because you are familiar with the topic.  I have no idea which courses are required by which other courses.  I want something like
    Course C requires Course A and B.
    Course D requires Course A or B.
    Course E requires Course C or D.

    OH! There are no hierarchy dependencies like that. Every course is able to be taken by itself without having taken anything else. So the AND/OR paths are the possible definitions of each requirement.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply