Pivot Data based on the date range

  • Sandeeppunreddy2

    SSC Veteran

    Points: 203

    Hi All,

    I have a dataset which has information about students and their registered courses. following is a temp table with a sample structure and data.

    DROP TABLE IF EXISTS #StudentData

    CREATE TABLE #StudentData (StudentID INT,CourseType VARCHAR(50),CourseID INT,SubscriptionStartDate DATE, SubscriptioneEndDate DATE)

    INSERT INTO #StudentData VALUES (1,'Basic',50,'10-15-2016','10-15-2017')

    INSERT INTO #StudentData VALUES (1,'Advanced',72,'10-15-2016','2018-12-31')

    INSERT INTO #StudentData VALUES (1,'Basic',50,'11-13-2017','12-31-2018')

    INSERT INTO #StudentData VALUES (1,'Intermediate',66,'01-10-2017','07-30-2017')

    INSERT INTO #StudentData VALUES (1,'Intermediate',68,'08-01-2017','12-30-2017')

    INSERT INTO #StudentData VALUES (1,'Advanced',85,'01-10-2019',NULL)

    INSERT INTO #StudentData VALUES (1,'Basic',57,'10-15-2020','10-15-2021')

    SELECT * FROM #StudentData

    ORDER BY CourseType,SubscriptionStartDate

    I am trying to create a query that can group the data for different bands(based on the start and end dates), folloiwng is the desired output for the sample provided above.

    DROP TABLE IF EXISTS #DesiredOutput

    CREATE TABLE #DesiredOutput (StudentID INT,CourseType VARCHAR(50),[16-17] VARCHAR(50),[17-18] VARCHAR(50),[18-19] VARCHAR(50),[19-20] VARCHAR(50),[20-21] VARCHAR(50))

    INSERT INTO #DesiredOutput VALUES (1,'Basic',50,50,null,null,57)

    INSERT INTO #DesiredOutput VALUES (1,'Intermediate',null,'66,68',null,null,null)

    INSERT INTO #DesiredOutput VALUES (1,'Advanced',72,72,null,85,85)

    SELECT * FROM #DesiredOutput


    I tried Pivot and was trying to use case whens but had no luck and hoping someone can help me. any help is highly appreciated.

    Thanks in advance.

     

  • Jeff Moden

    SSC Guru

    Points: 996046

    It would help a whole lot if you explained the basis of the pivot.  For example, what does the column labels of [xx-yy] equate to in the original data?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • BTylerWhite

    Right there with Babe

    Points: 770

    This is quite a tricky task. It also looks like from your provided desired output you want some string aggregation with the CourseIDs. I'm not really sure if a PIVOT is the best case for this, since we aren't really trying to perform any aggregate functions against a true aggregation column.

    Also, some of the Start and EndDates don't match up quite like the desired output. For example, I see some 2017-2017 ranges.

    This might help you get started:

    SELECT SD.StudentID,
    SD.CourseType,
    CONCAT(FORMAT(SD.SubscriptionStartDate, N'yy'), '-', FORMAT(SD.SubscriptioneEndDate, N'yy')) AS SubscriptionYears,
    STRING_AGG(SD.CourseID, ',') WITHIN GROUP(ORDER BY SD.CourseType) AS CourseIds
    FROM #StudentData AS SD
    GROUP BY SD.StudentID,
    SD.CourseType,
    CONCAT(FORMAT(SD.SubscriptionStartDate, N'yy'), '-', FORMAT(SD.SubscriptioneEndDate, N'yy'));

    I used FORMAT for readability, but you may want to consider another way to extract these years for better performance for your columns.

  • Sandeeppunreddy2

    SSC Veteran

    Points: 203

    Hi Jeff,

    The years are basically enrollment years like the medical enrollment years provided by employers to their employees which can start from any month of the year but usually are 12 months in length (but that is not standard as there can be programs that can start and end within the same year) so those are standard. (like [2016-2017],[2017-2018], etc.. basically each year.)

    as an example, if a student enrolled for a course that starts on 01/01/2016 and the course is valid for 30 months with an end date of 06/30/2018.there will be one row in the source, in the final output for that student the course should appear in [16-17], [17-18] and [18-19] bands or in other words, the course was valid for at least 1 day in the years 2016,2017 and 2018. and as we do not have a bucket that shows only [18] we display it in the band [18-19] along with the other 2 bands. I think it will be simpler if I change those bands to be just years like [2016], [2017], [2018],[2019],[2020] and if a course is valid for at least 1 day in that year then the course ID should appear in the appropriate column.

    It is possible that a student registers for different courses within each course type in the same year and that is the 'Intermediate' course type in the sample data. the only rule is that with in the same course type, a student can not be registered in more than one course at the same time.

    I appreciate the help.

  • Sandeeppunreddy2

    SSC Veteran

    Points: 203

    Hi Tyler,

    Agreed, it is difficult to get this done using the Pivot. spent a lot of time trying to get that work.

    and Yes, it is possible for a student to register in 2 different courses with in the same course type in the same year. but the rule is that the 2 courses should not overlap so the 1st course will end before the second course is started so the SubscriptionEndDate for the fist course will always be before the Subscription Start date of the second course (again, with in the same Course Type. 'Intermediate' in the sample I provided originally as an example). so, in this case, the band [2017-2018] should display both the courses as they have a start date in 2017. notice the comma-separated courses 66,68 in the band [17-18]  in the desired output resultset.

  • scdecade

    Say Hey Kid

    Points: 709

    This is very tricky.  I'm this far and my @$%@$ pivot isn't work for some reason and I can't figure out why.  If (or when) it works my plan is/was to make it dynamic and exec sp_executesql

    set ansi_warnings off;

    declare
    @bands table(start_yr int,
    end_yr int,
    band char(7));
    declare
    @cols varchar(max);

    declare
    @sql nvarchar(max);

    with
    min_max_yr_cte(min_yr, max_yr) as (
    select min(year(SubscriptionStartDate)), max(year(SubscriptioneEndDate)) from #StudentData)
    insert @bands
    select
    (mmy.min_yr+t.n-1), (mmy.min_yr+t.n),
    concat('[',right(convert(varchar, (mmy.min_yr+t.n-1)), 2),'-', right(convert(varchar, (mmy.min_yr+t.n)), 2), ']')
    from
    min_max_yr_cte mmy
    cross apply
    dbo.tally(mmy.max_yr-mmy.min_yr) t;

    select
    @cols=string_agg(band, ', ')
    from
    @bands;

    drop table if exists #rows_to_pivot;
    create table #rows_to_pivot(
    band char(7),
    StudentID INT,
    CourseType VARCHAR(50),
    CourseID INT);

    insert #rows_to_pivot
    select
    b.band,
    sd.StudentID,
    sd.CourseType,
    sd.CourseID
    from
    @bands b
    join
    #StudentData sd on b.start_yr>=year(sd.SubscriptionStartDate)
    and b.end_yr<=isnull(year(sd.SubscriptioneEndDate),b.end_yr)
    union all
    select
    b.band,
    sd.StudentID,
    sd.CourseType,
    sd.CourseID
    from
    @bands b
    join
    #StudentData sd on b.start_yr=year(sd.SubscriptionStartDate)
    and b.start_yr=isnull(year(sd.SubscriptioneEndDate),b.end_yr)
    where
    year(sd.SubscriptionStartDate)=isnull(year(sd.SubscriptioneEndDate),b.end_yr);

    select * from #rows_to_pivot order by 4,1;

    select StudentID, CourseType, [16-17], [17-18], [18-19], [19-20], [20-21]
    from
    (select StudentID, CourseType, CourseID, band from #rows_to_pivot) rtp
    pivot
    (max(rtp.CourseID) for rtp.band in([16-17], [17-18], [18-19], [19-20], [20-21])) pvt

    • This reply was modified 1 week, 2 days ago by  scdecade.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • scdecade

    Say Hey Kid

    Points: 709

    Because the start date, end date combinations aren't uniformly spanning 2 years the first part of the script creates the year bands using a tally table.  Then the StudentData is bucketed into the year bands into the temp table #rows_to_pivot.  Then for some reason the pivot (which to my eye should work fine) isn't working.  Right now the aggregate function is max but it should switched to string_agg().

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Sandeeppunreddy2

    SSC Veteran

    Points: 203

    @scdecade - Thanks for the attempt. I will look into the approach.

    I have finally managed to complete the query.

    1. changed the bands to use simple years assuming 2016 is [16-17], 2017 as [17-18], etc.. got this idea when I was trying to explain the year band to Jeff in this same post.
    2. I converted the student data to be more granular from the original source by using CTEs to create one row per StudentID, CourseType, CourseID and Year the course is active for. as following

    ; WITH CTE AS

    (

    SELECT studentID,coursetype,courseid, datepart(year, SubscriptionStartDate) as yr,SubscriptioneEndDate,SubscriptionStartDate

    FROM #StudentData

    UNION ALL

    SELECT studentID,coursetype,courseid, yr + 1,SubscriptioneEndDate,SubscriptionStartDate

    FROM CTE

    WHERE yr < datepart(year, isnull(SubscriptioneEndDate,getdate()))

    )

    SELECT * FROM (

    SELECT studentID,coursetype,yr,STRING_AGG(courseid,',') AS Courses

    FROM (

    SELECT DISTINCT studentID,coursetype,yr,courseid

    FROM (

    select studentID,coursetype,courseid,yr

    from CTE

    ) as a

    ) AS B

    GROUP BY studentID,coursetype,yr

    ) AS C

    PIVOT

    (

    MIN(Courses)

    for yr in ([2016],[2017],[2018],[2019],[2020],[2021])

    ) as pvt


    After the granularity change and the tip from SCdecade to use the STRING_AGG() to aggregate , I used Pivot and it worked great!

    Thanks for all the help guys.

  • Jeff Moden

    SSC Guru

    Points: 996046

    Consider that when we run your code, it produces 68 reads (pages), which equates to 557,056 bytes of memory I/O just to handle 7 very narrow rows of data for one student.  Here's the output from statistics I/O...

    Table 'Worktable'. Scan count 2, logical reads 67, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#StudentData________________________________________________________________________________________________________00000000001B'.

    Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Using the fnTally function (see my signature line below for a like-named link) and some ancient "Black Arts" T-SQL known as a CROSSTAB, we can get it down 1 read.  It also gets rid of the NULLs and sorts the output in a more naturally expected order.

    WITH
    cteDates AS
    (--==== Create all the year numbers for each date range.
    SELECT DISTINCT
    StudentID, CourseType, CourseID
    ,SubYear = DATENAME(yy,DATEADD(yy,t.N,SubscriptionStartDate))
    FROM #StudentData sd
    CROSS APPLY dbo.fnTally(0,DATEDIFF(yy,SubscriptionStartDate,ISNULL(SubscriptioneEndDate,GETDATE()))) t
    )
    ,ctePreAgg AS
    (--==== Aggregate multiple courses per year for each student and course type
    SELECT StudentID, CourseType, SubYear
    ,CourseIDs = STRING_AGG(CourseID,',') WITHIN GROUP (ORDER BY CourseID)
    FROM cteDates
    GROUP BY StudentID, CourseType, SubYear
    )--==== This ancient art form is known as a CROSSTAB
    SELECT StudentID
    ,CourseType
    ,[2016] = MAX(CASE WHEN SubYear = 2016 THEN CourseIDs ELSE '' END)
    ,[2017] = MAX(CASE WHEN SubYear = 2017 THEN CourseIDs ELSE '' END)
    ,[2018] = MAX(CASE WHEN SubYear = 2018 THEN CourseIDs ELSE '' END)
    ,[2019] = MAX(CASE WHEN SubYear = 2019 THEN CourseIDs ELSE '' END)
    ,[2020] = MAX(CASE WHEN SubYear = 2020 THEN CourseIDs ELSE '' END)
    ,[2021] = MAX(CASE WHEN SubYear = 2021 THEN CourseIDs ELSE '' END)
    FROM ctePreAgg
    GROUP BY StudentID,CourseType
    ORDER BY StudentID, CASE CourseType WHEN 'Basic' THEN 1 WHEN 'Intermediate' THEN 2 ELSE 3 END
    ;

    Here's the output from statistics I/O...

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#StudentData________________________________________________________________________________________________________00000000001B'.

    Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Personally, I try to avoid incremental rCTEs and PIVOTs at all costs.  Please see the following articles for why (performance, of course!).

    https://www.sqlservercentral.com/articles/hidden-rbar-counting-with-recursive-ctes

    https://www.sqlservercentral.com/articles/cross-tabs-and-pivots-part-1-%e2%80%93-converting-rows-to-columns-1

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply