Trying to create Nested SUM OVER Function for backwards running total.

  • RocksteadyDev

    SSC Veteran

    Points: 220

    Here is sample code from my running total, now I need to reverse it and add a where exists clause to subtract rather than add from the initial value (in this case enrolled students from day 1). I need to add some sort of where clause or update for endDates to show drop offs from the initial value. Is it even possible with sum over or do I need to write a cursor or use something else? Do I have options?

    table #enrollments


    uniqueID INT

    , startDate DATE

    , endDate DATE


    --My running total to get total enrolled.

    SELECT startDate, sum(count(DISTINCT uniqueID)) OVER (PARTITION BY startDate ORDER BY startDate) AS enrolled

    FROM dbo.enrollments

    GROUP BY startDate;

    --result set, date and active students....

    2001-01-10   150

    2001-01-11   150


    2001-03-09   149


    2001-04-10   148


  • pietlinden

    SSC Guru

    Points: 62452

    If you're trying to get a count of enrollments for each day, then you would need a Calendar table of some kind. Are you trying to count the number of empty "seats" in a course section or something like that? (Course has a max of 30 enrollees, and counts down until there are no free seats.)

    use tempdb;

    CREATE TABLE Enrollment (
    StudentID CHAR(10)
    ,EnrollDate DATE
    ,LeaveDate DATE );

    INSERT INTO Enrollment(StudentID, EnrollDate, leaveDate)
    VALUES ('0000000132','1/1/2019','1/31/2019'),

    CREATE TABLE #Dates(TheDate DATE);
    DECLARE @StartDate DATE,
    @EndDate DATE,
    @TheDate DATE;

    SELECT @TheDate = MIN(EnrollDate)
    , @EndDate = MAX(LeaveDate)
    FROM Enrollment;

    WHILE @TheDate <= @EndDate
    INSERT INTO #Dates(TheDate) VALUES (@TheDate);
    SET @TheDate = DATEADD(day,1,@TheDate);

    , COUNT(e.StudentID) AS EnrollCount
    , 30-COUNT(e.StudentID) AS SeatsLeft -- 30 is an arbitrary maximum for the countdown.
    FROM #Dates d LEFT JOIN Enrollment e
    ON (d.TheDate >= e.EnrollDate AND
    d.TheDate <= e.LeaveDate)
    GROUP BY d.TheDate
    ORDER BY d.TheDate;


    • This reply was modified 4 weeks, 1 day ago by  pietlinden.
  • RocksteadyDev

    SSC Veteran

    Points: 220

    This is pretty damn close. The numbers don't add up though because you are using an arbitrary total amount of seats, when it has to be the initial count of unique id's (students). When I arbitrarily put in the numbers to make the first set right, it destroys the rest. Also adding a day to the initial startDate throws it all off. I also put a CREATE INSERT attachment so you can take a look at the actual data, since it is just dummy data for my own learning process.

    SELECT  d.TheDate AS [Dates ]
    , 55 + COUNT(e.unique_id) AS Term_Enrollment --how many enrolled on the first day
    , COUNT(e.unique_id) AS Currently_Enrolled /*subtracted from term to show how many are still
    enrolled from day one, should be 133 like term*/
    FROM #Dates d LEFT JOIN #active_enrollments e
    ON (d.TheDate >= e.start_date AND
    d.TheDate <= e.end_date)
    GROUP BY d.TheDate
    ORDER BY d.TheDate;

    /*Technically I don't need to keep track of the first day term enrollment, but I do need to see it drop when a child drops out ( or reaches end_date).*/



    You must be logged in to view attached files.
  • drew.allen

    SSC Guru

    Points: 76580

    From your data, I think what you're really dealing with is a overlapping intervals problem.  You have overlapping intervals (student start and end dates) and you need a running count based on the intervals.

    Try this solution, which only gives dates where the running total changes.  If you need all dates, you can use a calendar table in conjunction with this solution.  If you do, note that the order gets a bit tricky.

    , SUM(de.enroll_status) AS daily_total
    , SUM(SUM(de.enroll_status)) OVER(ORDER BY de.enroll_status_dt ROWS UNBOUNDED PRECEDING) AS running_total
    FROM #enrollments AS e
    CROSS APPLY ( VALUES(1, e.start_date), (-1, COALESCE(e.end_date, '9999-12-31'))) de(enroll_status, enroll_status_dt)
    GROUP BY de.enroll_status_dt
    ORDER BY de.enroll_status_dt;

    Also, when working with open or half-open intervals, it is best to use arbitrary end points for the unknown values rather than using a NULL value.  I tend to use 1900-01-01 as an arbitrary start value and 9999-12-30 (or 9999-12-31 or 9999-01-01) as an arbitrary end value unless the data could realistically fall outside of that range.


    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply