Recursive CTE performance improvement

  • I have a table valued function to count student truant days in any passed 30 days that met threshold of 7 days and return the earliest passed threshold date.

    I have to use a recursive CTE in the table valued function to count these. The truant table will have records for each student , each absent date.

    The table valued function is run every day, because the TruantDay table is updated every day with new attendance data in case student is excused from absent.

    Currently the query runs fine, it takes about 20 seconds. But I am a little worried it will increase as the TruantDay table increased. If anything to improve for the speed?

    WITH FirstLastAbsentDates AS

    (

    SELECT a.StudentId

    ,MIN(a.Att_Date) AS FirstDay

    ,MAX(a.Att_Date) AS EndDay

    FROM TruantDay a

    WHERE a.IsAbsent = 1

    AND a.IsCleared = 0

    GROUP BY a.StudentId

    HAVING MAX(a.Att_Date) > MIN(a.Att_Date)

    )

    , CTEDateRanges (StudentId, EndDay) AS

    (

    SELECT a.StudentId

    ,a.EndDay

    FROM FirstLastAbsentDates a

    UNION ALL

    SELECT a.StudentId

    ,DATEADD(d, -1, c.EndDay) AS EndDay

    FROM FirstLastAbsentDates a

    JOIN CTEDateRanges c

    ON c.StudentId = a.StudentId

    WHERE DATEADD(d, -1, c.EndDay) > a.FirstDay

    )

    , MTDOver6Days AS

    (

    SELECT a.StudentId

    , a.EndDay AS ThresholdDate

    , COUNT(*) AS AbsenceCount

    FROM CTEDateRanges a

    JOIN TruantDay b

    ON b.StudentId = a.StudentId

    WHERE b.IsAbsent = 1

    AND b.IsCleared = 0

    AND b.Att_Date BETWEEN DATEADD(d, -30, a.EndDay)

    AND a.EndDay

    GROUP BY a.StudentId, a.EndDay

    HAVING COUNT(*) > 6

    )

    SELECT a.StudentId

    ,a.ThresholdDate

    ,a.AbsenceCount

    FROM MTDOver6Days a

    WHERE a.ThresholdDate = (SELECT MIN(b.ThresholdDate)

    FROM MTDOver6Days b

    WHERE b.StudentId = a.StudentId

  • sqlfriends (10/27/2016)


    I have a table valued function to count student truant days in any passed 30 days that met threshold of 7 days and return the earliest passed threshold date.

    I have to use a recursive CTE in the table valued function to count these. The truant table will have records for each student , each absent date.

    The table valued function is run every day, because the TruantDay table is updated every day with new attendance data in case student is excused from absent.

    Currently the query runs fine, it takes about 20 seconds. But I am a little worried it will increase as the TruantDay table increased. If anything to improve for the speed?

    WITH FirstLastAbsentDates AS

    (

    SELECT a.StudentId

    ,MIN(a.Att_Date) AS FirstDay

    ,MAX(a.Att_Date) AS EndDay

    FROM TruantDay a

    WHERE a.IsAbsent = 1

    AND a.IsCleared = 0

    GROUP BY a.StudentId

    HAVING MAX(a.Att_Date) > MIN(a.Att_Date)

    )

    , CTEDateRanges (StudentId, EndDay) AS

    (

    SELECT a.StudentId

    ,a.EndDay

    FROM FirstLastAbsentDates a

    UNION ALL

    SELECT a.StudentId

    ,DATEADD(d, -1, c.EndDay) AS EndDay

    FROM FirstLastAbsentDates a

    JOIN CTEDateRanges c

    ON c.StudentId = a.StudentId

    WHERE DATEADD(d, -1, c.EndDay) > a.FirstDay

    )

    , MTDOver6Days AS

    (

    SELECT a.StudentId

    , a.EndDay AS ThresholdDate

    , COUNT(*) AS AbsenceCount

    FROM CTEDateRanges a

    JOIN TruantDay b

    ON b.StudentId = a.StudentId

    WHERE b.IsAbsent = 1

    AND b.IsCleared = 0

    AND b.Att_Date BETWEEN DATEADD(d, -30, a.EndDay)

    AND a.EndDay

    GROUP BY a.StudentId, a.EndDay

    HAVING COUNT(*) > 6

    )

    SELECT a.StudentId

    ,a.ThresholdDate

    ,a.AbsenceCount

    FROM MTDOver6Days a

    WHERE a.ThresholdDate = (SELECT MIN(b.ThresholdDate)

    FROM MTDOver6Days b

    WHERE b.StudentId = a.StudentId

    It seems to me that you don't need a recursive CTE. However, I´m not sure if the sample data that I created is correct.

    CREATE TABLE TruantDay(

    StudentId int,

    Att_Date date,

    IsAbsent bit,

    IsCleared bit

    );

    INSERT INTO TruantDay

    VALUES

    ( 1, '20161001', 1, 0),

    ( 1, '20161011', 1, 0),

    ( 1, '20161012', 1, 0),

    ( 1, '20161015', 1, 0),

    ( 1, '20161021', 1, 0),

    ( 1, '20161022', 1, 0),

    ( 1, '20161025', 1, 0)

    INSERT INTO TruantDay

    VALUES

    ( 2, '20151001', 1, 0),

    ( 2, '20151011', 1, 0),

    ( 2, '20151012', 1, 0),

    ( 2, '20151015', 1, 0),

    ( 2, '20151021', 1, 0),

    ( 2, '20151222', 1, 0),

    ( 2, '20151225', 1, 0);

    SELECT StudentId, MAX( Att_Date), COUNT(*) Absences

    FROM TruantDay t

    WHERE IsAbsent = 1

    AND IsCleared = 0

    AND Att_Date > (SELECT DATEADD(d, -30, MAX( i.Att_Date))

    FROM TruantDay i

    WHERE t.StudentId = i.StudentId)

    GROUP BY StudentId

    HAVING COUNT(*) > 6;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thank you, I run on my table, the result seems not correct. Your table looks good but for each student there is only one met threshold, there should be any 30 days counting back from today.

    I would like to return the first date that met the threshold 7 in every passed 30 days, since the beginning of school year to current date. or since the date the student start to have absent date to current date.

    This process is run every day.

  • Does your TruantDay table only contain absences or does it contain all attendance information? If it's all information, there is a much faster approach.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • It has records like each row for each student, each absent date.

  • I think that this approach may still be faster, especially with the right indexes in place. I created a Student table, a SchoolDay table, and the TruantDay. I created indexes on each of the tables. The main advantage is that it only reads each of the tables once, whereas the subquery has to read the TruantDay table multiple times (15 for this small set of data).

    DROP TABLE #Students, #SchoolCalendar, #TruantDay

    CREATE TABLE #Students (

    StudentID INT NOT NULL

    )

    INSERT #Students(StudentID)

    VALUES(1), (2)

    CREATE TABLE #SchoolCalendar(

    SchoolDay DATE PRIMARY KEY

    )

    ;

    WITH E(n) AS(

    SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E(n)

    )

    , cteTally(n) AS(

    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n

    FROM E a, E b, E c

    )

    INSERT #SchoolCalendar(SchoolDay)

    SELECT DATEADD(DAY, n-1, '20151001')

    FROM cteTally

    CREATE TABLE #TruantDay(

    StudentId int,

    Att_Date date,

    IsAbsent bit,

    IsCleared bit

    );

    INSERT INTO #TruantDay

    VALUES

    ( 1, '20161001', 1, 0),

    ( 1, '20161011', 1, 0),

    ( 1, '20161012', 1, 0),

    ( 1, '20161015', 1, 0),

    ( 1, '20161021', 1, 0),

    ( 1, '20161022', 1, 0),

    ( 1, '20161025', 1, 0)

    INSERT INTO #TruantDay

    VALUES

    ( 2, '20151001', 1, 0),

    ( 2, '20151011', 1, 0),

    ( 2, '20151012', 1, 0),

    ( 2, '20151015', 1, 0),

    ( 2, '20151021', 1, 0),

    ( 2, '20151222', 1, 0),

    ( 2, '20151225', 1, 0);

    CREATE CLUSTERED INDEX PK_TruantDay ON #TruantDay(StudentId, Att_Date)

    ;

    WITH absence_counts AS (

    SELECT s.StudentID, sc.SchoolDay, td.Att_Date, COUNT(td.Att_Date) OVER(PARTITION BY s.StudentID ORDER BY sc.SchoolDay ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS running_absences

    FROM #Students s

    CROSS JOIN #SchoolCalendar sc

    LEFT OUTER JOIN #TruantDay td

    ON s.StudentID = td.StudentId

    AND sc.SchoolDay = td.Att_Date

    AND td.IsAbsent = 1

    AND td.IsCleared = 0

    )

    SELECT StudentID, Att_Date, running_absences

    FROM absence_counts

    WHERE absence_counts.running_absences > 6

    AND Att_Date IS NOT NULL

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thanks, let me give it a try

  • Right now I don'thave a school calendar table. But the TruantDay table covers that, it will only have current school year data.

    I would like to modify above query a little to still use first part of my original query , that is to find the first date and last day that the student absent, and count in between any 30 days that has passed the threshold of 7 and return the earliest threshold date for each student.

    Thanks,

  • I´m not sure what to say. You should know the deal by now. We need DDL, sample data and expected results.

    The query that I posted gives the exact same result as the query you posted. You would need to post sample data to demonstrate how the query is malfunctioning and how it should return the results.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply