October 27, 2016 at 2:47 pm
I have a table valued function to count student truant days in any passed 30 days that met threshold of 7 days and return the earliest passed threshold date.
I have to use a recursive CTE in the table valued function to count these. The truant table will have records for each student , each absent date.
The table valued function is run every day, because the TruantDay table is updated every day with new attendance data in case student is excused from absent.
Currently the query runs fine, it takes about 20 seconds. But I am a little worried it will increase as the TruantDay table increased. If anything to improve for the speed?
WITH FirstLastAbsentDates AS
(
SELECT a.StudentId
,MIN(a.Att_Date) AS FirstDay
,MAX(a.Att_Date) AS EndDay
FROM TruantDay a
WHERE a.IsAbsent = 1
AND a.IsCleared = 0
GROUP BY a.StudentId
HAVING MAX(a.Att_Date) > MIN(a.Att_Date)
)
, CTEDateRanges (StudentId, EndDay) AS
(
SELECT a.StudentId
,a.EndDay
FROM FirstLastAbsentDates a
UNION ALL
SELECT a.StudentId
,DATEADD(d, -1, c.EndDay) AS EndDay
FROM FirstLastAbsentDates a
JOIN CTEDateRanges c
ON c.StudentId = a.StudentId
WHERE DATEADD(d, -1, c.EndDay) > a.FirstDay
)
, MTDOver6Days AS
(
SELECT a.StudentId
, a.EndDay AS ThresholdDate
, COUNT(*) AS AbsenceCount
FROM CTEDateRanges a
JOIN TruantDay b
ON b.StudentId = a.StudentId
WHERE b.IsAbsent = 1
AND b.IsCleared = 0
AND b.Att_Date BETWEEN DATEADD(d, -30, a.EndDay)
AND a.EndDay
GROUP BY a.StudentId, a.EndDay
HAVING COUNT(*) > 6
)
SELECT a.StudentId
,a.ThresholdDate
,a.AbsenceCount
FROM MTDOver6Days a
WHERE a.ThresholdDate = (SELECT MIN(b.ThresholdDate)
FROM MTDOver6Days b
WHERE b.StudentId = a.StudentId
October 27, 2016 at 3:31 pm
sqlfriends (10/27/2016)
I have a table valued function to count student truant days in any passed 30 days that met threshold of 7 days and return the earliest passed threshold date.I have to use a recursive CTE in the table valued function to count these. The truant table will have records for each student , each absent date.
The table valued function is run every day, because the TruantDay table is updated every day with new attendance data in case student is excused from absent.
Currently the query runs fine, it takes about 20 seconds. But I am a little worried it will increase as the TruantDay table increased. If anything to improve for the speed?
WITH FirstLastAbsentDates AS
(
SELECT a.StudentId
,MIN(a.Att_Date) AS FirstDay
,MAX(a.Att_Date) AS EndDay
FROM TruantDay a
WHERE a.IsAbsent = 1
AND a.IsCleared = 0
GROUP BY a.StudentId
HAVING MAX(a.Att_Date) > MIN(a.Att_Date)
)
, CTEDateRanges (StudentId, EndDay) AS
(
SELECT a.StudentId
,a.EndDay
FROM FirstLastAbsentDates a
UNION ALL
SELECT a.StudentId
,DATEADD(d, -1, c.EndDay) AS EndDay
FROM FirstLastAbsentDates a
JOIN CTEDateRanges c
ON c.StudentId = a.StudentId
WHERE DATEADD(d, -1, c.EndDay) > a.FirstDay
)
, MTDOver6Days AS
(
SELECT a.StudentId
, a.EndDay AS ThresholdDate
, COUNT(*) AS AbsenceCount
FROM CTEDateRanges a
JOIN TruantDay b
ON b.StudentId = a.StudentId
WHERE b.IsAbsent = 1
AND b.IsCleared = 0
AND b.Att_Date BETWEEN DATEADD(d, -30, a.EndDay)
AND a.EndDay
GROUP BY a.StudentId, a.EndDay
HAVING COUNT(*) > 6
)
SELECT a.StudentId
,a.ThresholdDate
,a.AbsenceCount
FROM MTDOver6Days a
WHERE a.ThresholdDate = (SELECT MIN(b.ThresholdDate)
FROM MTDOver6Days b
WHERE b.StudentId = a.StudentId
It seems to me that you don't need a recursive CTE. However, I´m not sure if the sample data that I created is correct.
CREATE TABLE TruantDay(
StudentId int,
Att_Date date,
IsAbsent bit,
IsCleared bit
);
INSERT INTO TruantDay
VALUES
( 1, '20161001', 1, 0),
( 1, '20161011', 1, 0),
( 1, '20161012', 1, 0),
( 1, '20161015', 1, 0),
( 1, '20161021', 1, 0),
( 1, '20161022', 1, 0),
( 1, '20161025', 1, 0)
INSERT INTO TruantDay
VALUES
( 2, '20151001', 1, 0),
( 2, '20151011', 1, 0),
( 2, '20151012', 1, 0),
( 2, '20151015', 1, 0),
( 2, '20151021', 1, 0),
( 2, '20151222', 1, 0),
( 2, '20151225', 1, 0);
SELECT StudentId, MAX( Att_Date), COUNT(*) Absences
FROM TruantDay t
WHERE IsAbsent = 1
AND IsCleared = 0
AND Att_Date > (SELECT DATEADD(d, -30, MAX( i.Att_Date))
FROM TruantDay i
WHERE t.StudentId = i.StudentId)
GROUP BY StudentId
HAVING COUNT(*) > 6;
October 27, 2016 at 4:01 pm
Thank you, I run on my table, the result seems not correct. Your table looks good but for each student there is only one met threshold, there should be any 30 days counting back from today.
I would like to return the first date that met the threshold 7 in every passed 30 days, since the beginning of school year to current date. or since the date the student start to have absent date to current date.
This process is run every day.
October 27, 2016 at 4:02 pm
Does your TruantDay table only contain absences or does it contain all attendance information? If it's all information, there is a much faster approach.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 27, 2016 at 4:05 pm
It has records like each row for each student, each absent date.
October 27, 2016 at 4:15 pm
I think that this approach may still be faster, especially with the right indexes in place. I created a Student table, a SchoolDay table, and the TruantDay. I created indexes on each of the tables. The main advantage is that it only reads each of the tables once, whereas the subquery has to read the TruantDay table multiple times (15 for this small set of data).
DROP TABLE #Students, #SchoolCalendar, #TruantDay
CREATE TABLE #Students (
StudentID INT NOT NULL
)
INSERT #Students(StudentID)
VALUES(1), (2)
CREATE TABLE #SchoolCalendar(
SchoolDay DATE PRIMARY KEY
)
;
WITH E(n) AS(
SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E(n)
)
, cteTally(n) AS(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n
FROM E a, E b, E c
)
INSERT #SchoolCalendar(SchoolDay)
SELECT DATEADD(DAY, n-1, '20151001')
FROM cteTally
CREATE TABLE #TruantDay(
StudentId int,
Att_Date date,
IsAbsent bit,
IsCleared bit
);
INSERT INTO #TruantDay
VALUES
( 1, '20161001', 1, 0),
( 1, '20161011', 1, 0),
( 1, '20161012', 1, 0),
( 1, '20161015', 1, 0),
( 1, '20161021', 1, 0),
( 1, '20161022', 1, 0),
( 1, '20161025', 1, 0)
INSERT INTO #TruantDay
VALUES
( 2, '20151001', 1, 0),
( 2, '20151011', 1, 0),
( 2, '20151012', 1, 0),
( 2, '20151015', 1, 0),
( 2, '20151021', 1, 0),
( 2, '20151222', 1, 0),
( 2, '20151225', 1, 0);
CREATE CLUSTERED INDEX PK_TruantDay ON #TruantDay(StudentId, Att_Date)
;
WITH absence_counts AS (
SELECT s.StudentID, sc.SchoolDay, td.Att_Date, COUNT(td.Att_Date) OVER(PARTITION BY s.StudentID ORDER BY sc.SchoolDay ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS running_absences
FROM #Students s
CROSS JOIN #SchoolCalendar sc
LEFT OUTER JOIN #TruantDay td
ON s.StudentID = td.StudentId
AND sc.SchoolDay = td.Att_Date
AND td.IsAbsent = 1
AND td.IsCleared = 0
)
SELECT StudentID, Att_Date, running_absences
FROM absence_counts
WHERE absence_counts.running_absences > 6
AND Att_Date IS NOT NULL
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 27, 2016 at 4:28 pm
Thanks, let me give it a try
October 27, 2016 at 4:49 pm
Right now I don'thave a school calendar table. But the TruantDay table covers that, it will only have current school year data.
I would like to modify above query a little to still use first part of my original query , that is to find the first date and last day that the student absent, and count in between any 30 days that has passed the threshold of 7 and return the earliest threshold date for each student.
Thanks,
October 28, 2016 at 7:04 am
I´m not sure what to say. You should know the deal by now. We need DDL, sample data and expected results.
The query that I posted gives the exact same result as the query you posted. You would need to post sample data to demonstrate how the query is malfunctioning and how it should return the results.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply