% of participants that are retained

  • I am a school teacher very new to MS SQL server. Everyone is suggesting to try this site out. Here goes!

    I am trying to write queries to test different types of outcome measures for the participation in a academic program. There are several different ways to calculate this outcome measurement I would like to try. The outcome which I am trying to Calculate is: What is the % of participants that are retained during six months of the program? I am testing different ways to define participant and different time ranges. There are 4 queries I am trying to produce. Unfortunately, I have to use for different tables: Attendance, Status, Deerolled, Inactive. I have included sample data from each below.

    Queries

    1.A participant is defined as everyone that attended a class at least twice a week for 6 months (181 days total) starting at July 1st 2012 and Ending June 30th 2013, so the length of the fiscal year. If a participant is deenrolled or inactive they are dropped.

    2.A participant is defined as everyone that attended a class at least twice a week for 6 months (181 days total) starting at January 1st 2013. If a participant is deenrolled or becomes inactive they are dropped.

    3.A participant is defined as everyone that attended a class at least twice a week starting at January 1st 2013 until today

    4.A participant is defined as a student's enrollment start date until they are deenrolled or become inactive.

    Participant (Numerator) participant / all students which were served (Denominator)

    The 4 query outputs I am looking for are different versions of this:

    Example:

    Participants Served Percent_Served

    75 100 75%

    I have been messing arouns with different versions of the queries below. As you can see I am very new to SQL server. 🙂

    SELECT

    Count (distinct ID) as Count,

    Count ( DATEADD( dd, -181, DATEADD(wk, DATEDIFF(wk,0,Date), 0)) > 2 as Participants ,

    FROM Attendance

    where Attendence_date date between '07/01/2012' and '06/30/2013'

    and ID not in (Select ID from Inactive)

    or ID not in (select ID from Denenrolled)

    GROUP BY ID

    SELECT

    Count (distinct ID) as Count,

    Count ( DATEADD( dd, -181, DATEADD(wk, DATEDIFF(wk,0,Date), 0)) - Enrolled_Date as Participants ,

    FROM Attendance

    where Attendence_date date between '07/01/2012' and '06/30/2013'

    and ID not in (Select ID from Inactive)

    or ID not in (select ID from Denenrolled)

    GROUP BY ID

    Any programming assistance for these queries is greatly appreciated.

    Below are the sample/exmaple datasets.

    Attendence_date is the date a student participated in one class.

    CREATE TABLE Attendance (

    ID int,

    Attendence_date datetime,

    )

    INSERT INTO ID Attendence_date VALUES

    (4504498, '7/1/2012'),

    (4504498, '7/2/2012'),

    (4504498, '7/3/2012'),

    (4504498, '7/4/2012'),

    (4504498, '7/5/2012'),

    (4504498, '7/8/2012'),

    (4504498, '7/9/2012'),

    (4504498, '7/10/2012'),

    (4504498, '7/11/2012'),

    (4504498, '7/12/2012'),

    (4504498, '7/1/2012'),

    (4504498, '7/2/2012'),

    (4504498, '7/3/2012'),

    (4504498, '7/4/2012'),

    (4504498, '7/5/2012'),

    (4504498, '7/8/2012'),

    (4504498, '7/9/2012'),

    (4504498, '7/10/2012'),

    (4504498, '7/11/2012'),

    (4504498, '7/12/2012'),

    (9201052, '7/15/2012'),

    (9201052, '7/16/2012'),

    (9201052, '7/17/2012'),

    (9201052, '7/17/2012'),

    (9201052, '7/18/2012'),

    (7949745,'7/17/2012'),

    (7949745,'7/18/2012'),

    (7949745,'7/23/2012'),

    (7949745,'7/23/2012'),

    (7949745,'7/24/2012'),

    (7949745,'7/26/2012'),

    (7949745,'7/26/2012'),

    (7949745,'8/8/2012'),

    (7949745,'8/8/2012'),

    (7949745,'11/5/2012'),

    (7949745,'11/5/2012'),

    (7949745,'11/5/2012'),

    (7949745,'11/6/2012'),

    (7949745,'11/6/2012'),

    (7949745,'11/6/2012'),

    (7949745,'11/7/2012'),

    (7949745,'11/7/2012'),

    (7949745,'11/7/2012')

    CREATE TABLE Status (

    ID int,

    Intake_Date datetime ,

    Engaged_Date datetime ),

    Enrolled_Date datetime)

    INSERT INTO ID Intake_Date Engaged_Date Enrolled_Date VALUES

    (7949745, '3/7/2012','7/17/2012','3/8/2012'),

    (4504498, '2/21/2013','3/5/2013','3/22/2013'),

    (1486279, '4/18/2013', '5/7/2013','5/20/2013'),

    (9201052, '5/15/2012','7/13/2012', '5/15/2012'),

    (1722390, '3/5/2012','8/27/2012', '3/8/2012'),

    (7735695, '9/7/2012','9/7/2012', '9/28/2012'),

    (9261549, '3/7/2012','7/24/2012', '3/8/2012'),

    (3857008, '3/15/2013','3/18/2013', '4/3/2013'),

    (8502583, '3/14/2013', '4/15/2013', '5/3/2013'),

    (1209774, '4/19/2012', '1/1/2012' '4/24/2012')

    CREATE TABLE Deenrolled (

    ID int,

    Deenrolled_Date datetime)

    INSERT INTO ID Deenrolled VALUES

    (7949745, '2/4/2013'),

    (5485272, '07/08/2013'),

    (8955628, '01/10/2013'),

    (5123221, '7/8/2013'),

    (5774753, '7/18/2013'),

    (3005451, '2/18/2013'),

    (7518818, '05/29/2013'),

    (9656985, '6/20/2013'),

    (2438101, '7/17/2013'),

    (1437052, '7/25/2013'),

    (9133874, '4/25/2013'),

    (7007375, '6/19/2013'),

    (3178181, '5/24/2013')

    CREATE TABLE Inactive (

    ID int,

    Effect_Date datetime)

    INSERT INTO IDEffect_Date VALUES

    (1209774,'10/12/2012'),

    (5419494,'10/12/2012'),

    (4853049,'10/9/2012'),

    (1453678,'5/23/2013'),

    (1111554,'7/16/2012'),

    (5564128,'2/15/2013'),

    (1769234,'7/16/2012')

  • Hi

    I hope I have understood your requirements for this. Thanks for posting some sample data, however it wasn't really enough to test this properly. Also the statements needed a bit of cleaning up to run.

    I wasn't sure how you defined Served so I just did a count of distinct ids from attendance.

    Query 4 wasn't very in it's definition, so I left that one out.

    To try and keep things reasonably clear I have put the queries within a CTE then combined them in a single query at the end

    WITH Served AS (

    -- count from attendance. Change this to which query gets the number you want

    SELECT COUNT(DISTINCT ID) ServedCount

    FROM Attendance

    )

    ,Q1 AS (

    -- Query 1

    SELECT ID,

    DATEPART(wk,Attendence_date) WeekNum,

    COUNT(*) AS TimesAttended

    FROM Attendance A

    WHERE Attendence_date between '20120701' and '20130630'

    AND not exists (SELECT ID FROM Inactive I WHERE A.ID = I.ID)

    AND not exists (SELECT ID FROM Deenrolled D WHERE A.ID = D.ID)

    GROUP BY ID, DATEPART(wk,Attendence_date)

    HAVING COUNT(*) > 1

    --ORDER BY ID, DATEPART(wk,Attendence_date)

    )

    ,Q2 AS (

    -- Query 2

    SELECT ID,

    DATEPART(wk,Attendence_date) WeekNum,

    COUNT(*) AS TimesAttended

    FROM Attendance A

    WHERE Attendence_date between '20130101' and '20130630'

    AND not exists (SELECT ID FROM Inactive I WHERE A.ID = I.ID)

    AND not exists (SELECT ID FROM Deenrolled D WHERE A.ID = D.ID)

    GROUP BY ID, DATEPART(wk,Attendence_date)

    HAVING COUNT(*) > 1

    --ORDER BY ID, DATEPART(wk,Attendence_date)

    )

    ,Q3 AS (

    -- Query 3

    SELECT ID,

    DATEPART(wk,Attendence_date) WeekNum,

    COUNT(*) AS TimesAttended

    FROM Attendance A

    WHERE Attendence_date between '20130101' and getdate()

    AND not exists (SELECT ID FROM Inactive I WHERE A.ID = I.ID)

    AND not exists (SELECT ID FROM Deenrolled D WHERE A.ID = D.ID)

    GROUP BY ID, DATEPART(wk,Attendence_date)

    HAVING COUNT(*) > 1

    --ORDER BY ID, DATEPART(wk,Attendence_date)

    )

    -- Put them together

    SELECT ServedCount,

    P1, CAST(P1 AS FLOAT) / ServedCount P1_PCT_SERVED,

    P2, CAST(P2 AS FLOAT) / ServedCount P2_PCT_SERVED,

    P3, CAST(P3 AS FLOAT) / ServedCount P3_PCT_SERVED

    FROM Served

    CROSS APPLY (SELECT COUNT(DISTINCT ID) p1 FROM Q1 ) Q1

    CROSS APPLY (SELECT COUNT(DISTINCT ID) p2 FROM Q2 ) Q2

    CROSS APPLY (SELECT COUNT(DISTINCT ID) p3 FROM Q3 ) Q3

    Hope this helps

  • This looks great! Oh man thank you so much! You don't realize how much you have helped me!!! For query 4 for I am looking for is: if a child has a status of enrolled in at the beginning of January or given a status of enrolled during the month of January does child stay enrolled for 181 days (6 months). So the child does not become "Deenrolled" of "Inactive" . Again, your assistance is great! thanks.

  • matt.j.garretson (7/29/2013)


    This looks great! Oh man thank you so much! You don't realize how much you have helped me!!! For query 4 for I am looking for is: if a child has a status of enrolled in at the beginning of January or given a status of enrolled during the month of January does child stay enrolled for 181 days (6 months). So the child does not become "Deenrolled" of "Inactive" . Again, your assistance is great! thanks.

    Glad to help ... For the fourth query you could add something like the following to the query

    , Q4 AS (

    -- Query 4

    SELECT ID

    FROM Status S

    WHERE

    --Enrolled_Date between '20130101' and '20130131' -- Enrolled in January

    Enrolled_Date < '20130201' -- Enrolled before February

    AND not exists (SELECT ID FROM Inactive I WHERE S.ID = I.ID)

    AND not exists (SELECT ID FROM Deenrolled D WHERE S.ID = D.ID and D.Deenrolled_Date >= S.Enrolled_Date)

    )

    SELECT ...

    Also as a note, if your query this query extends for more than a year then you will need group by the year as well.

    SELECT ID,

    DATEPART(yy,Attendence_date) YearNum,

    DATEPART(wk,Attendence_date) WeekNum,

    COUNT(*) AS TimesAttended

    FROM Attendance A

    WHERE Attendence_date between '20120701' and '20130630'

    AND not exists (SELECT ID FROM Inactive I WHERE A.ID = I.ID)

    AND not exists (SELECT ID FROM Deenrolled D WHERE A.ID = D.ID)

    GROUP BY ID, DATEPART(yy,Attendence_date), DATEPART(wk,Attendence_date)

    HAVING COUNT(*) > 1

    Also you may want to put further statements in the WHERE clause for the Inactive and Deenrolled tables to ensure that there dates are in the time frame that you are querying.

    There are quite a few little gotchas when working with dates and I'm definitely not expert on them, so please test these queries thoroughly

    Edit: Changed y to yy in the datepart

  • You are amazing!!!!! How to I give you points on this. Last question I want to be able to look at these using days to So Enrolment date - 181 days what was the class participation?

  • Thanks ... as I said, I'm not an expert on dates, but you probably want to use the dateadd function, something like ...

    ... dateadd(dd,-181,getdate()) ...

  • You are super! Thanks so much! If I wanted to define participation by "An AVERAGE of twice a week per month for 6 months". How would I do that? That's it I promise.

  • Ahh ... one of the tricky ones 🙂

    Assuming that we also want to include the weeks were there is no attendance, I have put in a small tally table to create up to 60 weeks. Then I create a table containing all weeks for each participant. Once that is done you can group by the ID and filter it using a having clause on the averageWITH Tally AS (

    -- Small tally table to use with weeks. 60 values to cover slightly more than a year

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N

    FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

    (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

    (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

    (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

    (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

    (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS TinyTally(N)

    )

    ,PeriodOfInterestYearAndWeeks AS (

    -- Get the weeks number and years for the period that we are interest in

    SELECT ID,

    DATEPART(yy,DATEADD(wk,N - 1,'20120701')) dpYear,

    DATEPART(wk,DATEADD(wk,N- 1,'20120701')) dpWeek

    From Tally

    CROSS APPLY (SELECT DISTINCT ID FROM #Attendance) a

    WHERE DATEADD(wk,N - 1,'20120701') BETWEEN '20120701' and '20130630'

    )

    ,Q1 AS (

    SELECT ID,

    DATEPART(yy,Attendence_date) dpYear,

    DATEPART(wk,Attendence_date) dpWeek,

    COUNT(*) AS TimesAttended

    FROM #Attendance A

    WHERE Attendence_date between '20120701' and '20130630'

    AND not exists (SELECT ID FROM #Inactive I WHERE A.ID = I.ID)

    AND not exists (SELECT ID FROM #Deenrolled D WHERE A.ID = D.ID)

    GROUP BY ID, DATEPART(yy,Attendence_date), DATEPART(wk,Attendence_date)

    )

    SELECT p.ID

    FROM PeriodOfInterestYearAndWeeks p

    LEFT OUTER JOIN Q1 a ON p.dpYear = a.dpYear and p.dpWeek = a.dpWeek and p.ID = a.ID

    GROUP BY p.ID

    HAVING AVG(ISNULL(CAST(TimesAttended AS FLOAT),0)) > 2

    The OUTER JOIN will cause the TimesAttended to be NULL for weeks without counts. The NULLs are changed to 0 for the AVG and CAST as float, otherwise a integer is returned.

    There is probably a better (and faster) way to do this, but I've got to go at the moment. I'll see if I can spend a bit of time on it later.

  • You are absolutely amazing!!! That you!!

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply