Monthly total of active data

  • For a report I'm trying to write I ideally need a recordset that shows me how many people were on a training course, by month. I have the day they started and the day they left the course, and need to show the monthly total of people actively on the course.

    I'm having trouble getting my head around how I should construct this and suspect the answer isn't as difficult as I'm trying to make it.

    Extremely simplified example data and a summary of what I expect the result to be:

    CREATE TABLE #SampleTrainingData(

    IDINT IDENTITY(1,1),

    NameNVARCHAR(10),

    StartDateDATE,

    LeavingDateDATE

    );

    INSERT INTO #SampleTrainingData

    ( Name, StartDate, LeavingDate )

    VALUES ( N'Fred', '01 Jan 2016', '05 Jan 2016'),

    ( N'John', '01 Jan 2016', NULL),

    ( N'Steve','01 Feb 2016', '03 Mar 2016'),

    ( N'Allan','03 Feb 2016', NULL),

    ( N'Mike', '01 Mar 2016', '05 Apr 2016');

    SELECT * FROM #SampleTrainingData

    --Required Results

    --2016 Jan - 1 --(Fred started and left in the same month. John started in that month)

    --2016 Feb - 3 --(John is still in training in this month. Steve started training in this month. Allan started training in this month)

    --2016 Mar - 3 --(Steve left in this month. John is still in training for this month. Allan is still in training this month. Mike started training this month)

    I would appreciate some help with this.

  • Does this meet your needs? You'll need to change the values of the @StartDate and @EndDate Variables to suit your needs though.

    DECLARE @StartDate DATE = '01-Jan-2016',

    @EndDate DATE = '30-Apr-2016';

    WITH Months (MonthDate)

    AS

    (

    SELECT @StartDate

    UNION ALL

    SELECT DATEADD(MONTH, 1, MonthDate)

    FROM Months

    WHERE DATEADD(MONTH, 1, MonthDate) <= @EndDate

    )

    SELECT FORMAT(M.MonthDate, 'yyyy MMM') as MonthDate,

    (SELECT COUNT([Name])

    FROM #SampleTrainingData sq

    WHERE DATEADD(MONTH, DATEDIFF(MONTH, 0,sq.StartDate),0) <= M.MonthDate

    AND ISNULL(sq.LeavingDate,DATEADD(Month, 1, M.MonthDate)) >= DATEADD(Month, 1, M.MonthDate)) AS Trainees

    FROM Months M

    ORDER BY M.MonthDate;

    Edit:Forgot the variable decs! Whoops!

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Something like this...

    DECLARE

    @BegDate DATE = '2016-01-05',

    @EndDate DATE = '2016-06-22';

    WITH

    n (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),

    cte_MonthStart (MonthBeg) AS (

    SELECT TOP (DATEDIFF(mm, @BegDate, @EndDate) +1)

    CAST(DATEADD(mm, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1, DATEADD(mm, DATEDIFF(mm, '1900-01-01', @BegDate), '1900-01-01')) AS DATE)

    FROM n n1, n n2, n n3, n n4, n n5, n n6

    ),

    cte_MonthBegEnd AS (

    SELECT

    ms.MonthBeg,

    MonthEnd = EOMONTH(ms.MonthBeg)

    FROM

    cte_MonthStart ms

    )

    SELECT

    TheYear = YEAR(mbe.MonthBeg),

    TheMonth = MONTH(mbe.MonthBeg),

    Attended = SUM(stdx.Cnt)

    FROM

    cte_MonthBegEnd mbe

    CROSS APPLY (

    SELECT

    Cnt = 1

    FROM

    #SampleTrainingData std

    WHERE

    mbe.MonthBeg BETWEEN std.StartDate AND ISNULL(std.LeavingDate, '9999-12-31')

    OR

    mbe.MonthEnd BETWEEN std.StartDate AND ISNULL(std.LeavingDate, '9999-12-31')

    OR

    std.StartDate BETWEEN mbe.MonthBeg AND mbe.MonthEnd

    OR

    std.LeavingDate BETWEEN mbe.MonthBeg AND mbe.MonthEnd

    ) stdx

    GROUP BY

    mbe.MonthBeg

    ORDER BY

    mbe.MonthBeg;

    Results...

    TheYear TheMonth Attended

    ----------- ----------- -----------

    2016 1 2

    2016 2 3

    2016 3 4

    2016 4 3

    2016 5 2

    2016 6 2

  • Why are some people with partial months included and others excluded? Is there a minimum number of days required to be counted? Or are you only counting people who ended the month in training? Would Mal be included in February if he was in training starting FEB 27?

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thom A (10/5/2016)


    Does this meet your needs? You'll need to change the values of the @StartDate and @EndDate Variables to suit your needs though.

    DECLARE @StartDate DATE = '01-Jan-2016',

    @EndDate DATE = '30-Apr-2016';

    WITH Months (MonthDate)

    AS

    (

    SELECT @StartDate

    UNION ALL

    SELECT DATEADD(MONTH, 1, MonthDate)

    FROM Months

    WHERE DATEADD(MONTH, 1, MonthDate) <= @EndDate

    )

    SELECT FORMAT(M.MonthDate, 'yyyy MMM') as MonthDate,

    (SELECT COUNT([Name])

    FROM #SampleTrainingData sq

    WHERE DATEADD(MONTH, DATEDIFF(MONTH, 0,sq.StartDate),0) <= M.MonthDate

    AND ISNULL(sq.LeavingDate,DATEADD(Month, 1, M.MonthDate)) >= DATEADD(Month, 1, M.MonthDate)) AS Trainees

    FROM Months M

    ORDER BY M.MonthDate;

    Edit:Forgot the variable decs! Whoops!

    This is horribly, horribly inefficient.

    1) Using a CTE to create a sequence is horribly inefficient. Use a Tally table instead.

    2) Using FORMAT is horribly inefficient. Just send a representative date and let the presentation layer worry about the formatting.

    3) Using ISNULL in a WHERE clause is not SARGable. It's much better to use an OR or UNION.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • This one is purely for entertainment, it's not very efficient with 4 table scans and an expensive sort in the execution plan. It does on the other hand demonstrate some nice techniques using inline tally and calendar CTEs.

    😎

    USE TEEST;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'tempdb..#SampleTrainingData') IS NULL

    BEGIN

    CREATE TABLE #SampleTrainingData(

    IDINT IDENTITY(1,1),

    NameNVARCHAR(10),

    StartDateDATE,

    LeavingDateDATE

    );

    INSERT INTO #SampleTrainingData

    ( Name, StartDate, LeavingDate )

    VALUES ( N'Fred', '01 Jan 2016', '05 Jan 2016'),

    ( N'John', '01 Jan 2016', NULL),

    ( N'Steve','01 Feb 2016', '03 Mar 2016'),

    ( N'Allan','03 Feb 2016', NULL),

    ( N'Mike', '01 Mar 2016', '05 Apr 2016');

    END

    ;WITH CALENDAR_CONFIG(MIN_DATE,MAX_DATE,MONTH_COUNT) AS

    (

    SELECT

    MIN(STD.StartDate) AS MIN_DATE

    ,COALESCE(MAX(STD.LeavingDate),MAX(STD.StartDate)) AS MAX_DATE

    ,DATEDIFF(MONTH,MIN(STD.StartDate),COALESCE(MAX(STD.LeavingDate),MAX(STD.StartDate))) + 1 AS MONTH_COUNT

    FROM #SampleTrainingData STD

    )

    ,T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N))

    ,NUMS(N) AS (SELECT TOP(SELECT MONTH_COUNT FROM CALENDAR_CONFIG) ROW_NUMBER() OVER (ORDER BY @@VERSION) - 1 AS N FROM T T1,T T2,T T3)

    ,CALENDAR AS

    (

    SELECT

    (YEAR(DATEADD(MONTH,NM.N,CC.MIN_DATE)) * 100) + MONTH(DATEADD(MONTH,NM.N,CC.MIN_DATE)) MONTH_START

    FROM CALENDAR_CONFIG CC

    CROSS APPLY NUMS NM

    )

    ,BASE_DATA AS

    (

    SELECT

    STD.ID

    ,STD.Name

    ,(YEAR(STD.StartDate) * 100) + MONTH(STD.StartDate) AS StartDate

    ,(YEAR(ISNULL(STD.LeavingDate,CC.MAX_DATE)) * 100) + MONTH(ISNULL(STD.LeavingDate,CC.MAX_DATE)) AS LeavingDate

    FROM #SampleTrainingData STD

    CROSS APPLY CALENDAR_CONFIG CC

    )

    SELECT

    CAL.MONTH_START

    ,COUNT(*) TRAINEE_COUNT

    FROM BASE_DATA BD

    CROSS APPLY CALENDAR CAL

    WHERE BD.StartDate < BD.LeavingDate

    AND CAL.MONTH_START >= BD.StartDate

    AND CAL.MONTH_START < BD.LeavingDate

    GROUP BY CAL.MONTH_START;

    Output

    MONTH_START TRAINEE_COUNT

    ----------- -------------

    201601 1

    201602 3

    201603 3

  • My sick and tired (literally, on both of those 🙁 ) brain is telling me that there is a very efficient single-pass solution here using SUM(CASE...). But sadly I don't have time to pursue it at the moment. Perhaps someone else can bring it to life in the mean time?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thom A (10/5/2016)


    Does this meet your needs? You'll need to change the values of the @StartDate and @EndDate Variables to suit your needs though.

    DECLARE @StartDate DATE = '01-Jan-2016',

    @EndDate DATE = '30-Apr-2016';

    WITH Months (MonthDate)

    AS

    (

    SELECT @StartDate

    UNION ALL

    SELECT DATEADD(MONTH, 1, MonthDate)

    FROM Months

    WHERE DATEADD(MONTH, 1, MonthDate) <= @EndDate

    )

    SELECT FORMAT(M.MonthDate, 'yyyy MMM') as MonthDate,

    (SELECT COUNT([Name])

    FROM #SampleTrainingData sq

    WHERE DATEADD(MONTH, DATEDIFF(MONTH, 0,sq.StartDate),0) <= M.MonthDate

    AND ISNULL(sq.LeavingDate,DATEADD(Month, 1, M.MonthDate)) >= DATEADD(Month, 1, M.MonthDate)) AS Trainees

    FROM Months M

    ORDER BY M.MonthDate;

    Edit:Forgot the variable decs! Whoops!

    The criteria for overlapping intervals is MUCH simpler than most people assume. It's easier to see this when you start from the criteria for when they do not overlap. They don't overlap when one ends before the other starts.

    -- I'm ignoring NULL values here in order to keep the formulas simpler.

    -- The logic is still the same.

    -- two ranges that do not intersect

    SELECT *

    FROM a

    INNER JOIN b

    WHERE a.end_dt < b.begin_dt OR a.begin_dt > b.end_dt

    -- Two ranges intersecting is just the inverse of the previous query.

    SELECT *

    FROM a

    INNER JOIN b

    WHERE NOT(a.end_dt < b.begin_dt OR a.begin_dt > b.end_dt)

    -- We can then distribute the NOT.

    -- Note that the OR becomes an AND in doing so.

    SELECT *

    FROM a

    INNER JOIN b

    WHERE NOT(a.end_dt < b.begin_dt) AND NOT(a.begin_dt > b.end_dt)

    -- Then use the fact that NOT(x < y) is equivalent to x >= y.

    SELECT *

    FROM a

    INNER JOIN b

    WHERE a.end_dt >= b.begin_dt

    AND a.begin_dt <= b.end_dt

    Drew

    PS: I am assuming closed intervals here. The logic is the same for open and half-closed intervals, it's just that the comparison operators are different.

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Sorry... I missed the expected output...

    A simple modification gets the desired results...

    DECLARE

    @BegDate DATE = '2016-01-05',

    @EndDate DATE = '2016-06-22';

    WITH

    n (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),

    cte_MonthStart (MonthBeg) AS (

    SELECT TOP (DATEDIFF(mm, @BegDate, @EndDate) +1)

    CAST(DATEADD(mm, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1, DATEADD(mm, DATEDIFF(mm, '1900-01-01', @BegDate), '1900-01-01')) AS DATE)

    FROM n n1, n n2, n n3, n n4, n n5, n n6

    ),

    cte_MonthBegEnd AS (

    SELECT

    ms.MonthBeg,

    MonthEnd = EOMONTH(ms.MonthBeg)

    FROM

    cte_MonthStart ms

    )

    SELECT

    TheYear = YEAR(mbe.MonthBeg),

    TheMonth = MONTH(mbe.MonthBeg),

    SUM(stdx.Cnt)

    FROM

    cte_MonthBegEnd mbe

    CROSS APPLY (

    SELECT

    Cnt = 1

    FROM

    #SampleTrainingData std

    WHERE

    mbe.MonthEnd BETWEEN std.StartDate AND ISNULL(std.LeavingDate, '9999-12-31')

    ) stdx

    GROUP BY

    mbe.MonthBeg

    ORDER BY

    mbe.MonthBeg

    Results...

    TheYear TheMonth

    ----------- ----------- -----------

    2016 1 1

    2016 2 3

    2016 3 3

    2016 4 2

    2016 5 2

    2016 6 2

  • I assumed a physical tally table, column name "N" (ugh!, but most common I guess), to avoid having to use an inline CTE:

    DECLARE @start_date datetime

    DECLARE @end_date datetime

    SET @start_date = '20160101'

    SET @end_date = '20160630'

    SELECT DATEADD(MONTH, t.N, StartMonth) AS TrainingMonth,

    COUNT(*) AS People_Count

    FROM #SampleTrainingData std

    CROSS APPLY (

    SELECT CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, std.StartDate), 0) AS date) AS StartMonth

    ) AS assign_alias_names

    INNER JOIN dbo.tally t ON t.N BETWEEN 0 AND

    DATEDIFF(MONTH, std.StartDate,

    CASE WHEN std.LeavingDate IS NULL OR std.LeavingDate > @end_date

    THEN @end_date

    ELSE DATEADD(MONTH, -1, std.LeavingDate) END)

    WHERE std.StartDate <= @end_date AND

    (std.LeavingDate IS NULL OR (std.LeavingDate >= @start_date))

    GROUP BY DATEADD(MONTH, t.N, StartMonth)

    ORDER BY DATEADD(MONTH, t.N, StartMonth)

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • drew.allen (10/5/2016)


    Why are some people with partial months included and others excluded? Is there a minimum number of days required to be counted? Or are you only counting people who ended the month in training? Would Mal be included in February if he was in training starting FEB 27?

    Drew

    Hi Drew,

    That is a very good question and one I'll have to look into. As a bit of background, I'm working at a place where they've never had a team of DBAs and Develoipers before - having always bought in software and paid for some customisation. So now they're discovering the delights of running a team to produce their own software. Man, oh man I miss having a good BA around.

    Little things like this get missed and it can lead to the delights of rework.

  • Thank you for the replies, people. They have helped enormously and shown me that what I was trying to do was ridiculously complex - which is what I thought.

    And there are some nice little bits in the examples I can look at and learn from for other things too. Always a bonus.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply