Count values for first day of past n months

  • I have calculate how man ids are in a certain status for the first day of each month for the past 36 months. Its hard to explain so I tried to create some sample data and detailed explanation below of what the expected results are. I was able to get the highest by using the row_number over trick. However, I can't figure out how to run that for the first day of each month. The closest I've gotten is attempting some window functions, but I'm still at a loss. Any help or direction is much appreciated!

    For the first day of each month for the past n month, find the ids chgstat=20 with their latest chgdate before or equal to the first day of that month.

    On 2018-09-01 we have...
    id 1: highest chgstat on/before 2018-09-01 is 20 (went into this chgstat on 2018-08-25 and didn't come out until 2018-09-10). This one counts.
    id 2: went into chgstat=20 on 2018-07-26 but went into chgstat=30 on 2018-08-29. Since id 2 highest chgstat on or before 2018-09-01 wasn't chgstat=20, this does NOT count.
    id 3: highest chgstat on/before 2018-09-01 is 30, so this does not count.
    id 4: highest chgstat on/before 2018-09-01 is 30 (on 2018-08-29). This one does not count.
    id 5: highest chgstat on/before 2018-09-01 is 10. The id did go into chgstat=20, but it was on 2018-09-03. Therefore, this one does not count.

    ON 2018-08-01 we have...
    id 1: All chgstats occur after 2018-08-01. Does not count
    id 2: Highest change status on/before 2018-08-01 is 20. Therefore, this one counts.
    id 3: Did go into chgstat=20 before 2018-08-01. However, the highest on/before 2018-08-01 is 30. Does not count
    id 4: Highest change status on/before 2018-08-01 is 20. Therefore, this one counts.
    id 5: Highest chgstat on/before 2018-08-01 is 10. Does not count.
    For the first day of each month for the past n month, find the ids chgstat=20 with their latest chgdate before or equal to the first day of that month.

    CREATE TABLE stathist (
        id INT,
        chgstat INT,
        chgdate DATE,
    )

    INSERT INTO stathist (id, chgstat, chgdate) VALUES
    (1, 30, '2018-09-10'),
    (1, 20, '2018-08-25'),
    (1, 10, '2018-08-23'),
    (2, 30, '2018-08-29'),
    (2, 20, '2018-07-26'),
    (2, 10, '2018-07-25'),
    (3, 30, '2018-07-29'),
    (3, 20, '2018-07-18'),
    (3, 10, '2018-07-15'),
    (4, 30, '2018-08-29'),
    (4, 20, '2018-07-09'),
    (4, 10, '2018-07-05'),
    (5, 30, '2018-09-20'),
    (5, 20, '2018-09-03'),
    (5, 10, '2018-07-05')

  • Here are some options, getting to the final result in 3 stages to make it easier to understand. The third one should meet your requirements.
    You'll need to test it carefully as there is only a little test data.


    -- List of Ids & Dates for each StartOfMonth (so you can check the base data):
    WITH Months AS
     (SELECT 1 as Value, DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0) AS StartOfMonth
      UNION ALL
      SELECT value + 1 AS value, DATEADD(month, -1, StartOfMonth) AS StartOfMonth
      FROM Months
      WHERE Months.value <= 35)
    SELECT M.StartOfMonth, S20.Id, S20.chgdate, S30.Id, S30.chgdate
    FROM Months M
    LEFT JOIN (SELECT * FROM stathist WHERE chgstat=20) S20 ON M.StartOfMonth >= S20.chgdate
    LEFT JOIN (SELECT * FROM stathist WHERE chgstat=30) S30 ON S20.id = S30.id
    WHERE M.StartOfMonth <= S30.chgdate;

    -- Finished query with counts:
    WITH Months AS
     (SELECT 1 as Value, DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0) AS StartOfMonth
      UNION ALL
      SELECT value + 1 AS value, DATEADD(month, -1, StartOfMonth) AS StartOfMonth
      FROM Months
      WHERE Months.value <= 35)
    SELECT M.StartOfMonth, COUNT(S20.id) as IdCount
    FROM Months M
    LEFT JOIN (SELECT * FROM stathist WHERE chgstat=20) S20 ON M.StartOfMonth >= S20.chgdate
    LEFT JOIN (SELECT * FROM stathist WHERE chgstat=30) S30 ON S20.id = S30.id
    WHERE M.StartOfMonth <= S30.chgdate
    GROUP BY StartOfMonth
    ORDER BY StartOfMonth DESC;

    -- Finished query with counts, listing all 36 months even if 0 count:
    WITH Months AS
     (SELECT 1 as Value, DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0) AS StartOfMonth
      UNION ALL
      SELECT value + 1 AS value, DATEADD(month, -1, StartOfMonth) AS StartOfMonth
      FROM Months
      WHERE Months.value <= 35)
    , Data AS
    (
    SELECT M.StartOfMonth, COUNT(S20.id) as IdCount
    FROM Months M
    LEFT JOIN (SELECT * FROM stathist WHERE chgstat=20) S20 ON M.StartOfMonth >= S20.chgdate
    LEFT JOIN (SELECT * FROM stathist WHERE chgstat=30) S30 ON S20.id = S30.id
    WHERE M.StartOfMonth <= S30.chgdate
    GROUP BY StartOfMonth
    )
    SELECT M.StartOfMonth, ISNULL(IdCount, 0) AS IdCount
    FROM Months M
    LEFT JOIN Data D ON D.StartOfMonth = m.StartOfMonth
    ORDER BY M.StartOfMonth DESC;

  • Ah... be careful, Laurie.  You're using rCTEs (Recursive CTEs) that produce an incremental count and they're slower than a well written While Loop and use 9 times the number of reads.  Even seemingly low row counts suffer a lot and you can't predict how many times such code will be hit in the future.  Worse yet, someone in a pinch for something similar may borrow the rCTE methods from your code and end up in deep Kimchi for both performance and resource usage.

    Please see the following article where you'll find 3 other methods that smoke rCTEs in all areas.
    Hidden RBAR: Counting with Recursive CTE's

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • DataAnalyst011 - Thursday, September 27, 2018 9:26 PM

    I have calculate how man ids are in a certain status for the first day of each month for the past 36 months. Its hard to explain so I tried to create some sample data and detailed explanation below of what the expected results are. I was able to get the highest by using the row_number over trick. However, I can't figure out how to run that for the first day of each month. The closest I've gotten is attempting some window functions, but I'm still at a loss. Any help or direction is much appreciated!

    For the first day of each month for the past n month, find the ids chgstat=20 with their latest chgdate before or equal to the first day of that month.

    On 2018-09-01 we have...
    id 1: highest chgstat on/before 2018-09-01 is 20 (went into this chgstat on 2018-08-25 and didn't come out until 2018-09-10). This one counts.
    id 2: went into chgstat=20 on 2018-07-26 but went into chgstat=30 on 2018-08-29. Since id 2 highest chgstat on or before 2018-09-01 wasn't chgstat=20, this does NOT count.
    id 3: highest chgstat on/before 2018-09-01 is 30, so this does not count.
    id 4: highest chgstat on/before 2018-09-01 is 30 (on 2018-08-29). This one does not count.
    id 5: highest chgstat on/before 2018-09-01 is 10. The id did go into chgstat=20, but it was on 2018-09-03. Therefore, this one does not count.

    ON 2018-08-01 we have...
    id 1: All chgstats occur after 2018-08-01. Does not count
    id 2: Highest change status on/before 2018-08-01 is 20. Therefore, this one counts.
    id 3: Did go into chgstat=20 before 2018-08-01. However, the highest on/before 2018-08-01 is 30. Does not count
    id 4: Highest change status on/before 2018-08-01 is 20. Therefore, this one counts.
    id 5: Highest chgstat on/before 2018-08-01 is 10. Does not count.
    For the first day of each month for the past n month, find the ids chgstat=20 with their latest chgdate before or equal to the first day of that month.

    CREATE TABLE stathist (
        id INT,
        chgstat INT,
        chgdate DATE,
    )

    INSERT INTO stathist (id, chgstat, chgdate) VALUES
    (1, 30, '2018-09-10'),
    (1, 20, '2018-08-25'),
    (1, 10, '2018-08-23'),
    (2, 30, '2018-08-29'),
    (2, 20, '2018-07-26'),
    (2, 10, '2018-07-25'),
    (3, 30, '2018-07-29'),
    (3, 20, '2018-07-18'),
    (3, 10, '2018-07-15'),
    (4, 30, '2018-08-29'),
    (4, 20, '2018-07-09'),
    (4, 10, '2018-07-05'),
    (5, 30, '2018-09-20'),
    (5, 20, '2018-09-03'),
    (5, 10, '2018-07-05')

    Please post what the return for that data would look like according to your rules.  Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply