Please help - complete missing data with latest recorded

  • Hi there,

    I have been stuck on this for days and I need submitting this report in the next two days.

    This is the data I am working with:

    And this is what I need:

    Essentially, If I am reporting for June I need to have all the months even for those institutions who only have data as of February or March perhaps and need to add the same totals for those missing months as the total of the last month.

    Anyone can help please?

    Here is the SQL code to create the data:

    CREATE TABLE #data
    (
        ReportingMonth datetime,
        Institution varchar(4),
        Totals integer
    );
    GO

    INSERT INTO #data
    VALUES
    ('2018-01-01','Ins1',32),
    ('2018-02-01','Ins1',37),
    ('2018-03-01','Ins1',32),
    ('2018-04-01','Ins1',40),
    ('2018-05-01','Ins1',38),
    ('2018-06-01','Ins1',37),
    ('2018-01-01','Ins2',145),
    ('2018-02-01','Ins2',155),
    ('2018-01-01','Ins3',1489),
    ('2018-02-01','Ins3',1487),
    ('2018-03-01','Ins3',1485);
    GO

    SELECT * FROM #data

  • Something like this would work.  Please note, I personally would refine this - I hate looping, I would make it more configurable (i.e. tweaking parameters) and I would investigate why the LAST_VALUE function was not working as I expected.  Anyway, I didn't have more than a few minutes to spend on this.  

    Best wishes on this!

    CREATE TABLE #data
    (
      ReportingMonth datetime,
      Institution varchar(4),
      Totals integer
    );
    GO

    INSERT INTO #data
    VALUES
    ('2018-01-01','Ins1',32),
    ('2018-02-01','Ins1',37),
    ('2018-03-01','Ins1',32),
    ('2018-04-01','Ins1',40),
    ('2018-05-01','Ins1',38),
    ('2018-06-01','Ins1',37),
    ('2018-01-01','Ins2',145),
    ('2018-02-01','Ins2',155),
    ('2018-01-01','Ins3',1489),
    ('2018-02-01','Ins3',1487),
    ('2018-03-01','Ins3',1485);
    GO

    declare
         @ReportDateLimitEnd date = '20180601'
        , @ReportDateLimitStart date = '20180101'
    ;

    declare @DatePeriods table (ReportingDate Date, Institution varchar(4));

    while @ReportDateLimitEnd >= @ReportDateLimitStart
    begin
        insert into @DatePeriods (ReportingDate, Institution)
        select distinct @ReportDateLimitStart, Institution
        from #data;

        set @ReportDateLimitStart = dateadd(month,1,@ReportDateLimitStart);
    end

    select
         dp.ReportingDate
        , dp.Institution
        , case when d.Totals is not null then d.Totals else lt.LastTotal end as Totals
    from
        @DatePeriods dp
        left join #data d on dp.ReportingDate = d.ReportingMonth and dp.Institution = d.Institution
        left join (
            select distinct d.Institution, FIRST_VALUE(d.Totals) over (partition by d.Institution order by d.ReportingMonth desc) as LastTotal
            from #data d
        ) lt on lt.Institution = dp.Institution
    order by
        dp.Institution, dp.ReportingDate
    ;
    drop table #data

  • Thank you heb1014 for this ... will definitely look into refining the code as I have thousands of rows being queried so not sure how the looping will work in this case. However, this is a fantastic start for me, especially since I have been at it for the last few days. Appreciate you taking the time too.
  • Below is a solution without looping. Although I guess this is still not the most optimal way... it's the best I could come up with for now.

    if OBJECT_ID('tempdb..#data') is not null
        drop table #data
    CREATE TABLE #data
    (
      ReportingMonth datetime,
      Institution varchar(4),
      Totals integer
    );
    GO
    -- create a temp table with the sample data
    INSERT INTO #data
    VALUES
    ('20180101','Ins1',32),
    ('20180201','Ins1',37),
    ('20180301','Ins1',32),
    ('20180401','Ins1',40),
    ('20180501','Ins1',38),
    ('20180601','Ins1',37),
    ('20180101','Ins2',145),
    ('20180201','Ins2',155),
    ('20180101','Ins3',1489),
    ('20180201','Ins3',1487),
    ('20180301','Ins3',1485);
    GO

    -- create a TALLY table to list all the required dates
    if OBJECT_ID('tempdb..#tally_date') is not null
        drop table #tally_date
    CREATE TABLE #tally_date (AllMonths datetime)
    INSERT INTO #tally_date
    VALUES
    ('20180101'),
    ('20180201'),
    ('20180301'),
    ('20180401'),
    ('20180501'),
    ('20180601'),
    ('20180701'),
    ('20180801'),
    ('20180901'),
    ('20181001'),
    ('20181101'),
    ('20181201')
    ;

    select all_institutions.Institute as Institution
        , #tally_date.AllMonths
        , COALESCE(#data.Totals, last_reported.Totals) AS Totals
    from #tally_date
        -- use OUTER APLLY to get all months for each Institution
        outer apply (select distinct Institution as Institute from #data) all_institutions
        left join #data
            on #tally_date.AllMonths = #data.ReportingMonth
            and all_institutions.Institute = #data.Institution
        -- use CROSS APPLY to get the last reported value of each Institution
        cross apply (select top 1 Totals
                    from #data
                    where Institution = Institute
                    order by ReportingMonth desc
                    ) as last_reported
    where AllMonths < '20180701'
    order by all_institutions.Institute
        , AllMonths
    ;

    ** Don't mistake the ‘stupidity of the crowd’ for the ‘wisdom of the group’! **
  • The best option would be to use a time/date dimension table if you have one.  If you don't, you can easily find one.  Using it would avoid the loop.  I forgot to mention that.

    With this table, you would have a column for the day number of the month.  You could then query the table like this, "give me all dates where day is 1 and dates are between @DateRangeStart and @DateRangeEnd"

  • Thank you HanShi
    Looking into your solution too.

    You guys have no idea how much I appreciate you taking the time to help me out with this. It's been so frustrating ... it looks simple and yet it hasn't been!

    Thanks

  • Still need to learn LEAD/LAG but would think one of these would be something to use with SQL Server 2012 and newer.

  • romina - Monday, June 25, 2018 8:28 AM

    Thank you HanShi
    Looking into your solution too.

    You guys have no idea how much I appreciate you taking the time to help me out with this. It's been so frustrating ... it looks simple and yet it hasn't been!

    Thanks

    A solution using analytical function max() over(order by). This looks at all the available dates for an institution and generates 12 rows one for each month. I have slightly modified the input DML statements to test various scenarios. More complicated than the earlier solutions i guess but made more generic


    CREATE TABLE data
    (
      ReportingMonth datetime,
      Institution varchar(4),
      Totals integer
    );
    GO

    INSERT INTO data
    VALUES
    ('2018-01-01','Ins1',32),
    ('2018-02-01','Ins1',37),
    ('2018-03-01','Ins1',32),
    ('2018-04-01','Ins1',40),
    ('2018-05-01','Ins1',38),
    ('2018-06-01','Ins1',37),
    ('2018-09-01','Ins1',40),
    ('2019-09-01','Ins1',42),
    ('2018-01-01','Ins2',145),
    ('2018-02-01','Ins2',155),
    ('2018-01-01','Ins3',1489),
    ('2018-02-01','Ins3',1487),
    ('2018-03-01','Ins3',1485);
    GO

    with generate_12_months
    as (
       select dateadd(month
                      ,y.rnk-1
                      ,cast(concat(year_val,'-','01','-','01') as date)
                     ) as running_months
              ,x.institution 
          from (select top 12
                       row_number() over(order by (select null)) as rnk
                  from sys.objects
                )y
           join (select distinct /* for each distinct year in a institution generates 12 months*/
                        datepart(year,reportingmonth) as year_val
                        ,Institution
                    from data
                 )x
               on 1=1
        )
    ,interim_data
     as (
         select cd.running_months
                ,cd.institution
                ,d.totals
                ,d.reportingmonth        
            from generate_12_months cd
       left join data d
              on cd.running_months=d.reportingmonth
             and cd.institution=d.institution
          )
    ,interim_data2
      as(/*the max reportingmonth gets the first not null value of the month ordered by running_months*/
         select *
                ,max(x.reportingmonth) over(partition by x.institution order by x.running_months) as max_val
           from interim_data x
          )
    select *
           ,max(y.totals) over(partition by y.institution,y.max_val) as not_null_totals
      from interim_data2 y

  • Lynn Pettis - Monday, June 25, 2018 11:45 AM

    Still need to learn LEAD/LAG but would think one of these would be something to use with SQL Server 2012 and newer.

    I completely agree.  Here is a solution (untested, since I don't currently have access to a SQL environment).  I used the temp tables supplied earlier.

    if OBJECT_ID('tempdb..#data') is not null
        drop table #data
    CREATE TABLE #data
    (
      ReportingMonth datetime,
      Institution varchar(4),
      Totals integer
    );

    -- create a temp table with the sample data
    INSERT INTO #data
    VALUES
    ('20180101','Ins1',32),
    ('20180201','Ins1',37),
    ('20180301','Ins1',32),
    ('20180401','Ins1',40),
    ('20180501','Ins1',38),
    ('20180601','Ins1',37),
    ('20180101','Ins2',145),
    ('20180201','Ins2',155),
    ('20180101','Ins3',1489),
    ('20180201','Ins3',1487),
    ('20180301','Ins3',1485);

    -- create a TALLY table to list all the required dates
    if OBJECT_ID('tempdb..#tally_date') is not null
        drop table #tally_date
    CREATE TABLE #tally_date (AllMonths datetime)
    INSERT INTO #tally_date
    VALUES
    ('20180101'),
    ('20180201'),
    ('20180301'),
    ('20180401'),
    ('20180501'),
    ('20180601'),
    ('20180701'),
    ('20180801'),
    ('20180901'),
    ('20181001'),
    ('20181101'),
    ('20181201')
    ;

    DECLARE @RefDate DATETIME = '20180701'
    WITH Data_Intervals AS
    (
     SELECT
      d.Institution,
      d.Totals,
      d.ReportingMonth AS StartDate,
      LEAD(d.ReportingMonth, 1, '9000-12-01') OVER( PARTITION BY d.Institution ORDER BY d.ReportingMonth) AS EndDate
     FROM #data d
    )
    SELECT td.AllMonths, di.Institution, di.Totals
    FROM Data_Intervals di
    INNER JOIN #tally_date td
     ON di.StartDate <= di.AllMonths
      AND td.AllMonths < di.EndDate
    WHERE td.AllMonths < @RefDate
    ;

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Here is another option:


    Declare @startDate date = '2018-01-01'
      , @endDate date = '2018-06-01';

     With reportDates
      As (
    Select ReportDate = dateadd(month, -d.rn, @endDate)
     From (Select row_number() over(Order By ac.[object_id]) - 1 As rn From sys.all_columns ac) As d
    Where dateadd(month, -d.rn, @endDate) >= @startDate 
       )
      , distinctInstitutions
      As (
    Select Distinct
       d.Institution
     From #data d
       )
    Select r.ReportDate
      , i.Institution
      , Totals = coalesce(d.Totals, t.Totals, 0)
     From reportDates          r
    Cross Join distinctInstitutions     i
     Left Join #data          d On d.ReportingMonth = r.ReportDate
                     And d.Institution = i.Institution
    Outer Apply (Select Top 1
            d3.Totals
          From #data d3
         Where d3.ReportingMonth < r.ReportDate
          And d3.Institution = i.Institution
          Order By
            d3.ReportingMonth desc)  t
    Order By
       i.Institution
      , r.ReportDate;

    This builds the reporting dates based on the start/end dates you define - this also assumes that you will always have data from the beginning of the date range.  If you don't have data on the first date for each institution those will return a 0 until you have data available.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • drop table temp1;

    create table temp1 ([date] date,Institution varchar(50),Totals int)

    INSERT INTO temp1

    VALUES

    ('2018-01-01','Ins1',32),

    ('2018-02-01','Ins1',37),

    ('2018-03-01','Ins1',32),

    ('2018-04-01','Ins1',40),

    ('2018-05-01','Ins1',38),

    ('2018-06-01','Ins1',37),

    ('2018-01-01','Ins2',145),

    ('2018-02-01','Ins2',155),

    ('2018-01-01','Ins3',1489),

    ('2018-02-01','Ins3',1487),

    ('2018-03-01','Ins3',1485);

    GO

    drop table temp2

    create table temp2 ([date] date)

    declare @startDate date,@endDate date

    set @startDate = '2018-01-01'

    set @endDate = '2018-06-01'

    truncate table temp2

    while (@startDate <= @endDate)

    begin

    insert into temp2 values(@startDate)

    set @startDate = dateadd(dd,1,Eomonth(@startDate))

    end

    drop table #ReportDtl

    ;with cte_tmp as (select * from temp2, (select distinct institution from temp1 where date<=@endDate) t)

    select t2.[date],t2.institution,t1.Totals into #ReportDtl

    from temp1 t1 right outer join cte_tmp t2

    on t1.[date] = t2.[date] and t1.institution = t2.institution order by 2,1

    UPDATE sf

    SET sf.Totals = sf3.Totals

    FROM #ReportDtl sf3,#ReportDtl sf

    WHERE sf3.date = (SELECT MAX(date) FROM #ReportDtl sf2 WHERE sf2.date <

    sf.date and sf2.Totals IS NOT NULL AND sf2.institution = sf.institution)

    and sf3.institution = sf.institution

    and sf.Totals IS NULL;

    select * from #ReportDtl;

  • Sudhakar Ramakrishnan - Wednesday, June 27, 2018 9:10 AM

    drop table temp1;

    create table temp1 ([date] date,Institution varchar(50),Totals int)

    INSERT INTO temp1

    VALUES

    ('2018-01-01','Ins1',32),

    ('2018-02-01','Ins1',37),

    ('2018-03-01','Ins1',32),

    ('2018-04-01','Ins1',40),

    ('2018-05-01','Ins1',38),

    ('2018-06-01','Ins1',37),

    ('2018-01-01','Ins2',145),

    ('2018-02-01','Ins2',155),

    ('2018-01-01','Ins3',1489),

    ('2018-02-01','Ins3',1487),

    ('2018-03-01','Ins3',1485);

    GO

    drop table temp2

    create table temp2 ([date] date)

    declare @startDate date,@endDate date

    set @startDate = '2018-01-01'

    set @endDate = '2018-06-01'

    truncate table temp2

    while (@startDate <= @endDate)

    begin

    insert into temp2 values(@startDate)

    set @startDate = dateadd(dd,1,Eomonth(@startDate))

    end

    drop table #ReportDtl

    ;with cte_tmp as (select * from temp2, (select distinct institution from temp1 where date<=@endDate) t)

    select t2.[date],t2.institution,t1.Totals into #ReportDtl

    from temp1 t1 right outer join cte_tmp t2

    on t1.[date] = t2.[date] and t1.institution = t2.institution order by 2,1

    UPDATE sf

    SET sf.Totals = sf3.Totals

    FROM #ReportDtl sf3,#ReportDtl sf

    WHERE sf3.date = (SELECT MAX(date) FROM #ReportDtl sf2 WHERE sf2.date <

    sf.date and sf2.Totals IS NOT NULL AND sf2.institution = sf.institution)

    and sf3.institution = sf.institution

    and sf.Totals IS NULL;

    select * from #ReportDtl;

    WHILE loops tend to perform horribly.  There were several solutions offered earlier in the thread  which did not require a WHILE loop.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Perhaps the simplest way to demonstrate what is going on is:

    If you need to show each month for each institution, regardless of any activity for given institution in given month,
    you need to generate data set where each month is combined with each institution.

    First, we need a table with all the months we want to report for:?

    IF Object_ID( 'tempdb..#ReportingMonths' ) IS NOT NULL
            DROP TABLE #ReportingMonths
    ;
    GO
    CREATE TABLE #ReportingMonths
    ( ReportingMonth datetime NOT NULL )
    GO
    INSERT INTO #ReportingMonths
    VALUES
    ('2018-01-01'),
    ('2018-02-01'),
    ('2018-03-01'),
    ('2018-04-01'),
    ('2018-05-01'),
    ('2018-06-01')
    ;
    GO
    SELECT * FROM #ReportingMonths
    ;
    GO

    Now you need to combine each row from #ReportingMonths with each Institution. I assume you have a table
    where you keep Institutions, one row per Institution, similar to this:
    --2018-Jun-27 12:29:07--
    IF Object_ID( 'tempdb..#Institution' ) IS NOT NULL
            DROP TABLE #Institutions
    ;
    GO
    SELECT     DISTINCT
            Institution
    INTO #Institutions
    FROM #Data
    ;
    GO
    SELECT * FROM #Institutions;
    GO

    Following query will combine each row from #ReportingMonths with each row from #Institutions:

    SELECT
              D.Institution
        ,     M.ReportingMonth
    FROM     #ReportingMonths AS M
    ,        #Institutions     AS D
    ;

    The  final query would look like this:
    WITH CrossJoinPart
        AS
        (
            SELECT
                      D.Institution
                ,     M.ReportingMonth
            FROM     #ReportingMonths AS M
            ,        #Institutions     AS D
        )
    SELECT
              X.Institution
        ,     X.ReportingMonth
        ,     D.Totals
    FROM         CrossJoinPart     AS X    
    LEFT JOIN     #Data             AS D    
            ON X.ReportingMonth = D.ReportingMonth
                AND X.Institution = D.Institution
    ORDER BY Institution , ReportingMonth
    ;

    No loops, no Window functions, no APPLY.

    🙂

  • romina - Sunday, June 24, 2018 11:25 PM

    Hi there,

    I have been stuck on this for days and I need submitting this report in the next two days.

    This is the data I am working with:

    And this is what I need:

    Essentially, If I am reporting for June I need to have all the months even for those institutions who only have data as of February or March perhaps and need to add the same totals for those missing months as the total of the last month.

    Anyone can help please?

    Here is the SQL code to create the data:

    CREATE TABLE #data
    (
        ReportingMonth datetime,
        Institution varchar(4),
        Totals integer
    );
    GO

    INSERT INTO #data
    VALUES
    ('2018-01-01','Ins1',32),
    ('2018-02-01','Ins1',37),
    ('2018-03-01','Ins1',32),
    ('2018-04-01','Ins1',40),
    ('2018-05-01','Ins1',38),
    ('2018-06-01','Ins1',37),
    ('2018-01-01','Ins2',145),
    ('2018-02-01','Ins2',155),
    ('2018-01-01','Ins3',1489),
    ('2018-02-01','Ins3',1487),
    ('2018-03-01','Ins3',1485);
    GO

    SELECT * FROM #data

    How about we go entirely dynamic instead of hard coding the needed dates?   I assumed that the date values already in the table set the range of dates by just taking the minimum and maximum values.
    CREATE TABLE #data (
        ReportingMonth datetime,
        Institution varchar(4),
        Totals integer
    );
    INSERT INTO #data (ReportingMonth, Institution, Totals)
        VALUES    ('2018-01-01','Ins1',32),
                ('2018-02-01','Ins1',37),
                ('2018-03-01','Ins1',32),
                ('2018-04-01','Ins1',40),
                ('2018-05-01','Ins1',38),
                ('2018-06-01','Ins1',37),
                ('2018-01-01','Ins2',145),
                ('2018-02-01','Ins2',155),
                ('2018-01-01','Ins3',1489),
                ('2018-02-01','Ins3',1487),
                ('2018-03-01','Ins3',1485);

    SELECT *
    FROM #data;

    DECLARE @StartMonth AS date,
            @EndMonth    AS date;

    SELECT    @StartMonth = MIN(ReportingMonth),
            @EndMonth    = MAX(ReportingMonth)
    FROM #data;

    WITH Numbers AS (

        SELECT 1 AS N UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ),
        DATES AS (

            SELECT TOP (DATEDIFF(month, @StartMonth, @EndMonth) + 1)
                DATEADD(month, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1, @StartMonth) AS ReportingMonth
            FROM Numbers AS N1
                CROSS JOIN Numbers AS N2
    ),
        Institutions AS (

            SELECT Institution, MAX(ReportingMonth) AS MaxMonth
            FROM #data
            GROUP BY Institution
    ),
        MAX_MONTH_TOTALS AS (

            SELECT D.Institution, I.MaxMonth, D.Totals
            FROM #data AS D
                INNER JOIN Institutions AS I
                    ON D.Institution = I.Institution
                    AND D.ReportingMonth = I.MaxMonth
    )
    SELECT
        DT.ReportingMonth,
        I.Institution,
        CASE
            WHEN DT.ReportingMonth > MMT.MaxMonth THEN MMT.Totals
            ELSE D.Totals
        END
    FROM DATES AS DT
        CROSS JOIN Institutions AS I
        INNER JOIN MAX_MONTH_TOTALS AS MMT
            ON I.Institution = MMT.Institution
        LEFT OUTER JOIN #data AS D
            ON DT.ReportingMonth = D.ReportingMonth
            AND I.Institution = D.Institution
    ORDER BY
        I.Institution,
        DT.ReportingMonth;

    DROP TABLE #data;

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Thursday, June 28, 2018 9:30 AM

    romina - Sunday, June 24, 2018 11:25 PM

    Hi there,

    I have been stuck on this for days and I need submitting this report in the next two days.

    This is the data I am working with:

    And this is what I need:

    Essentially, If I am reporting for June I need to have all the months even for those institutions who only have data as of February or March perhaps and need to add the same totals for those missing months as the total of the last month.

    Anyone can help please?

    Here is the SQL code to create the data:

    CREATE TABLE #data
    (
        ReportingMonth datetime,
        Institution varchar(4),
        Totals integer
    );
    GO

    INSERT INTO #data
    VALUES
    ('2018-01-01','Ins1',32),
    ('2018-02-01','Ins1',37),
    ('2018-03-01','Ins1',32),
    ('2018-04-01','Ins1',40),
    ('2018-05-01','Ins1',38),
    ('2018-06-01','Ins1',37),
    ('2018-01-01','Ins2',145),
    ('2018-02-01','Ins2',155),
    ('2018-01-01','Ins3',1489),
    ('2018-02-01','Ins3',1487),
    ('2018-03-01','Ins3',1485);
    GO

    SELECT * FROM #data

    How about we go entirely dynamic instead of hard coding the needed dates?   I assumed that the date values already in the table set the range of dates by just taking the minimum and maximum values.
    CREATE TABLE #data (
        ReportingMonth datetime,
        Institution varchar(4),
        Totals integer
    );
    INSERT INTO #data (ReportingMonth, Institution, Totals)
        VALUES    ('2018-01-01','Ins1',32),
                ('2018-02-01','Ins1',37),
                ('2018-03-01','Ins1',32),
                ('2018-04-01','Ins1',40),
                ('2018-05-01','Ins1',38),
                ('2018-06-01','Ins1',37),
                ('2018-01-01','Ins2',145),
                ('2018-02-01','Ins2',155),
                ('2018-01-01','Ins3',1489),
                ('2018-02-01','Ins3',1487),
                ('2018-03-01','Ins3',1485);

    SELECT *
    FROM #data;

    DECLARE @StartMonth AS date,
            @EndMonth    AS date;

    SELECT    @StartMonth = MIN(ReportingMonth),
            @EndMonth    = MAX(ReportingMonth)
    FROM #data;

    WITH Numbers AS (

        SELECT 1 AS N UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ),
        DATES AS (

            SELECT TOP (DATEDIFF(month, @StartMonth, @EndMonth) + 1)
                DATEADD(month, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1, @StartMonth) AS ReportingMonth
            FROM Numbers AS N1
                CROSS JOIN Numbers AS N2
    ),
        Institutions AS (

            SELECT Institution, MAX(ReportingMonth) AS MaxMonth
            FROM #data
            GROUP BY Institution
    ),
        MAX_MONTH_TOTALS AS (

            SELECT D.Institution, I.MaxMonth, D.Totals
            FROM #data AS D
                INNER JOIN Institutions AS I
                    ON D.Institution = I.Institution
                    AND D.ReportingMonth = I.MaxMonth
    )
    SELECT
        DT.ReportingMonth,
        I.Institution,
        CASE
            WHEN DT.ReportingMonth > MMT.MaxMonth THEN MMT.Totals
            ELSE D.Totals
        END
    FROM DATES AS DT
        CROSS JOIN Institutions AS I
        INNER JOIN MAX_MONTH_TOTALS AS MMT
            ON I.Institution = MMT.Institution
        LEFT OUTER JOIN #data AS D
            ON DT.ReportingMonth = D.ReportingMonth
            AND I.Institution = D.Institution
    ORDER BY
        I.Institution,
        DT.ReportingMonth;

    DROP TABLE #data;

    The problem with this approach is that it requires at least three scans of the table: once for determining the institutions; once for determining the min/max dates; and once for the final results.  Here is a test approach that only requires one scan.  You may need to increase the size of the tally table if you have gaps larger than 10 months.

    ;
    WITH

    Tally AS
    (
        SELECT ROW_NUMBER() OVER(ORDER BY @@VERSION) - 1 AS n FROM (VALUES(0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) c(n)
    )
    , ReportingIntervals AS
    (
        SELECT
            Institution,
           
    Totals,
           
    ReportingMonth,
           
    DATEDIFF(MONTH, ReportingMonth, LEAD(ReportingMonth, 1, DATEADD(MONTH, 1, GETDATE())) OVER(PARTITION BY Institution ORDER BY ReportingMonth)) AS ReportMonthDuration
        FROM #data
    )
    SELECT DATEADD(MONTH, n, ReportingMonth) AS ReportingMonth, Institution, Totals
    FROM ReportingIntervals ri
    INNER JOIN Tally t
        ON t.n < ri.ReportMonthDuration
    ;

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply