Window Function Question - 3 month rolling average

  • Hi Everyone,

    I am trying to achieve a 3 month rolling average calculation with the first two rows (or two years) of each company's calculated average column returning a NULL.

    I have the basic SQL working for a small subset of data for the first company, but when I add a second company's worth of data into the table I'm unsure of why the first two rows for that company do not return NULL like they do for company ABC.

    CompnayRevenue YearRevenue AmountThreeMonthRunningAvg

    ABC201215654125.71NULL

    ABC201316336060.44NULL

    ABC201416316640.4916102275.5466

    ABC20158286560.4613646420.4633

    ABC20166599912.4610401037.8033

    ABC20176603495.80977163322.9099

    ABC20183618616.98665607341.7521

    ABC20192232067.744151393.5121

    ABC2020372011.292074232.0055

    ABC20210.00868026.3433

    DEF20122452324.12941445.1366

    DEF20133897922.142116748.7533

    DEF20144342007.983564084.7466

    DEF201513284278.397174736.17

    DEF201615043320.556110889868.9753

    DEF201714835750.305914387783.084

    DEF201816343303.652115407458.1713

    DEF201916209570.902115796208.2867

    DEF202011115708.557614556194.3706

    DEF20219023715.854612116331.7714

    Here is the SQL that I'm using to generate the results above:

    SELECT Company

    ,[Revenue Year]

    ,[Revenue Amount]

    ,CASE

    WHEN COUNT(*) OVER (

    ORDER BY Company

    ,[Revenue Year] ROWS BETWEEN UNBOUNDED PRECEDING

    AND CURRENT ROW

    ) > 2

    THEN AVG([Revenue Amount]) OVER (

    ORDER BY Company

    ,[Revenue Year] ROWS BETWEEN 2 PRECEDING

    AND CURRENT ROW

    )

    ELSE NULL

    END ThreeMonthRunningAvg

    FROM YHL.NAICS_Company_Revenue

    ORDER BY 1

    ,2

    I've changed the window of the query to just the company and then just the revenue amount, but I still must be missing something.

    When this table is fully loaded there will be about 10 or so company's worth of data ranging from 2012-2021. What I'm trying to achieve is a 3 year running average by company with the first two rows for each company to come back as a NULL (since they're isn't 3 years available for the avg calculation).

    Any tips or suggestions are much appreciated.

    Thanks for your time.

  • Where's the PARTITION BY Company part of your query? PARTITION BY is analogous to GROUP BY... and it seems to be missing, so the totals/counts won't "restart" when the Company changes.

    Should it not be... Note the "PARTITION BY Company" in both calculations

    SELECT Company

    ,[Revenue Year]

    ,[Revenue Amount]

    ,CASE

    WHEN COUNT(*) OVER (

    PARTITION BY Company

    ORDER BY Company

    ,[Revenue Year] ROWS BETWEEN UNBOUNDED PRECEDING

    AND CURRENT ROW

    ) > 2

    THEN AVG([Revenue Amount]) OVER (

    PARTITION BY Company

    ORDER BY Company

    ,[Revenue Year] ROWS BETWEEN 2 PRECEDING

    AND CURRENT ROW

    )

    ELSE NULL

    END ThreeMonthRunningAvg

    FROM YHL.NAICS_Company_Revenue

    ORDER BY 1

    ,2

  • Thanks for the reply. I missed the partition by clause and that has definitely helped -- appreciate you pointing that out.

    I thought I had the 3 month rolling average calculation correct, but it looks like I need to shift the frame of the window function by one row and I'm unsure of exactly how to do this.

    Here is the logic that I'm currently using:

    SELECT Company

    ,[Revenue Year]

    ,[Revenue Amount]

    ,CASE

    WHEN COUNT(*) OVER (

    PARTITION BY Company ORDER BY Company

    ,[Revenue Year]

    ) > 3

    THEN AVG([Revenue Amount]) OVER (

    PARTITION BY Company ORDER BY Company

    ,[Revenue Year] ROWS BETWEEN 3 PRECEDING

    AND CURRENT ROW

    )

    ELSE NULL

    END ThreeMonthRunningAvg

    FROM myTable

    ORDER BY 1

    ,2

    This is producing these results:

    CompanyRevenue YearRevenue AmountThreeMonthRunningAvg

    ABC2012 15654125.71NULL

    ABC2013 16336060.44NULL

    ABC2014 16316640.49NULL

    ABC2015 8286560.46 14148346.775

    ABC2016 6599912.46 11884793.4625

    What our folks are wanting to see is that when the year is 2015 they want a three year running average of 2012,2013,2014 so it is almost like i'd need to specify something like (ROWS BETWEEN 3 PRECEDING AND CURRENT ROW -1). I know that syntax is invalid, but is there a way to have the avg calculation only include the 3 prior years and not the current row like I have now?

    Thanks so much!

  • A couple things.

    First, the order by company in your window clause does not make any sense, I think you want it to be sorting by a date. Second, you don't need to include "ELSE NULL" in a CASE statement, the default behavior is to return a NULL. Third, to shift the rolling average by one month you would simply change "ROWS BETWEEN 3 PRECEDING AND CURRENT ROW" to "ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING". Note the sample data below.

    -- sample data

    DECLARE @table TABLE

    (

    revid int identity not null,

    company varchar(10) not null,

    revdate date not null,

    revAmt int

    );

    INSERT @table (company, revdate, revAmt)

    SELECT TOP (6) 'A', getdate()+checksum(newid())%100, abs(checksum(newid())%100)+1

    FROM sys.all_columns

    UNION ALL

    SELECT TOP (6) 'B', getdate()+checksum(newid())%100, abs(checksum(newid())%100)+1

    FROM sys.all_columns;

    -- solutions

    SELECT

    company,

    revdate,

    revAmt,

    CASE

    WHEN COUNT(*) OVER (PARTITION BY company ORDER BY revdate) >= 3

    THEN

    AVG(revAmt) OVER

    (

    PARTITION BY company

    ORDER BY revdate

    ROWS BETWEEN 3 PRECEDING AND CURRENT ROW

    )

    END, -- this month and the previous 2

    CASE

    WHEN COUNT(*) OVER (PARTITION BY company ORDER BY revdate) > 3

    THEN

    AVG(revAmt) OVER

    (

    PARTITION BY company

    ORDER BY revdate

    ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING

    )

    END -- shift the rows 1 back, note the different result

    FROM @table

    ORDER BY company, revdate; -- not required, using for display only. Remove this.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Just looking at this again, it appears (based on your sample data) that you are looking to do a rolling 3 year average. In that case, you want something like this:

    -- (1) sample data

    IF OBJECT_ID('tempdb..#YearlySalesByCo') IS NOT NULL DROP TABLE #YearlySalesByCo;

    CREATE TABLE #YearlySalesByCo

    (

    company varchar(10) not null,

    [revenue year] smallint not null,

    revenue int not null

    );

    WITH x AS (SELECT yr FROM (VALUES (2009),(2010),(2011),(2012),(2013),(2014),(2015)) x(yr))

    INSERT #YearlySalesByCo (company, [revenue year], revenue)

    SELECT 'A', yr, abs(checksum(newid())%100000)+10000 FROM x

    UNION ALL

    SELECT 'B', yr, abs(checksum(newid())%100000)+10000 FROM x;

    -- (2) You want this POC index

    CREATE UNIQUE NONCLUSTERED INDEX poc_YearlySalesByCo ON #YearlySalesByCo(company, [revenue year])

    INCLUDE (revenue);

    -- (3) Solution

    SELECT

    company,

    [revenue year],

    revenue,

    [3YearAverage] =

    CASE

    WHEN COUNT(*) OVER (PARTITION BY company ORDER BY [revenue year]) >= 3

    THEN AVG(revenue) OVER (PARTITION BY company ORDER BY [revenue year])

    END

    FROM

    (

    SELECT

    company,

    YrNbr = DENSE_RANK() OVER (PARTITION BY company ORDER BY [revenue year]),

    [revenue year],

    revenue = SUM(revenue)

    FROM #YearlySalesByCo

    GROUP BY Company, [revenue year]

    ) AS SalesByCoByYrByMo;

    GO

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B thank you very much for the two replies. I have learned a lot from the SQL that you posted and I'm grateful for your time.

    The first reply gave me what I needed after I adjusted the frame from

    ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING to ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING

    Again, thank you very much!!

  • np!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply