# AVG based on condition

• Likiraght

Valued Member

Points: 63

Hi, i need the AVERAGE flow between two dates: TODAY() and the most recent date that respect this condition Flow less than 20 for at leat 5 days in a row (could be more than 5 days).

Here's a sample:

`TIMESTAMP       FLOW18/07/2019          10017/07/2019           11016/07/2019          12015/07/2019          12014/07/2019          11013/07/2019           1312/07/2019           12011/07/2019           12010/07/2019          10009/07/2019          1208/07/2019          1007/07/2019          10006/07/2019         12005/07/2019          304/07/2019          503/07/2019          602/07/2019          401/07/2019           330/06/2019         30029/06/2019         200?`

The result for this sample is the AVG of the flow between 18/07/2019 and 05/07/2019.

• Jonathan AC Roberts

SSCoach

Points: 17159

comment deleted

• Jeff Moden

SSC Guru

Points: 995643

Hi Likiraght and welcome aboard!

Your question is good and so is your test data but you'll find that almost everyone likes to test their coded solutions before they post a reply.  They (including me) don't want to have to take the time to convert your test data to code and many heavy hitters on this forum will simply skip over posts that don't provide the test data in a "readily consumable" format.

Since you're brand new here, I've converted your test data into "readily consumable" code for you as follows (it's just one of many methods that result in the same thing... "readily consumable" test data):

`--===== Create the given test data in a readily consumable format.     -- You should do this whenever you post test data.     IF OBJECT_ID('tempdb..#TestData','U') IS NOT NULL        DROP TABLE #TestData; SELECT [TimeStamp] = CONVERT(DATE,v.BrFrDate,103)        ,Flow       = CONVERT(INT,v.Flow)   INTO #TestData   FROM (VALUES         ('18/07/2019',100)        ,('17/07/2019',110)        ,('16/07/2019',120)        ,('15/07/2019',120)        ,('14/07/2019',110)        ,('13/07/2019',13)        ,('12/07/2019',120)        ,('11/07/2019',120)        ,('10/07/2019',100)        ,('09/07/2019',12)        ,('08/07/2019',10)        ,('07/07/2019',100)        ,('06/07/2019',120)        ,('05/07/2019',3)        ,('04/07/2019',5)        ,('03/07/2019',6)        ,('02/07/2019',4)        ,('01/07/2019',3)        ,('30/06/2019',300)        ,('29/06/2019',200)        )v(BrFrDate,Flow);`

Here's one solution to your problem... it uses a rather old fashion method of subtracting a partitioned ever-increasing number from the date to form group "numbers", which appear to be dates but are just a way of grouping rows.  I've also taken the liberty to include a few more things that I think they may ask you for in the future.

`WITH cteGrp AS(--==== Substact an increasing number of days from the TimeStamp to form groups of dates based on     -- whether or not they have a FLOW of < 20. SELECT  GrpNum = DATEADD( dd                          ,-ROW_NUMBER() OVER (PARTITION BY CASE WHEN Flow >= 20 THEN 1 ELSE 0 END                                                ORDER BY [TimeStamp])                          ,[TimeStamp])        ,[TimeStamp]        ,Flow   FROM #TestData),cteCnt AS(--===== Determine a count for each group so we can figure out if there are 5 or more in a group. SELECT  GrpNum        ,GrpCnt = COUNT(*) OVER (PARTITION BY GrpNum)        ,[TimeStamp]        ,Flow   FROM cteGrp),cteCutOffDate AS(--==== Find the maximum TimeStamp of any group having a FLOW of <20 for 5 days or more SELECT CutOffDate = MAX([TimeStamp])   FROM cteCnt  WHERE Flow < 20 AND GrpCnt >=5) SELECT  StartDate      = MIN([TimeStamp])        ,EndDate        = MAX([TimeStamp])        ,AvgFlow        = AVG(Flow+0.0)        ,DaysMeasured   = COUNT(*)        ,TotalFlow      = SUM(Flow)   FROM #TestData td  WHERE [TimeStamp] > (SELECT CutOffDate FROM cteCutOffDate);`

For the test data you provided, that results in the following:

I suppose there's probably a simpler solution that uses Lead/Lag but I'm currently working on a machine that only has SQL Server 2008 on it and so don't have the ability to use Lead/Lag.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
"If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
"Change is inevitable... change for the better is not."

How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

• Jeff Moden

SSC Guru

Points: 995643

p.s.  If you'd like to learn more about the very useful method of conditionally grouping dates by simple subtraction of an increasing number, please see the following article.

https://www.sqlservercentral.com/articles/group-islands-of-contiguous-dates-sql-spackle

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
"If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
"Change is inevitable... change for the better is not."

How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

• drew.allen

SSC Guru

Points: 76707

A couple of other points.

• Always use a temp table or a table variable when providing sample data.

• It obviates the need for cleanup.

• Use an international date format instead of a localized date format.

• YYYYMMDD

• There is newish syntax that greatly simplifies conditionally dropping a table.  It's much more succinct and descriptive.

• `DROP TABLE IF EXISTS #TestData`

Here is my take.

`-- I used #flows rather than #TestData.WITH low_flows AS(	SELECT *, CASE WHEN MAX(FLOW) OVER( ORDER BY f.[TIMESTAMP] ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) < 20 THEN 1 ELSE 0 END AS low_flow	FROM #flows AS f), last_low_flows AS(	SELECT *, MAX(CASE WHEN lf.low_flow = 1 THEN lf.[TIMESTAMP] END) OVER() AS last_low_flow	FROM low_flows AS lf)SELECT AVG(llf.FLOW), llf.last_low_flow, MAX(llf.[TIMESTAMP])FROM last_low_flows AS llfWHERE llf.[TIMESTAMP] BETWEEN llf.last_low_flow AND GETDATE()GROUP BY llf.last_low_flow;`

And here is a comparison of the scans and reads:

`DREWTable 'Worktable'. Scan count 3, logical reads 45, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table '#flows. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.JEFFTable '#TestData'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'Worktable'. Scan count 3, logical reads 65, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.`

J. Drew Allen