Rolling Max Value

  • Hi SSC,

    I've got a problem I'm trying to solve as efficiently as possible. I've included a script that builds simplified sample data below, but to help conceptualize the problem: for each day in the life of a security, I want to figure out what the highest value was over the past 52 weeks.

    In SQL Server 2012+ this is a piece of cake using max() over() with a ROWS frame, but alas, we don't have 2012+ on any prod boxes.

    Two Questions:

    1) Does anyone know an efficient way to do this? (P.S. Quirky updates came to mind, but I couldn't figure out how to apply them to this problem.)

    2) Does anyone know roughly how max() over() with a ROWS frame works under the covers? Maybe I can reverse engineer it in an efficient manner.

    use master
    go
    set nocount on
    go

    /**********************
    ** BUILD SAMPLE DATA **
    **********************/
    if object_id('tempdb.dbo.#dat') is not null drop table #dat

    create table #dat
    (
        RID int identity(1,1) primary key clustered,
        Value float,
        MaxRollingValue float
    )

    insert into #dat(Value)
    select top 10000 (abs(checksum(newid())) % 1000) * rand()
    from sys.objects a, sys.objects b

    /******************
    ** SET MAX VALUE **
    ******************/
    --Super efficient, but 2012+ only
    --1 Scan, 38 LR
    select
        RID,
        MaxRollingValue = max(Value) over (order by RID
                                           rows between 10 preceding and current row)
    from #dat

    --Simple to understand, but horrible performance.
    --10001 scans, 31691 LR
    select
        d1.RID,
        MaxRollingValue = max(d2.Value)
    from #dat d1
    inner join #dat d2
        on  d1.RID >= d2.RID
        and d1.RID - 10 <= d2.RID
        and d1.Value <= d2.Value
    group by d1.RID

    Executive Junior Cowboy Developer, Esq.

  • The Quirky Update is easy to get, assuming that you already know the rules to follow.

    Here's an example with your sample data.

    DECLARE @Value float = 0,
            @id int

    UPDATE t SET
        @Value = MaxRollingValue = CASE WHEN Value > @Value THEN Value ELSE @Value END,
        @id = RID
    FROM #dat t WITH (TABLOCKX)
    OPTION (MAXDOP 1)

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
    Thanks, Luis. The only problem there is that it doesn't work for a rolling window.

    If the highest value is at the tail end of the window and you advance one row, the high must now change to the second highest value. Then, if that remained the high until IT fell out of scope, you'd have to switch to the 3rd highest, and so on.

    Executive Junior Cowboy Developer, Esq.[/url]

    Can you give me an example of what you just said? Are you trying to do a ranking? Or am I misunderstanding you?

    Partitioning the results (using windows) is possible, but I'm trying to figure out what you need.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Sure. So turning the data set on its side, you could think of the sequence of values like this:

    5,3,2,3,1,4,6....

    The update you described works if you just want to know the highest value at any point along the whole series, but I want to know what the highest value was only for the past (let's say) 3 records. So let's start reading from left to right (with the current frame indicated by brackets):

    1) [5],3,2,3,1,4,6. 5 is the highest (by default)

    2) [5,3],2,3,1,4,6. 5 is still the highest

    3) [5,3,2],3,1,4,6. 5 is still the highest

    4) 5,[3,2,3],1,4,6. 3 is now the highest, but you have no way of knowing this, since the previous highest value, which we were comparing against to determine the next in the series, is now out of scope.
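
    To make that concrete, here is a quick throwaway check (my own sketch, using a hypothetical #demo table rather than the original sample data) that runs the running-max quirky update over this sequence; at RID 4 it returns 5, whereas the desired 3-row rolling max there is 3:

    if object_id('tempdb.dbo.#demo') is not null drop table #demo
    create table #demo
    (
        RID int identity(1,1) primary key clustered,
        Value float,
        RunningMax float
    )
    insert into #demo(Value) values (5),(3),(2),(3),(1),(4),(6)

    declare @Value float = 0, @id int

    -- running (not rolling) max, as in the earlier post
    update t set
        @Value = RunningMax = case when Value > @Value then Value else @Value end,
        @id = RID
    from #demo t with (tablockx)
    option (maxdop 1)

    -- RID 4 shows RunningMax = 5; the rolling max over the window (3,2,3) should be 3
    select RID, Value, RunningMax from #demo order by RID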

    If you're aware of a way to use an over clause to partition over a range without the use of ROWS or RANGE, I'd be interested to know.

    Executive Junior Cowboy Developer, Esq.

  • I totally missed that part and understand your pain.

    For now, the "wide" join you're using seems the best option, with one slight modification: you can remove the last join condition (the d1.Value <= d2.Value predicate) to cut the logical reads by about a third. At least that's been consistent in my tests.
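
    For reference, a minimal sketch of that modification (same #dat table as in the original post):

    select
        d1.RID,
        MaxRollingValue = max(d2.Value)
    from #dat d1
    inner join #dat d2
        on  d1.RID >= d2.RID          -- current row and the 10 preceding RIDs
        and d1.RID - 10 <= d2.RID     -- the d1.Value <= d2.Value predicate is removed
    group by d1.RID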

    I'll try to think further.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Hokay, so I think I have an algorithm to do this with a quirky update. There may be some bugs in it because there are so many combinations of how the elements in the window are arranged, but I think more or less, this does the trick.

    Early on I had the idea to store off the second highest value, and then if the first highest value rolled off, the second highest value would become the first. Originally I was storing the second highest value in the window whether it was before or after the first highest value. But after I jotted it down on paper, it became clear that you only need to store the second highest value if it's IN FRONT of the first highest. No value smaller than the first highest value, behind it in the sequence, would ever supplant the first highest.

    Consider the following sequence and assume a rolling window of 3. F = first highest, S = second highest

    6,2,5,3,7,1,1,3

    • [6],2,5,3,7,1,1,3
        F = 6
        S = null
    • [6,2],5,3,7,1,1,3
        F = 6
        S = 2
    • [6,2,5],3,7,1,1,3
        F = 6
        S = 5
    • 6,[2,5,3],7,1,1,3
        6 falls off
        F = 5
        S = 3
    • 6,2,[5,3,7],1,1,3
        F = 7
        S = null (again, we’re only interested in the second highest value IN FRONT of the first highest)
    • 6,2,5,[3,7,1],1,3
        F = 7
        S = 1
    • 6,2,5,3,[7,1,1],3
        F = 7
        S = 1
    • 6,2,5,3,7,[1,1,3]
        7 falls off
        F = 3
        S = null

    All of these rules can be applied on successive rows, making a quirky update possible. This works for the randomized data I originally asked about, but to show it in SQL, I’ve hard-coded this exact sequence into the script below.

    Note the use of the @temp<name> variables. The CASE expressions need to operate on the variable values as they were when the row entered the update; if you used the actual variables directly, then as soon as the first clause modified @first, it would throw off the subsequent variable assignments.

    I think the 2012+ windowed function still works better than mine, but this is considerably better than any of the RBAR solutions in terms of performance.

    /**********************
    ** BUILD SAMPLE DATA **
    **********************/
    if object_id('tempdb.dbo.#dat') is not null drop table #dat

    create table #dat
    (
        RID int identity(1,1) primary key clustered,
        Value float,
        MaxRollingValue float,
        Debug varchar(max),
        FirstRID int
    )
    go

    insert into #dat(Value)
    values (6),(2),(5),(3),(7),(1),(1),(3)
    --select top 10000 (abs(checksum(newid())) % 1000) * rand()
    --from sys.objects a, sys.objects b

    declare
        @tempFirst float,
        @first float = -1,
        @tempFirstRID int,
        @firstRID int = 1,
        @tempSecond float,
        @second float = -1,
        @tempSecondRID int,
        @secondRID int = null,
        @interval int = 3,
        @Anchor int

    update #dat with (tablockx)
    set @Anchor = RID,
        @tempFirst = case when Value > @first then Value
                          when (RID - @interval) = @firstRID and Value < @second then @second
                          when (RID - @interval) = @firstRID and Value > @second then Value
                          else @first
                     end,
        @tempFirstRID = case when Value > @first then RID
                             when (RID - @interval) = @firstRID and Value < @second then @secondRID
                             when (RID - @interval) = @firstRID and Value > @second then RID
                             else @firstRID
                        end,
        @tempSecond = case when Value > @first then -1
                           when Value < @first and Value > @second then Value
                           when (RID - @interval) = @firstRID then -1
                           else @second
                      end,
        @tempSecondRID = case when (RID - @interval) = @firstRID then null
                              when Value > @first then null
                              when Value < @first and Value > @second then RID
                              else @secondRID
                         end,
        @first = @tempFirst,
        @firstRID = @tempFirstRID,
        @second = @tempSecond,
        @secondRID = @tempSecondRID,
        MaxRollingValue = @first,
        FirstRID = @firstRID,
        Debug = concat('@first:', @first, ' @firstRID:', @firstRID, ' @second:', @second, ' @secondRID:', @secondRID)
    option (maxdop 1)

    select *
    from #dat
    order by RID

    Executive Junior Cowboy Developer, Esq.

    Here is a little "first coffee in the morning" solution :doze:

    It uses two instances of a Tally CTE to generate a set N {N+0, ..., N+x} for framing the x rows preceding and current row (kind of).

    😎

    DECLARE @SET_SIZE INT = 10;

    --SET STATISTICS IO ON;
    ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
    ,NUMS(N) AS (SELECT TOP(SELECT COUNT(*) FROM #dat) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM T T1,T T2,T T3,T T4,T T5)
    ,NM10(N) AS (SELECT TOP (@SET_SIZE) N - 1 FROM NUMS)
    ,CALC_SET AS
    (
        SELECT
             NM.N
            ,NM.N + N10.N AS NX
        FROM NUMS NM
        OUTER APPLY NM10 N10
    )
    SELECT
         CS.N
        ,MAX(DT.Value) AS MAX_VAL
    FROM CALC_SET CS
    INNER JOIN #dat DT
        ON CS.NX = DT.RID
    GROUP BY CS.N
    --ORDER BY CS.N
    --SET STATISTICS IO OFF;

    IO statistics :pinch:

    Table 'Worktable'. Scan count 1, logical reads 20438, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#dat________________________________________________________________________________________________________________000000000023'. Scan count 3, logical reads 114, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    After the second coffee I realised that the solution was in fact "current row and X following". Here is an "X preceding and current row" improvement.

    😎

    use master
    go
    set nocount on
    go

    /**********************
    ** BUILD SAMPLE DATA **
    **********************/
    if object_id('tempdb.dbo.#dat') is not null drop table #dat

    create table #dat
    (
        RID int identity(1,1) primary key clustered,
        Value float,
        MaxRollingValue float
    )

    insert into #dat(Value)
    select top 10000 (abs(checksum(newid())) % 1000) * rand()
    from sys.objects a, sys.objects b

    DECLARE @SET_SIZE INT = 10;

    SET STATISTICS IO ON;
    ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
    ,NUMS(N) AS (SELECT TOP(SELECT COUNT(*) FROM #dat) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM T T1,T T2,T T3,T T4,T T5)
    ,NM10(N) AS (SELECT TOP (@SET_SIZE) N - 1 FROM NUMS)
    ,PRENUM(RID,Value) AS (SELECT -N AS RID, 0 AS Value FROM NM10)
    ,CALC_SET AS
    (
        SELECT
             NM.N
            ,NM.N - N10.N AS NX   -- preceding rows this time, so NX can dip below 1 into PRENUM
        FROM NUMS NM
        OUTER APPLY NM10 N10
    )
    ,EXTENDED_DAT AS
    (
        SELECT
             RID
            ,Value
        FROM PRENUM
        UNION ALL
        SELECT
             RID
            ,Value
        FROM #dat
    )
    SELECT
         CS.N
        ,MAX(DT.Value) AS MAX_VAL
    FROM CALC_SET CS
    INNER JOIN EXTENDED_DAT DT
        ON CS.NX = DT.RID
    GROUP BY CS.N
    ORDER BY CS.N
    SET STATISTICS IO OFF;

  • Wow, that works quite well. Given that I discovered some flaws in my post where I thought I had it with the quirky update, I'll try using this against my actual data set and see if everything holds water.

    Thanks!

    Executive Junior Cowboy Developer, Esq.

  • JeeTee (8/22/2014)


    Wow, that works quite well. Given that I discovered some flaws in my post where I thought I had it with the quirky update, I'll try using this against my actual data set and see if everything holds water.

    Thanks!

    Quick thought: I don't think the solution is perfect, but it should be easy to improve. I haven't had time to look at it in more detail, but I suspect the leading-in part, where N < @SET_SIZE, might need improving.
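
    One possible way to handle those leading rows (a rough sketch, not tested against the original data) is to drop the PRENUM zero-padding, which relies on the values being non-negative, and simply let the first @SET_SIZE - 1 frames aggregate over however many rows actually exist:

    DECLARE @SET_SIZE INT = 10;

    ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
    ,NUMS(N) AS (SELECT TOP(SELECT COUNT(*) FROM #dat) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM T T1,T T2,T T3,T T4,T T5)
    ,NM10(N) AS (SELECT TOP (@SET_SIZE) N - 1 FROM NUMS)
    ,CALC_SET AS
    (
        SELECT
             NM.N
            ,NM.N - N10.N AS NX   -- N and the @SET_SIZE - 1 preceding positions
        FROM NUMS NM
        OUTER APPLY NM10 N10
    )
    SELECT
         CS.N
        ,MAX(DT.Value) AS MAX_VAL
    FROM CALC_SET CS
    INNER JOIN #dat DT
        ON CS.NX = DT.RID         -- NX values below 1 simply don't join, so the early frames shrink
    GROUP BY CS.N
    ORDER BY CS.N;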

    😎

  • Stick with a quirky update:

    use master
    go
    set nocount on
    go

    /**********************
    ** BUILD SAMPLE DATA **
    **********************/
    if object_id('tempdb.dbo.#dat') is not null drop table #dat

    create table #dat
    (
        RID int identity(1,1) primary key clustered,
        Value float,
        MaxRollingValue float
    )

    insert into #dat(Value)
    select top 10000 (abs(checksum(newid())) % 1000) * rand()
    from sys.objects a, sys.objects b

    /******************
    ** SET MAX VALUE **
    ******************/
    --Super efficient, but 2012+ only
    --1 Scan, 38 LR
    --select
    --    RID,
    --    MaxRollingValue = max(Value) over (order by RID
    --                                       rows between 10 preceding and current row)
    --from #dat

    --SELECT *
    --FROM #dat;

    --Simple to understand, but horrible performance.
    --10001 scans, 31691 LR
    select
        d1.RID,
        MaxRollingValue = max(d2.Value)
    from #dat d1
    inner join #dat d2
        on  d1.RID >= d2.RID
        and d1.RID - 10 <= d2.RID
        and d1.Value <= d2.Value
    group by d1.RID

    -- Lag variables are float to match the Value column
    DECLARE @Lag1 FLOAT = NULL
           ,@Lag2 FLOAT = NULL
           ,@Lag3 FLOAT = NULL
           ,@Lag4 FLOAT = NULL
           ,@Lag5 FLOAT = NULL
           ,@Lag6 FLOAT = NULL
           ,@Lag7 FLOAT = NULL
           ,@Lag8 FLOAT = NULL
           ,@Lag9 FLOAT = NULL;

    UPDATE #dat WITH (TABLOCKX)
    SET MaxRollingValue =
        (
            SELECT MAX(lag)
            FROM (VALUES (@Lag1),(@Lag2),(@Lag3),(@Lag4),(@Lag5),(@Lag6),(@Lag7),(@Lag8),(@Lag9),(Value)) a(lag)
        )
       ,@Lag9 = @Lag8
       ,@Lag8 = @Lag7
       ,@Lag7 = @Lag6
       ,@Lag6 = @Lag5
       ,@Lag5 = @Lag4
       ,@Lag4 = @Lag3
       ,@Lag3 = @Lag2
       ,@Lag2 = @Lag1
       ,@Lag1 = Value
    OPTION (MAXDOP 1);

    SELECT *
    FROM #dat;

    This is a knock-off of what I did in this article: https://www.simple-talk.com/sql/t-sql-programming/calculating-values-within-a-rolling-window-in-transact-sql/


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • Thanks, Dwain. I'll look this solution over as well and see how it fits the data set I'm working with.

    Fwiw (and this doesn't quite work in pure SQL), I did some research on different algorithms to achieve this and I think the best one is illustrated in this post:

    http://softwarelearner.blogspot.com/2011/04/minima-in-sliding-window.html

    Admittedly, this is not TSQL, but I thought I'd share the fruits of my research.

    It's my guess that the function Microsoft built into 2012 probably does something along those lines. Either way, there have been a few ingenious approaches to solving this issue in SQL posted thus far. Thanks!
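
    For anyone curious, here is a rough, purely illustrative T-SQL sketch of that sliding-window idea: keep only the candidates that could still become the max, dropping anything that has left the window or is smaller than the incoming value. It is deliberately procedural, cursor-based RBAR, so it is only meant to show the algorithm, not to compete with the set-based solutions above; the @preceding variable and @deque table are my own names, not from the linked post:

    DECLARE @preceding int = 10;                            -- 10 preceding rows plus the current row
    DECLARE @deque TABLE (RID int PRIMARY KEY, Value float);
    DECLARE @rid int, @val float;

    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT RID, Value FROM #dat ORDER BY RID;
    OPEN cur;
    FETCH NEXT FROM cur INTO @rid, @val;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        DELETE FROM @deque WHERE RID < @rid - @preceding;   -- candidates that left the window
        DELETE FROM @deque WHERE Value <= @val;             -- dominated candidates can never be the max again
        INSERT INTO @deque (RID, Value) VALUES (@rid, @val);

        UPDATE d
           SET MaxRollingValue = (SELECT MAX(Value) FROM @deque)
          FROM #dat d
         WHERE d.RID = @rid;

        FETCH NEXT FROM cur INTO @rid, @val;
    END;
    CLOSE cur;
    DEALLOCATE cur;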

    Executive Junior Cowboy Developer, Esq.

    I looked at your link, and I would caution you to be careful when you try to translate procedural solutions like that "ascending minima" thingy into a set-based (declarative) one. Often they don't translate well. In that case, for example, you'd end up sorting each subgroup of 10 elements, which isn't needed.

    You were on the right track initially thinking Quirky Update (which is what mine does). The issue is that it becomes a bit of a curiosity when you need to create the sliding window.

    BTW. Like your job title!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.
