Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Rolling Max Value Expand / Collapse
Author
Message
Posted Thursday, August 21, 2014 1:03 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 10:25 AM
Points: 132, Visits: 406
Hi SSC,

I've got a problem I'm trying to solve as efficiently as possible. I've attached simplified data, but to help conceptualize it, I want to, for each day in the life of a security, figure out what the highest value was over the past 52 weeks.

In SQL Server 2012+, this is a piece of cake using max() over()... and a row frame, but alas, we don't have 2012+ on any prod boxes.

Two Questions:
1) does anyone know an efficient way to do this (p.s. Quirky updates came to mind, but I couldn't figure out how to apply it to this problem)
2) Does anyone know roughly how the max() over() /w row frame works under the covers? Maybe I can reverse engineer it in an efficient manner

use master
go
set nocount on
go

/**********************
** BUILD SAMPLE DATA **
**********************/
if object_id('tempdb.dbo.#dat') is not null drop table #dat
create table #dat
(
RID int identity(1,1) primary key clustered,
Value float,
MaxRollingValue float
)

insert into #dat(Value)
select top 10000 (abs(checksum(newid())) % 1000) * rand()
from sys.objects a, sys.objects b

/******************
** SET MAX VALUE **
******************/
--Super efficient, but 2012+ only
--1 Scan, 38 LR
select
RID,
MaxRollingValue = max(Value) over (order by (select null)
rows between 10 preceding and current row)
from #dat

--Simple to understand, but horrible performance.
--10001 scans, 31691 LR
select
d1.RID,
MaxRollingValue = max(d2.Value)
from #dat d1
inner join #dat d2
on d1.RID >= d2.RID
and d1.RID - 10 <= d2.RID
and d1.Value <= d2.Value
group by d1.RID




Executive Junior Cowboy Developer, Esq.
Post #1605974
Posted Thursday, August 21, 2014 3:11 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 6:47 AM
Points: 3,908, Visits: 8,859
The Quirky Update is easy to get, assuming that you already know the rules to follow.
Here's an example with your sample data.

DECLARE @Value float=0,
@id int

UPDATE t SET
@Value = MaxRollingValue = CASE WHEN Value > @Value THEN Value ELSE @Value END,
@id = RID
FROM #dat t WITH( TABLOCKX)
OPTION( MAXDOP 1)





Luis C.
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1606034
Posted Thursday, August 21, 2014 3:18 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 10:25 AM
Points: 132, Visits: 406
Thanks Luis. The only problem there is it doesn't work for a rolling window.

If the highest value is at the tail end of the window and you iterate one row, the high must now change to the second highest value. Then if that remained the high till IT fell out of scope, you'd have to switch to the 3rd highest, and so on.


Executive Junior Cowboy Developer, Esq.
Post #1606039
Posted Thursday, August 21, 2014 3:25 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 6:47 AM
Points: 3,908, Visits: 8,859
Can you give me an example on what you just said? Are you trying to do a ranking? Or am I misunderstanding you?
Partitioning (using windows) the results is possible, but I'm trying to figure what do you need.



Luis C.
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1606045
Posted Thursday, August 21, 2014 3:33 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 10:25 AM
Points: 132, Visits: 406
Sure. So turning the data set on its side, you could think of the sequence of values like this:

5,3,2,3,1,4,6....

The update you described works if you just want to know the highest value at any point along the whole series, but I want to know what the highest value was only for the past (lets say) 3 records. So let's start reading from the left to the right (with the current frame indicated by brackets)

1) 5],3,2,3,1,4,6. 5 is the highest (by default)
2) 5,3],2,3,1,4,6. 5 is still the highest
3) [5,3,2],3,1,4,6. 5 is still the highest
4) 5,[3,2,3],1,4,6. 3 is now the highest, but you have no way of knowing this since the highest value, against which we were previously comparing to determine the next in the series, is now out of scope.

If you're aware of a way to use an over clause to partition over a range without the use of ROWS or RANGE, I'd be interested to know.


Executive Junior Cowboy Developer, Esq.
Post #1606053
Posted Thursday, August 21, 2014 4:00 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 6:47 AM
Points: 3,908, Visits: 8,859
I totally missed that part and understand your pain.
For now, the logic indicates that the "wide" join you're using seems the best option with one slight modification. You can remove the last condition to reduce one third of the logical reads. At least that's been consistent on my tests.
I'll try to think further.



Luis C.
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1606062
Posted Thursday, August 21, 2014 4:34 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 10:25 AM
Points: 132, Visits: 406
Hokay, so I think I have an algorithm to do this with a quirky update. There may be some bugs in it because there are so many combinations of how the elements in the window are arranged, but I think more or less, this does the trick.

Early on I had the idea to store off the second highest value, and then if the first highest value rolled off, the second highest value would become the first. Originally I was storing the second highest value in the window whether it was before or after the first highest value. But after I jotted it down on paper, it became clear that you only need to store the second highest value if its IN FRONT of the first highest. No value smaller than the first highest value, behind it in the sequence, would ever supplant the first highest.

Consider the following sequence and assume a rolling window of 3. F = first highest, S = second highest
6,2,5,3,7,1,1,3

• [6],2,5,3,7,1,1,3
o F = 6
o S = null
• [6,2],5,3,7,1,1,3
o F = 6
o S = 2
• [6,2,5],3,7,1,1,3
o F = 6
o S = 5
• 6,[2,5,3],7,1,1,3
o 6 falls off
o F = 5
o S = 3
• 6,2,[5,3,7],1,1,3
o F = 7
o S = null (again, we’re only interested in second highest value IN FRONT of the first highest)
• 6,2,5,[3,7,1],1,3
o F = 7
o S = 1
• 6,2,5,3,[7,1,1],3
o F = 7
o S = 1
• 6,2,5,3,7,[1,1,3]
o 7 falls off
o F = 3
o S = null

All these rules can be done on successive rows, making a quirky update possible. This works for the randomized data I originally asked about, but to show it in SQL, I’ve hard coded this exact sequence into this script.

Note the use of @temp<name> variables. The operations need to take place on the variables as they were when they entered the update. If you were to use the actual variables, as soon as the first clause modified @first, it would throw off the clauses for subsequent variable assignments.

The windowed function in 2012+ I think still works better than mine, but this is considerably better than any of the RBAR solutions in terms of performance.


/**********************
** BUILD SAMPLE DATA **
**********************/
if object_id('tempdb.dbo.#dat') is not null drop table #dat
create table #dat
(
RID int identity(1,1) primary key clustered,
Value float,
MaxRollingValue float,
Debug varchar(max),
FirstRID int
)
go

insert into #dat(Value)
values (6),(2),(5),(3),(7),(1),(1),(3)
--select top 10000 (abs(checksum(newid())) % 1000) * rand()
--from sys.objects a, sys.objects b

declare
@tempFirst float,
@first float = -1,
@tempFirstRID int,
@firstRID int = 1,
@tempSecond float,
@second float = -1,
@tempSecondRID int,
@secondRID int = null,
@interval int = 3,
@Anchor int

update #dat with (tablockx)
set @Anchor = RID,
@tempfirst = case when Value > @first then value
when (RID - @Interval) = @firstRID and Value < @second then @second
when (RID - @Interval) = @firstRID and Value > @second then Value
else @first
end,
@tempfirstRID = case when Value > @first then RID
when (RID - @Interval) = @firstRID and Value < @second then @secondRID
when (RID - @Interval) = @firstRID and Value > @second then RID
else @firstRID
end,
@tempSecond = case when Value > @first then -1
when Value < @first and value > @second then value
when (RID - @Interval) = @firstRID then -1
else @second
end,
@tempSecondRID = case when (RID - @Interval) = @firstRID then null
when Value > @first then null
when Value < @first and Value > @second then RID
else @SecondRID
end,
@first = @tempFirst,
@firstRID = @tempFirstRID,
@second = @tempSecond,
@secondRID = @tempSecondRID,
MaxRollingValue = @first,
FirstRID = @firstRID,
Debug = concat('@first:', @first, ' @firstRID:', @firstRID, ' @second:', @second, ' @secondRID:', @secondRID)
option (maxdop 1)

select *
from #dat
order by RID



Executive Junior Cowboy Developer, Esq.
Post #1606070
Posted Thursday, August 21, 2014 10:31 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 10:22 AM
Points: 2,391, Visits: 6,620
Here is a little "first coffee in the morning" solution

It uses two instances of a Tally CTE to generate a set N{N+0,,,,N+x} for framing the x rows preceding and current row (kind of).


DECLARE @SET_SIZE INT = 10;
--SET STATISTICS IO ON;
;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
,NUMS(N) AS (SELECT TOP(SELECT COUNT(*) FROM #dat) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM T T1,T T2,T T3,T T4,T T5)
,NM10(N) AS (SELECT TOP (@SET_SIZE) N - 1 FROM NUMS)
,CALC_SET AS
(
SELECT
NM.N
,NM.N + N10.N AS NX
FROM NUMS NM
OUTER APPLY NM10 N10
)
SELECT
CS.N
,MAX(DT.Value) AS MAX_VAL
FROM CALC_SET CS
INNER JOIN #dat DT
ON CS.NX = DT.RID
GROUP BY CS.N
--ORDER BY CS.N
--SET STATISTICS IO OFF;

IO statistics
Table 'Worktable'. Scan count 1, logical reads 20438, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#dat________________________________________________________________________________________________________________000000000023'. Scan count 3, logical reads 114, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Post #1606101
Posted Thursday, August 21, 2014 10:46 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 10:22 AM
Points: 2,391, Visits: 6,620
After the second coffee I realised that the solution was in fact "current row and X following", here is a "X preceding and current row" improvement.


use master
go
set nocount on
go

/**********************
** BUILD SAMPLE DATA **
**********************/
if object_id('tempdb.dbo.#dat') is not null drop table #dat
create table #dat
(
RID int identity(1,1) primary key clustered,
Value float,
MaxRollingValue float
)

insert into #dat(Value)
select top 10000 (abs(checksum(newid())) % 1000) * rand()
from sys.objects a, sys.objects b



DECLARE @SET_SIZE INT = 10;
SET STATISTICS IO ON;
;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
,NUMS(N) AS (SELECT TOP(SELECT COUNT(*) FROM #dat) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM T T1,T T2,T T3,T T4,T T5)
,NM10(N) AS (SELECT TOP (@SET_SIZE) N - 1 FROM NUMS)
,PRENUM(RID,Value) AS (SELECT -N AS RID,0 AS Value FROM NM10)
,CALC_SET AS
(
SELECT
NM.N
,NM.N + N10.N AS NX
FROM NUMS NM
OUTER APPLY NM10 N10
)
,EXTENDED_DAT AS
(
SELECT
RID
,Value
FROM PRENUM
UNION ALL
SELECT
RID
,Value
FROM #dat
)
SELECT
CS.N
,MAX(DT.Value) AS MAX_VAL
FROM CALC_SET CS
INNER JOIN EXTENDED_DAT DT
ON CS.NX = DT.RID
GROUP BY CS.N
ORDER BY CS.N
SET STATISTICS IO OFF;

Post #1606103
Posted Friday, August 22, 2014 5:16 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 10:25 AM
Points: 132, Visits: 406
Wow, that works quite well. Given that I discovered some flaws in my post where I thought I had it with the quirky update, I'll try using this against my actual data set and see if everything holds water.

Thanks!


Executive Junior Cowboy Developer, Esq.
Post #1606552
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse