Rolling Max Value

  • Hi SSC,

    I've got a problem I'm trying to solve as efficiently as possible. I've included a script that builds simplified sample data below, but to help conceptualize the problem: for each day in the life of a security, I want to figure out what the highest value was over the past 52 weeks.

    In SQL Server 2012+ this is a piece of cake using max() over() with a ROWS frame, but alas, we don't have 2012+ on any prod boxes.

    Two Questions:

    1) Does anyone know an efficient way to do this? (P.S. Quirky updates came to mind, but I couldn't figure out how to apply them to this problem.)

    2) Does anyone know roughly how max() over() with a ROWS frame works under the covers? Maybe I can reverse engineer it in an efficient manner.

    use master
    go
    set nocount on
    go

    /**********************
    ** BUILD SAMPLE DATA **
    **********************/
    if object_id('tempdb.dbo.#dat') is not null drop table #dat

    create table #dat
    (
        RID int identity(1,1) primary key clustered,
        Value float,
        MaxRollingValue float
    )

    insert into #dat(Value)
    select top 10000 (abs(checksum(newid())) % 1000) * rand()
    from sys.objects a, sys.objects b

    /******************
    ** SET MAX VALUE **
    ******************/
    --Super efficient, but 2012+ only
    --1 Scan, 38 LR
    select
        RID,
        MaxRollingValue = max(Value) over (order by RID
                                           rows between 10 preceding and current row)
    from #dat

    --Simple to understand, but horrible performance.
    --10001 scans, 31691 LR
    select
        d1.RID,
        MaxRollingValue = max(d2.Value)
    from #dat d1
    inner join #dat d2
        on  d1.RID >= d2.RID
        and d1.RID - 10 <= d2.RID
        and d1.Value <= d2.Value
    group by d1.RID

    Executive Junior Cowboy Developer, Esq.

  • The Quirky Update is easy to get, assuming that you already know the rules to follow.

    Here's an example with your sample data.

    DECLARE @Value float = 0,
            @id int

    UPDATE t SET
        @Value = MaxRollingValue = CASE WHEN Value > @Value THEN Value ELSE @Value END,
        @id = RID
    FROM #dat t WITH (TABLOCKX)
    OPTION (MAXDOP 1)

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
    Thanks, Luis. The only problem there is that it doesn't work for a rolling window.

    If the highest value is at the tail end of the window and you advance one row, the high must now change to the second highest value. Then, if that remained the high until IT fell out of scope, you'd have to switch to the 3rd highest, and so on.

    Executive Junior Cowboy Developer, Esq.[/url]

    Can you give me an example of what you just said? Are you trying to do a ranking? Or am I misunderstanding you?

    Partitioning the results (using windows) is possible, but I'm trying to figure out what you need.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Sure. So turning the data set on its side, you could think of the sequence of values like this:

    5,3,2,3,1,4,6....

    The update you described works if you just want to know the highest value at any point along the whole series, but I want to know what the highest value was only for the past (let's say) 3 records. So let's start reading from left to right (with the current frame indicated by brackets):

    1) [5],3,2,3,1,4,6. 5 is the highest (by default)

    2) [5,3],2,3,1,4,6. 5 is still the highest

    3) [5,3,2],3,1,4,6. 5 is still the highest

    4) 5,[3,2,3],1,4,6. 3 is now the highest, but you have no way of knowing this, since the previous highest value, which we were comparing against to determine the next in the series, is now out of scope.
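
    To make that concrete, here is a quick throwaway check (my own sketch, using a hypothetical #demo table rather than the original sample data) that runs the running-max quirky update over this sequence; at RID 4 it returns 5, whereas the desired 3-row rolling max there is 3:

    if object_id('tempdb.dbo.#demo') is not null drop table #demo
    create table #demo
    (
        RID int identity(1,1) primary key clustered,
        Value float,
        RunningMax float
    )
    insert into #demo(Value) values (5),(3),(2),(3),(1),(4),(6)

    declare @Value float = 0, @id int

    -- running (not rolling) max, as in the earlier post
    update t set
        @Value = RunningMax = case when Value > @Value then Value else @Value end,
        @id = RID
    from #demo t with (tablockx)
    option (maxdop 1)

    -- RID 4 shows RunningMax = 5; the rolling max over the window (3,2,3) should be 3
    select RID, Value, RunningMax from #demo order by RID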

    If you're aware of a way to use an over clause to partition over a range without the use of ROWS or RANGE, I'd be interested to know.

    Executive Junior Cowboy Developer, Esq.

  • I totally missed that part and understand your pain.

    For now, the "wide" join you're using seems the best option, with one slight modification: you can remove the last join condition (the d1.Value <= d2.Value predicate) to cut the logical reads by about a third. At least that's been consistent in my tests.
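
    For reference, a minimal sketch of that modification (same #dat table as in the original post):

    select
        d1.RID,
        MaxRollingValue = max(d2.Value)
    from #dat d1
    inner join #dat d2
        on  d1.RID >= d2.RID          -- current row and the 10 preceding RIDs
        and d1.RID - 10 <= d2.RID     -- the d1.Value <= d2.Value predicate is removed
    group by d1.RID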

    I'll try to think further.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Hokay, so I think I have an algorithm to do this with a quirky update. There may be some bugs in it because there are so many combinations of how the elements in the window are arranged, but I think more or less, this does the trick.

    Early on I had the idea to store off the second highest value, and then if the first highest value rolled off, the second highest value would become the first. Originally I was storing the second highest value in the window whether it was before or after the first highest value. But after I jotted it down on paper, it became clear that you only need to store the second highest value if it's IN FRONT of the first highest. No value smaller than the first highest value, behind it in the sequence, would ever supplant the first highest.

    Consider the following sequence and assume a rolling window of 3. F = first highest, S = second highest

    6,2,5,3,7,1,1,3

    • [6],2,5,3,7,1,1,3
        F = 6
        S = null
    • [6,2],5,3,7,1,1,3
        F = 6
        S = 2
    • [6,2,5],3,7,1,1,3
        F = 6
        S = 5
    • 6,[2,5,3],7,1,1,3
        6 falls off
        F = 5
        S = 3
    • 6,2,[5,3,7],1,1,3
        F = 7
        S = null (again, we’re only interested in the second highest value IN FRONT of the first highest)
    • 6,2,5,[3,7,1],1,3
        F = 7
        S = 1
    • 6,2,5,3,[7,1,1],3
        F = 7
        S = 1
    • 6,2,5,3,7,[1,1,3]
        7 falls off
        F = 3
        S = null

    All of these rules can be applied on successive rows, making a quirky update possible. This works for the randomized data I originally asked about, but to show it in SQL, I’ve hard-coded this exact sequence into the script below.

    Note the use of the @temp<name> variables. The CASE expressions need to operate on the variable values as they were when the row entered the update; if you used the actual variables directly, then as soon as the first clause modified @first, it would throw off the subsequent variable assignments.

    I think the 2012+ windowed function still works better than mine, but this is considerably better than any of the RBAR solutions in terms of performance.

    /**********************
    ** BUILD SAMPLE DATA **
    **********************/
    if object_id('tempdb.dbo.#dat') is not null drop table #dat

    create table #dat
    (
        RID int identity(1,1) primary key clustered,
        Value float,
        MaxRollingValue float,
        Debug varchar(max),
        FirstRID int
    )
    go

    insert into #dat(Value)
    values (6),(2),(5),(3),(7),(1),(1),(3)
    --select top 10000 (abs(checksum(newid())) % 1000) * rand()
    --from sys.objects a, sys.objects b

    declare
        @tempFirst float,
        @first float = -1,
        @tempFirstRID int,
        @firstRID int = 1,
        @tempSecond float,
        @second float = -1,
        @tempSecondRID int,
        @secondRID int = null,
        @interval int = 3,
        @Anchor int

    update #dat with (tablockx)
    set @Anchor = RID,
        @tempFirst = case when Value > @first then Value
                          when (RID - @interval) = @firstRID and Value < @second then @second
                          when (RID - @interval) = @firstRID and Value > @second then Value
                          else @first
                     end,
        @tempFirstRID = case when Value > @first then RID
                             when (RID - @interval) = @firstRID and Value < @second then @secondRID
                             when (RID - @interval) = @firstRID and Value > @second then RID
                             else @firstRID
                        end,
        @tempSecond = case when Value > @first then -1
                           when Value < @first and Value > @second then Value
                           when (RID - @interval) = @firstRID then -1
                           else @second
                      end,
        @tempSecondRID = case when (RID - @interval) = @firstRID then null
                              when Value > @first then null
                              when Value < @first and Value > @second then RID
                              else @secondRID
                         end,
        @first = @tempFirst,
        @firstRID = @tempFirstRID,
        @second = @tempSecond,
        @secondRID = @tempSecondRID,
        MaxRollingValue = @first,
        FirstRID = @firstRID,
        Debug = concat('@first:', @first, ' @firstRID:', @firstRID, ' @second:', @second, ' @secondRID:', @secondRID)
    option (maxdop 1)

    select *
    from #dat
    order by RID

    Executive Junior Cowboy Developer, Esq.

    Here is a little "first coffee in the morning" solution :doze:

    It uses two instances of a Tally CTE to generate a set N {N+0, ..., N+x} for framing the x rows preceding and current row (kind of).

    😎

    DECLARE @SET_SIZE INT = 10;

    --SET STATISTICS IO ON;
    ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
    ,NUMS(N) AS (SELECT TOP(SELECT COUNT(*) FROM #dat) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM T T1,T T2,T T3,T T4,T T5)
    ,NM10(N) AS (SELECT TOP (@SET_SIZE) N - 1 FROM NUMS)
    ,CALC_SET AS
    (
        SELECT
             NM.N
            ,NM.N + N10.N AS NX
        FROM NUMS NM
        OUTER APPLY NM10 N10
    )
    SELECT
         CS.N
        ,MAX(DT.Value) AS MAX_VAL
    FROM CALC_SET CS
    INNER JOIN #dat DT
        ON CS.NX = DT.RID
    GROUP BY CS.N
    --ORDER BY CS.N
    --SET STATISTICS IO OFF;

    IO statistics :pinch:

    Table 'Worktable'. Scan count 1, logical reads 20438, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#dat________________________________________________________________________________________________________________000000000023'. Scan count 3, logical reads 114, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    After the second coffee I realised that the solution was in fact "current row and X following". Here is an "X preceding and current row" improvement.

    😎

    use master
    go
    set nocount on
    go

    /**********************
    ** BUILD SAMPLE DATA **
    **********************/
    if object_id('tempdb.dbo.#dat') is not null drop table #dat

    create table #dat
    (
        RID int identity(1,1) primary key clustered,
        Value float,
        MaxRollingValue float
    )

    insert into #dat(Value)
    select top 10000 (abs(checksum(newid())) % 1000) * rand()
    from sys.objects a, sys.objects b

    DECLARE @SET_SIZE INT = 10;

    SET STATISTICS IO ON;
    ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
    ,NUMS(N) AS (SELECT TOP(SELECT COUNT(*) FROM #dat) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM T T1,T T2,T T3,T T4,T T5)
    ,NM10(N) AS (SELECT TOP (@SET_SIZE) N - 1 FROM NUMS)
    ,PRENUM(RID,Value) AS (SELECT -N AS RID, 0 AS Value FROM NM10)
    ,CALC_SET AS
    (
        SELECT
             NM.N
            ,NM.N - N10.N AS NX   -- preceding rows this time, so NX can dip below 1 into PRENUM
        FROM NUMS NM
        OUTER APPLY NM10 N10
    )
    ,EXTENDED_DAT AS
    (
        SELECT
             RID
            ,Value
        FROM PRENUM
        UNION ALL
        SELECT
             RID
            ,Value
        FROM #dat
    )
    SELECT
         CS.N
        ,MAX(DT.Value) AS MAX_VAL
    FROM CALC_SET CS
    INNER JOIN EXTENDED_DAT DT
        ON CS.NX = DT.RID
    GROUP BY CS.N
    ORDER BY CS.N
    SET STATISTICS IO OFF;

  • Wow, that works quite well. Given that I discovered some flaws in my post where I thought I had it with the quirky update, I'll try using this against my actual data set and see if everything holds water.

    Thanks!

    Executive Junior Cowboy Developer, Esq.

  • JeeTee (8/22/2014)


    Wow, that works quite well. Given that I discovered some flaws in my post where I thought I had it with the quirky update, I'll try using this against my actual data set and see if everything holds water.

    Thanks!

    Quick thought: I don't think the solution is perfect, but it should be easy to improve. I haven't had time to look at it in more detail, but I suspect the leading-in part, where N < @SET_SIZE, might need improving.
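
    One possible way to handle those leading rows (a rough sketch, not tested against the original data) is to drop the PRENUM zero-padding, which relies on the values being non-negative, and simply let the first @SET_SIZE - 1 frames aggregate over however many rows actually exist:

    DECLARE @SET_SIZE INT = 10;

    ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
    ,NUMS(N) AS (SELECT TOP(SELECT COUNT(*) FROM #dat) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM T T1,T T2,T T3,T T4,T T5)
    ,NM10(N) AS (SELECT TOP (@SET_SIZE) N - 1 FROM NUMS)
    ,CALC_SET AS
    (
        SELECT
             NM.N
            ,NM.N - N10.N AS NX   -- N and the @SET_SIZE - 1 preceding positions
        FROM NUMS NM
        OUTER APPLY NM10 N10
    )
    SELECT
         CS.N
        ,MAX(DT.Value) AS MAX_VAL
    FROM CALC_SET CS
    INNER JOIN #dat DT
        ON CS.NX = DT.RID         -- NX values below 1 simply don't join, so the early frames shrink
    GROUP BY CS.N
    ORDER BY CS.N;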

    😎

  • Stick with a quirky update:

    use master
    go
    set nocount on
    go

    /**********************
    ** BUILD SAMPLE DATA **
    **********************/
    if object_id('tempdb.dbo.#dat') is not null drop table #dat

    create table #dat
    (
        RID int identity(1,1) primary key clustered,
        Value float,
        MaxRollingValue float
    )

    insert into #dat(Value)
    select top 10000 (abs(checksum(newid())) % 1000) * rand()
    from sys.objects a, sys.objects b

    /******************
    ** SET MAX VALUE **
    ******************/
    --Super efficient, but 2012+ only
    --1 Scan, 38 LR
    --select
    --    RID,
    --    MaxRollingValue = max(Value) over (order by RID
    --                                       rows between 10 preceding and current row)
    --from #dat

    --SELECT *
    --FROM #dat;

    --Simple to understand, but horrible performance.
    --10001 scans, 31691 LR
    select
        d1.RID,
        MaxRollingValue = max(d2.Value)
    from #dat d1
    inner join #dat d2
        on  d1.RID >= d2.RID
        and d1.RID - 10 <= d2.RID
        and d1.Value <= d2.Value
    group by d1.RID

    -- Lag variables are float to match the Value column
    DECLARE @Lag1 FLOAT = NULL
           ,@Lag2 FLOAT = NULL
           ,@Lag3 FLOAT = NULL
           ,@Lag4 FLOAT = NULL
           ,@Lag5 FLOAT = NULL
           ,@Lag6 FLOAT = NULL
           ,@Lag7 FLOAT = NULL
           ,@Lag8 FLOAT = NULL
           ,@Lag9 FLOAT = NULL;

    UPDATE #dat WITH (TABLOCKX)
    SET MaxRollingValue =
        (
            SELECT MAX(lag)
            FROM (VALUES (@Lag1),(@Lag2),(@Lag3),(@Lag4),(@Lag5),(@Lag6),(@Lag7),(@Lag8),(@Lag9),(Value)) a(lag)
        )
       ,@Lag9 = @Lag8
       ,@Lag8 = @Lag7
       ,@Lag7 = @Lag6
       ,@Lag6 = @Lag5
       ,@Lag5 = @Lag4
       ,@Lag4 = @Lag3
       ,@Lag3 = @Lag2
       ,@Lag2 = @Lag1
       ,@Lag1 = Value
    OPTION (MAXDOP 1);

    SELECT *
    FROM #dat;

    This is a knock-off of what I did in this article: https://www.simple-talk.com/sql/t-sql-programming/calculating-values-within-a-rolling-window-in-transact-sql/


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • Thanks, Dwain. I'll look this solution over as well and see how it fits the data set I'm working with.

    Fwiw (and this doesn't quite work in pure SQL), I did some research on different algorithms to achieve this and I think the best one is illustrated in this post:

    http://softwarelearner.blogspot.com/2011/04/minima-in-sliding-window.html

    Admittedly, this is not TSQL, but I thought I'd share the fruits of my research.

    It's my guess that the function Microsoft built into 2012 probably does something along those lines. Either way, there have been a few ingenious approaches to solving this issue in SQL posted thus far. Thanks!
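
    For anyone curious, here is a rough, purely illustrative T-SQL sketch of that sliding-window idea: keep only the candidates that could still become the max, dropping anything that has left the window or is smaller than the incoming value. It is deliberately procedural, cursor-based RBAR, so it is only meant to show the algorithm, not to compete with the set-based solutions above; the @preceding variable and @deque table are my own names, not from the linked post:

    DECLARE @preceding int = 10;                            -- 10 preceding rows plus the current row
    DECLARE @deque TABLE (RID int PRIMARY KEY, Value float);
    DECLARE @rid int, @val float;

    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT RID, Value FROM #dat ORDER BY RID;
    OPEN cur;
    FETCH NEXT FROM cur INTO @rid, @val;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        DELETE FROM @deque WHERE RID < @rid - @preceding;   -- candidates that left the window
        DELETE FROM @deque WHERE Value <= @val;             -- dominated candidates can never be the max again
        INSERT INTO @deque (RID, Value) VALUES (@rid, @val);

        UPDATE d
           SET MaxRollingValue = (SELECT MAX(Value) FROM @deque)
          FROM #dat d
         WHERE d.RID = @rid;

        FETCH NEXT FROM cur INTO @rid, @val;
    END;
    CLOSE cur;
    DEALLOCATE cur;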

    Executive Junior Cowboy Developer, Esq.

    I looked at your link, and I would caution you to be careful when you try to translate procedural solutions like that "ascending minima" thingy into a set-based (declarative) one. Often they don't translate well. In that case, for example, you'd end up sorting each subgroup of 10 elements, which isn't needed.

    You were on the right track initially thinking Quirky Update (which is what mine does). The issue is that it becomes a bit of a curiosity when you need to create the sliding window.

    BTW. Like your job title!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.
