Consecutive streak excluding weekends

  • Hi SSC,

    I'm trying to write an algorithm that returns the most recent and longest consecutive streak of positive or negative price changes in a given stock. The streak can extend over null weekends, but not over null weekdays (presumably trading days).

    For example, lets say Google had end of day positive returns on (any given) Tuesday, Monday, and previous Friday, but then Thursday, it had negative returns. That would be a 3 day streak between Friday and Tuesday. Also, if a date has a null value on a date that is NOT a weekend, the streak ends.

    In the following code sample, you can get a simplified idea of what the raw data will look like and what the output should look like.

    set nocount on

    set datefirst 7

    go

    if object_id('tempdb.dbo.#raw') is not null drop table #raw

    create table #raw

    (

    symbol varchar(30),

    tDate date,

    tDateInt as cast(cast(tDate as datetime) as int), --Integer version of the date

    pctChg float,

    isWeekend as case when datepart(weekday, tDate) in (1,7) then 1 else 0 end --indicate whether the date is weekend

    primary key clustered (symbol, tDate desc)

    )

    insert into #raw (symbol, tDate, pctChg) select 'A', cast(41854 as datetime), 3.14

    insert into #raw (symbol, tDate, pctChg) select 'A', cast(41853 as datetime), 2.21

    insert into #raw (symbol, tDate, pctChg) select 'A', cast(41852 as datetime), null

    insert into #raw (symbol, tDate, pctChg) select 'A', cast(41851 as datetime), null

    insert into #raw (symbol, tDate, pctChg) select 'A', cast(41850 as datetime), 1.01

    insert into #raw (symbol, tDate, pctChg) select 'A', cast(41849 as datetime), -1.67

    insert into #raw (symbol, tDate, pctChg) select 'B', cast(41854 as datetime), -.3

    insert into #raw (symbol, tDate, pctChg) select 'B', cast(41853 as datetime), -6

    insert into #raw (symbol, tDate, pctChg) select 'B', cast(41852 as datetime), null

    insert into #raw (symbol, tDate, pctChg) select 'B', cast(41851 as datetime), null

    insert into #raw (symbol, tDate, pctChg) select 'B', cast(41850 as datetime), -1.01

    insert into #raw (symbol, tDate, pctChg) select 'B', cast(41849 as datetime), 1.67

    insert into #raw (symbol, tDate, pctChg) select 'C', cast(41854 as datetime), 2.1

    insert into #raw (symbol, tDate, pctChg) select 'C', cast(41853 as datetime), null

    insert into #raw (symbol, tDate, pctChg) select 'C', cast(41852 as datetime), null

    insert into #raw (symbol, tDate, pctChg) select 'C', cast(41851 as datetime), null

    insert into #raw (symbol, tDate, pctChg) select 'C', cast(41850 as datetime), 1.01

    insert into #raw (symbol, tDate, pctChg) select 'C', cast(41849 as datetime), -1.67

    select top 1000 *

    from #raw

    --Simulation of what results should look like

    select symbol = 'A', streakStart = '2014-08-04', streakEnd = '2014-08-05', streakLength = 2 union all

    select symbol = 'B', streakStart = '2014-08-01', streakEnd = '2014-08-05', streakLength = 3 union all

    select symbol = 'C', streakStart = '2014-08-05', streakEnd = '2014-08-05', streakLength = 1

    I've done date streaks before by using a row_number and subtracting it from the date to get consecutive groupings, but this is a little different because it depends on the direction of the price change as well as whether or not it was a weekend.

    select top 1000

    *,

    grp = tDateInt - row_number() over (partition by symbol order by tDateInt)

    from #raw

    I should also mention that this has to be done over about half a million symbols so something RBAR is especially unappealing.

    Any ideas would be greatly appreciated.

    Executive Junior Cowboy Developer, Esq.[/url]

  • Currently working in finance, there's additional days you need to kick out, which are the holidays for the different markets. Assuming you're only working against NYSE, NASDAQ, and OTC, you still need to deal with that.

    With that in mind, do you have a calendar table already setup which lists allowable dates, or do you have a holiday table with is exclusionary for your list? If calendar, this becomes easy. If holiday, we have to build the calendar on the fly and then work from there.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Exchange holidays are a whole other set of fun, which, for simplicities sake, I've excluded. I'm guessing if I can define a way to skip over holidays, it could similarly be applied to exchange holidays.

    Executive Junior Cowboy Developer, Esq.[/url]

  • I would recommend you start with something like this:

    http://www.brianshowalter.com/calendar_tables

    Just the first one that came up on google, and it looks reasonable.

    Using a table like this you can filter, control, and get your dates organized into a reasonable fashion. Once you have that, a filtered list of the dates you want to use (non holiday/weekend), a ROW_NUMBER() function, and an application of islands and gaps techniques can get you where you want to be. But first, we need the calendar. 🙂


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • This will do it. As usual, most of the details are where they belong... in comments in the code. 😉

    The following code works with the test harness provided in the orginal post.

    --===== If the work table already exists, drop it to make reruns in SSMS easier

    IF OBJECT_ID('tempdb..#Work','U') IS NOT NULL

    DROP TABLE #Work

    ;

    --===== Create the work table. The given PK is quintessential.

    CREATE TABLE #Work

    (

    symbol VARCHAR(30) NOT NULL

    ,tDate DATETIME NOT NULL

    ,ChangeType SMALLINT NOT NULL

    ,MyGroup INT NOT NULL DEFAULT 0

    PRIMARY KEY CLUSTERED (symbol,tDate) --Absolutely critical. Don't even think of changing this.

    )

    ;

    --===== Populate the work table with the data that we need.

    -- Notice the ChangeType column determines the "direction"

    -- of the pctChg column.

    INSERT INTO #Work

    (symbol, tDate, ChangeType)

    SELECT symbol, tDate, ChangeType = SIGN(ISNULL(pctChg,0))

    FROM #Raw

    WHERE isWeekEnd = 0

    ;

    --===== Declare a set of obviously named variables to drive

    -- the "Quirky Update"

    DECLARE @PrevSymbol VARCHAR(30)

    ,@PrevChangeType SMALLINT

    ,@PrevMyGroup INT

    ,@SafetyCounter INT

    ;

    --===== Preset a couple of those variables to a known condition.

    SELECT @PrevMyGroup = 0

    ,@SafetyCounter = 1

    ;

    --===== Using a safety counter to raise an error if this ever makes a mistake

    -- (but it never will), scan the table and change/increment the MyGroup

    -- counter when the symbol or ChangeType column changes from row to row

    -- and do it all in the order of symbol ad tDate columns, just like the PK.

    WITH

    cteSafetyCounter AS

    ( --=== This cte exposes the columns we need to read or update and provides

    -- the safety counter.

    SELECT SafetyCounter = ROW_NUMBER() OVER (ORDER BY symbol,tDate)

    ,symbol

    ,tDate

    ,ChangeType

    ,MyGroup

    FROM #Work WITH(TABLOCKX,INDEX(1)) --Absolutely critical. Don't even think of changing this.

    )

    --==== Now we'll calculate the MyGroup column just like we would in procedural code...

    -- one row at a time using the "Quirky Update" as a "Pseudo Cursor".

    UPDATE tgt

    SET @PrevMyGroup = MyGroup =CASE

    WHEN @SafetyCounter = SafetyCounter

    THEN CASE

    WHEN symbol = @PrevSymbol AND ChangeType = @PrevChangeType

    THEN @PrevMyGroup

    ELSE @PrevMyGroup + 1

    END

    ELSE 1/0 --Raises an error if something gets out of sequence

    END

    ,@PrevSymbol = Symbol

    ,@PrevChangeType = ChangeType

    ,@SafetyCounter = @SafetyCounter + 1

    FROM cteSafetyCounter tgt

    OPTION (MAXDOP 1) --Absolutely critical. Don't even think of changing this.

    ;

    --===== Now that we have the data marked by groups, the rest is easy to produce the report.

    WITH

    cteAggregate AS

    ( --=== This aggregates a count for each MyGroup in each symbol.

    SELECT symbol

    ,streakStart = MIN(tDate)

    ,streakEnd = MAX(tDate)

    ,streakLength = COUNT(*)

    FROM #Work

    GROUP BY symbol,MyGroup

    )

    ,

    cteSortOrder AS

    ( --=== This numbers the groups in descending order by streakLength and uses streakStart as a tie-breaker

    SELECT SortOrder = ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY streakLength DESC, streakStart DESC)

    ,*

    FROM cteAggregate

    ) --=== This returns the desired info for the largest, latest streak for each symbol.

    -- The SortOrder = 1 was created in descending order by steakLength and streakStart above.

    SELECT symbol, streakStart, streakEnd, streakLength

    FROM cteSortOrder

    WHERE SortOrder = 1

    ORDER BY symbol

    ;

    That produces the following results...

    symbol streakStart streakEnd streakLength

    ------ ----------------------- ----------------------- ------------

    A 2014-08-01 00:00:00.000 2014-08-05 00:00:00.000 3

    B 2014-08-01 00:00:00.000 2014-08-05 00:00:00.000 3

    C 2014-08-05 00:00:00.000 2014-08-05 00:00:00.000 1

    Note that the original post said that the streakLength for symbol "A" should be "2" and that's actually incorrect according to the given requirements. The requirements said that streaks should continue through weekend NULL days (presumably, always NULL on the weekend) but streaks stop on NULLs during the week.

    This code can certainly be tweeked to allow streaks to run through holidays just by adding AND isHoliday = 0 to the appropriate place in the code that populates the work table.

    As a bit of a sidebar, the "Quirky Update" method in this code is a highly controversial but highly effect method that I've been using just about forever. It even beats LEAD/LAG in 2012 and up. Some good folks (Paul White and Tom Tompson on this site and Peter Larsson separately on another site) came up with the "Safety Counter" thing after 2005 came out and that's nice to have. Once you get one of these bad boys working correctly, it won't fail especially when the INDEX hint is present. There are areas in the code that simply must not be altered and I've marked those areas.

    Be advised that, in no way, does MS support or recommend this method. Don't let that scare you too much. I've got systems that still work using this method after more than 15 years of SQL Server CUs, SPs, and Rev changes.

    If that's not good enough, then I recommend the creation of a CLR to pull this task off although I think it might be slower (I haven't tried it because I haven't needed to).

    Some of the heavy hitters on this site also use this technique and understand it well. To wit, if you have a question on it, don't hesitate to ask. One or more of us can help in this area.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As another bit of a sidebar, we could also modify the code to identity the direction of the largest streak or produce 3 rows for each symbol to identify the largest negative, positive, and static (no change) streaks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply