Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Consecutive streak excluding weekends Expand / Collapse
Author
Message
Posted Friday, August 8, 2014 11:38 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Today @ 8:57 AM
Points: 95, Visits: 319
Hi SSC,

I'm trying to write an algorithm that returns the most recent and longest consecutive streak of positive or negative price changes in a given stock. The streak can extend over null weekends, but not over null weekdays (presumably trading days).

For example, lets say Google had end of day positive returns on (any given) Tuesday, Monday, and previous Friday, but then Thursday, it had negative returns. That would be a 3 day streak between Friday and Tuesday. Also, if a date has a null value on a date that is NOT a weekend, the streak ends.

In the following code sample, you can get a simplified idea of what the raw data will look like and what the output should look like.

set nocount on
set datefirst 7
go

if object_id('tempdb.dbo.#raw') is not null drop table #raw
create table #raw
(
symbol varchar(30),
tDate date,
tDateInt as cast(cast(tDate as datetime) as int), --Integer version of the date
pctChg float,
isWeekend as case when datepart(weekday, tDate) in (1,7) then 1 else 0 end --indicate whether the date is weekend
primary key clustered (symbol, tDate desc)
)

insert into #raw (symbol, tDate, pctChg) select 'A', cast(41854 as datetime), 3.14
insert into #raw (symbol, tDate, pctChg) select 'A', cast(41853 as datetime), 2.21
insert into #raw (symbol, tDate, pctChg) select 'A', cast(41852 as datetime), null
insert into #raw (symbol, tDate, pctChg) select 'A', cast(41851 as datetime), null
insert into #raw (symbol, tDate, pctChg) select 'A', cast(41850 as datetime), 1.01
insert into #raw (symbol, tDate, pctChg) select 'A', cast(41849 as datetime), -1.67
insert into #raw (symbol, tDate, pctChg) select 'B', cast(41854 as datetime), -.3
insert into #raw (symbol, tDate, pctChg) select 'B', cast(41853 as datetime), -6
insert into #raw (symbol, tDate, pctChg) select 'B', cast(41852 as datetime), null
insert into #raw (symbol, tDate, pctChg) select 'B', cast(41851 as datetime), null
insert into #raw (symbol, tDate, pctChg) select 'B', cast(41850 as datetime), -1.01
insert into #raw (symbol, tDate, pctChg) select 'B', cast(41849 as datetime), 1.67
insert into #raw (symbol, tDate, pctChg) select 'C', cast(41854 as datetime), 2.1
insert into #raw (symbol, tDate, pctChg) select 'C', cast(41853 as datetime), null
insert into #raw (symbol, tDate, pctChg) select 'C', cast(41852 as datetime), null
insert into #raw (symbol, tDate, pctChg) select 'C', cast(41851 as datetime), null
insert into #raw (symbol, tDate, pctChg) select 'C', cast(41850 as datetime), 1.01
insert into #raw (symbol, tDate, pctChg) select 'C', cast(41849 as datetime), -1.67

select top 1000 *
from #raw
--Simulation of what results should look like
select symbol = 'A', streakStart = '2014-08-04', streakEnd = '2014-08-05', streakLength = 2 union all
select symbol = 'B', streakStart = '2014-08-01', streakEnd = '2014-08-05', streakLength = 3 union all
select symbol = 'C', streakStart = '2014-08-05', streakEnd = '2014-08-05', streakLength = 1

I've done date streaks before by using a row_number and subtracting it from the date to get consecutive groupings, but this is a little different because it depends on the direction of the price change as well as whether or not it was a weekend.

select top 1000 
*,
grp = tDateInt - row_number() over (partition by symbol order by tDateInt)
from #raw

I should also mention that this has to be done over about half a million symbols so something RBAR is especially unappealing.

Any ideas would be greatly appreciated.


Executive Junior Cowboy Developer, Esq.
Post #1601291
Posted Friday, August 8, 2014 12:08 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 2:04 AM
Points: 5,383, Visits: 7,457
Currently working in finance, there's additional days you need to kick out, which are the holidays for the different markets. Assuming you're only working against NYSE, NASDAQ, and OTC, you still need to deal with that.

With that in mind, do you have a calendar table already setup which lists allowable dates, or do you have a holiday table with is exclusionary for your list? If calendar, this becomes easy. If holiday, we have to build the calendar on the fly and then work from there.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1601297
Posted Friday, August 8, 2014 1:13 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Today @ 8:57 AM
Points: 95, Visits: 319
Exchange holidays are a whole other set of fun, which, for simplicities sake, I've excluded. I'm guessing if I can define a way to skip over holidays, it could similarly be applied to exchange holidays.

Executive Junior Cowboy Developer, Esq.
Post #1601316
Posted Friday, August 8, 2014 1:42 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 2:04 AM
Points: 5,383, Visits: 7,457
I would recommend you start with something like this:

http://www.brianshowalter.com/calendar_tables

Just the first one that came up on google, and it looks reasonable.

Using a table like this you can filter, control, and get your dates organized into a reasonable fashion. Once you have that, a filtered list of the dates you want to use (non holiday/weekend), a ROW_NUMBER() function, and an application of islands and gaps techniques can get you where you want to be. But first, we need the calendar.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1601324
Posted Friday, August 8, 2014 11:03 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 8:40 AM
Points: 35,265, Visits: 31,754
This will do it. As usual, most of the details are where they belong... in comments in the code.

The following code works with the test harness provided in the orginal post.
--===== If the work table already exists, drop it to make reruns in SSMS easier
IF OBJECT_ID('tempdb..#Work','U') IS NOT NULL
DROP TABLE #Work
;
--===== Create the work table. The given PK is quintessential.
CREATE TABLE #Work
(
symbol VARCHAR(30) NOT NULL
,tDate DATETIME NOT NULL
,ChangeType SMALLINT NOT NULL
,MyGroup INT NOT NULL DEFAULT 0
PRIMARY KEY CLUSTERED (symbol,tDate) --Absolutely critical. Don't even think of changing this.
)
;
--===== Populate the work table with the data that we need.
-- Notice the ChangeType column determines the "direction"
-- of the pctChg column.
INSERT INTO #Work
(symbol, tDate, ChangeType)
SELECT symbol, tDate, ChangeType = SIGN(ISNULL(pctChg,0))
FROM #Raw
WHERE isWeekEnd = 0
;
--===== Declare a set of obviously named variables to drive
-- the "Quirky Update"
DECLARE @PrevSymbol VARCHAR(30)
,@PrevChangeType SMALLINT
,@PrevMyGroup INT
,@SafetyCounter INT
;
--===== Preset a couple of those variables to a known condition.
SELECT @PrevMyGroup = 0
,@SafetyCounter = 1
;
--===== Using a safety counter to raise an error if this ever makes a mistake
-- (but it never will), scan the table and change/increment the MyGroup
-- counter when the symbol or ChangeType column changes from row to row
-- and do it all in the order of symbol ad tDate columns, just like the PK.
WITH
cteSafetyCounter AS
( --=== This cte exposes the columns we need to read or update and provides
-- the safety counter.
SELECT SafetyCounter = ROW_NUMBER() OVER (ORDER BY symbol,tDate)
,symbol
,tDate
,ChangeType
,MyGroup
FROM #Work WITH(TABLOCKX,INDEX(1)) --Absolutely critical. Don't even think of changing this.
)
--==== Now we'll calculate the MyGroup column just like we would in procedural code...
-- one row at a time using the "Quirky Update" as a "Pseudo Cursor".
UPDATE tgt
SET @PrevMyGroup = MyGroup =CASE
WHEN @SafetyCounter = SafetyCounter
THEN CASE
WHEN symbol = @PrevSymbol AND ChangeType = @PrevChangeType
THEN @PrevMyGroup
ELSE @PrevMyGroup + 1
END
ELSE 1/0 --Raises an error if something gets out of sequence
END
,@PrevSymbol = Symbol
,@PrevChangeType = ChangeType
,@SafetyCounter = @SafetyCounter + 1
FROM cteSafetyCounter tgt
OPTION (MAXDOP 1) --Absolutely critical. Don't even think of changing this.
;
--===== Now that we have the data marked by groups, the rest is easy to produce the report.
WITH
cteAggregate AS
( --=== This aggregates a count for each MyGroup in each symbol.
SELECT symbol
,streakStart = MIN(tDate)
,streakEnd = MAX(tDate)
,streakLength = COUNT(*)
FROM #Work
GROUP BY symbol,MyGroup
)
,
cteSortOrder AS
( --=== This numbers the groups in descending order by streakLength and uses streakStart as a tie-breaker
SELECT SortOrder = ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY streakLength DESC, streakStart DESC)
,*
FROM cteAggregate
) --=== This returns the desired info for the largest, latest streak for each symbol.
-- The SortOrder = 1 was created in descending order by steakLength and streakStart above.
SELECT symbol, streakStart, streakEnd, streakLength
FROM cteSortOrder
WHERE SortOrder = 1
ORDER BY symbol
;

That produces the following results...

symbol streakStart             streakEnd               streakLength
------ ----------------------- ----------------------- ------------
A 2014-08-01 00:00:00.000 2014-08-05 00:00:00.000 3
B 2014-08-01 00:00:00.000 2014-08-05 00:00:00.000 3
C 2014-08-05 00:00:00.000 2014-08-05 00:00:00.000 1



Note that the original post said that the streakLength for symbol "A" should be "2" and that's actually incorrect according to the given requirements. The requirements said that streaks should continue through weekend NULL days (presumably, always NULL on the weekend) but streaks stop on NULLs during the week.

This code can certainly be tweeked to allow streaks to run through holidays just by adding AND isHoliday = 0 to the appropriate place in the code that populates the work table.

As a bit of a sidebar, the "Quirky Update" method in this code is a highly controversial but highly effect method that I've been using just about forever. It even beats LEAD/LAG in 2012 and up. Some good folks (Paul White and Tom Tompson on this site and Peter Larsson separately on another site) came up with the "Safety Counter" thing after 2005 came out and that's nice to have. Once you get one of these bad boys working correctly, it won't fail especially when the INDEX hint is present. There are areas in the code that simply must not be altered and I've marked those areas.

Be advised that, in no way, does MS support or recommend this method. Don't let that scare you too much. I've got systems that still work using this method after more than 15 years of SQL Server CUs, SPs, and Rev changes.

If that's not good enough, then I recommend the creation of a CLR to pull this task off although I think it might be slower (I haven't tried it because I haven't needed to).

Some of the heavy hitters on this site also use this technique and understand it well. To wit, if you have a question on it, don't hesitate to ask. One or more of us can help in this area.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1601367
Posted Friday, August 8, 2014 11:13 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 8:40 AM
Points: 35,265, Visits: 31,754
As another bit of a sidebar, we could also modify the code to identity the direction of the largest streak or produce 3 rows for each symbol to identify the largest negative, positive, and static (no change) streaks.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1601368
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse