Recent PostsRecent Posts Popular TopicsPopular Topics
 Home Search Members Calendar Who's On

 Consecutive streak excluding weekends Rate Topic Display Mode Topic Options
Author
 Message
 Posted Friday, August 8, 2014 11:38 AM
 SSC Veteran Group: General Forum Members Last Login: Tuesday, July 5, 2016 10:39 AM Points: 241, Visits: 668
 Hi SSC,I'm trying to write an algorithm that returns the most recent and longest consecutive streak of positive or negative price changes in a given stock. The streak can extend over null weekends, but not over null weekdays (presumably trading days). For example, lets say Google had end of day positive returns on (any given) Tuesday, Monday, and previous Friday, but then Thursday, it had negative returns. That would be a 3 day streak between Friday and Tuesday. Also, if a date has a null value on a date that is NOT a weekend, the streak ends. In the following code sample, you can get a simplified idea of what the raw data will look like and what the output should look like.`set nocount onset datefirst 7goif object_id('tempdb.dbo.#raw') is not null drop table #rawcreate table #raw( symbol varchar(30), tDate date, tDateInt as cast(cast(tDate as datetime) as int), --Integer version of the date pctChg float, isWeekend as case when datepart(weekday, tDate) in (1,7) then 1 else 0 end --indicate whether the date is weekend primary key clustered (symbol, tDate desc))insert into #raw (symbol, tDate, pctChg) select 'A', cast(41854 as datetime), 3.14insert into #raw (symbol, tDate, pctChg) select 'A', cast(41853 as datetime), 2.21insert into #raw (symbol, tDate, pctChg) select 'A', cast(41852 as datetime), nullinsert into #raw (symbol, tDate, pctChg) select 'A', cast(41851 as datetime), nullinsert into #raw (symbol, tDate, pctChg) select 'A', cast(41850 as datetime), 1.01insert into #raw (symbol, tDate, pctChg) select 'A', cast(41849 as datetime), -1.67insert into #raw (symbol, tDate, pctChg) select 'B', cast(41854 as datetime), -.3insert into #raw (symbol, tDate, pctChg) select 'B', cast(41853 as datetime), -6insert into #raw (symbol, tDate, pctChg) select 'B', cast(41852 as datetime), nullinsert into #raw (symbol, tDate, pctChg) select 'B', cast(41851 as datetime), nullinsert into #raw (symbol, tDate, pctChg) select 'B', cast(41850 as datetime), -1.01insert into #raw (symbol, tDate, pctChg) select 'B', cast(41849 as datetime), 1.67insert into #raw (symbol, tDate, pctChg) select 'C', cast(41854 as datetime), 2.1insert into #raw (symbol, tDate, pctChg) select 'C', cast(41853 as datetime), nullinsert into #raw (symbol, tDate, pctChg) select 'C', cast(41852 as datetime), nullinsert into #raw (symbol, tDate, pctChg) select 'C', cast(41851 as datetime), nullinsert into #raw (symbol, tDate, pctChg) select 'C', cast(41850 as datetime), 1.01insert into #raw (symbol, tDate, pctChg) select 'C', cast(41849 as datetime), -1.67select top 1000 *from #raw--Simulation of what results should look likeselect symbol = 'A', streakStart = '2014-08-04', streakEnd = '2014-08-05', streakLength = 2 union allselect symbol = 'B', streakStart = '2014-08-01', streakEnd = '2014-08-05', streakLength = 3 union allselect symbol = 'C', streakStart = '2014-08-05', streakEnd = '2014-08-05', streakLength = 1 `I've done date streaks before by using a row_number and subtracting it from the date to get consecutive groupings, but this is a little different because it depends on the direction of the price change as well as whether or not it was a weekend.`select top 1000 *, grp = tDateInt - row_number() over (partition by symbol order by tDateInt)from #raw`I should also mention that this has to be done over about half a million symbols so something RBAR is especially unappealing.Any ideas would be greatly appreciated. Executive Junior Cowboy Developer, Esq.
Post #1601291
 Posted Friday, August 8, 2014 12:08 PM
 SSCertifiable Group: General Forum Members Last Login: Wednesday, September 23, 2015 3:34 PM Points: 5,467, Visits: 7,660
 Currently working in finance, there's additional days you need to kick out, which are the holidays for the different markets. Assuming you're only working against NYSE, NASDAQ, and OTC, you still need to deal with that.With that in mind, do you have a calendar table already setup which lists allowable dates, or do you have a holiday table with is exclusionary for your list? If calendar, this becomes easy. If holiday, we have to build the calendar on the fly and then work from there. - Craig FarrellNever stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake. For better assistance in answering your questions | Forum NetiquetteFor index/tuning help, follow these directions. |Tally TablesTwitter: @AnyWayDBA
Post #1601297
 Posted Friday, August 8, 2014 1:13 PM
 SSC Veteran Group: General Forum Members Last Login: Tuesday, July 5, 2016 10:39 AM Points: 241, Visits: 668
 Exchange holidays are a whole other set of fun, which, for simplicities sake, I've excluded. I'm guessing if I can define a way to skip over holidays, it could similarly be applied to exchange holidays. Executive Junior Cowboy Developer, Esq.
Post #1601316
 Posted Friday, August 8, 2014 1:42 PM
 SSCertifiable Group: General Forum Members Last Login: Wednesday, September 23, 2015 3:34 PM Points: 5,467, Visits: 7,660
 I would recommend you start with something like this:http://www.brianshowalter.com/calendar_tablesJust the first one that came up on google, and it looks reasonable.Using a table like this you can filter, control, and get your dates organized into a reasonable fashion. Once you have that, a filtered list of the dates you want to use (non holiday/weekend), a ROW_NUMBER() function, and an application of islands and gaps techniques can get you where you want to be. But first, we need the calendar. - Craig FarrellNever stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake. For better assistance in answering your questions | Forum NetiquetteFor index/tuning help, follow these directions. |Tally TablesTwitter: @AnyWayDBA
Post #1601324
 Posted Friday, August 8, 2014 11:03 PM
 SSC-Forever Group: General Forum Members Last Login: Yesterday @ 4:03 PM Points: 40,983, Visits: 38,278
 This will do it. As usual, most of the details are where they belong... in comments in the code. The following code works with the test harness provided in the orginal post.`--===== If the work table already exists, drop it to make reruns in SSMS easier IF OBJECT_ID('tempdb..#Work','U') IS NOT NULL DROP TABLE #Work;--===== Create the work table. The given PK is quintessential. CREATE TABLE #Work ( symbol VARCHAR(30) NOT NULL ,tDate DATETIME NOT NULL ,ChangeType SMALLINT NOT NULL ,MyGroup INT NOT NULL DEFAULT 0 PRIMARY KEY CLUSTERED (symbol,tDate) --Absolutely critical. Don't even think of changing this. );--===== Populate the work table with the data that we need. -- Notice the ChangeType column determines the "direction" -- of the pctChg column. INSERT INTO #Work (symbol, tDate, ChangeType) SELECT symbol, tDate, ChangeType = SIGN(ISNULL(pctChg,0)) FROM #Raw WHERE isWeekEnd = 0;--===== Declare a set of obviously named variables to drive -- the "Quirky Update"DECLARE @PrevSymbol VARCHAR(30) ,@PrevChangeType SMALLINT ,@PrevMyGroup INT ,@SafetyCounter INT;--===== Preset a couple of those variables to a known condition. SELECT @PrevMyGroup = 0 ,@SafetyCounter = 1;--===== Using a safety counter to raise an error if this ever makes a mistake -- (but it never will), scan the table and change/increment the MyGroup -- counter when the symbol or ChangeType column changes from row to row -- and do it all in the order of symbol ad tDate columns, just like the PK.WITHcteSafetyCounter AS( --=== This cte exposes the columns we need to read or update and provides -- the safety counter. SELECT SafetyCounter = ROW_NUMBER() OVER (ORDER BY symbol,tDate) ,symbol ,tDate ,ChangeType ,MyGroup FROM #Work WITH(TABLOCKX,INDEX(1)) --Absolutely critical. Don't even think of changing this.) --==== Now we'll calculate the MyGroup column just like we would in procedural code... -- one row at a time using the "Quirky Update" as a "Pseudo Cursor". UPDATE tgt SET @PrevMyGroup = MyGroup =CASE WHEN @SafetyCounter = SafetyCounter THEN CASE WHEN symbol = @PrevSymbol AND ChangeType = @PrevChangeType THEN @PrevMyGroup ELSE @PrevMyGroup + 1 END ELSE 1/0 --Raises an error if something gets out of sequence END ,@PrevSymbol = Symbol ,@PrevChangeType = ChangeType ,@SafetyCounter = @SafetyCounter + 1 FROM cteSafetyCounter tgt OPTION (MAXDOP 1) --Absolutely critical. Don't even think of changing this.;--===== Now that we have the data marked by groups, the rest is easy to produce the report.WITHcteAggregate AS( --=== This aggregates a count for each MyGroup in each symbol. SELECT symbol ,streakStart = MIN(tDate) ,streakEnd = MAX(tDate) ,streakLength = COUNT(*) FROM #Work GROUP BY symbol,MyGroup),cteSortOrder AS( --=== This numbers the groups in descending order by streakLength and uses streakStart as a tie-breaker SELECT SortOrder = ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY streakLength DESC, streakStart DESC) ,* FROM cteAggregate) --=== This returns the desired info for the largest, latest streak for each symbol. -- The SortOrder = 1 was created in descending order by steakLength and streakStart above. SELECT symbol, streakStart, streakEnd, streakLength FROM cteSortOrder WHERE SortOrder = 1 ORDER BY symbol;`That produces the following results...`symbol streakStart streakEnd streakLength------ ----------------------- ----------------------- ------------A 2014-08-01 00:00:00.000 2014-08-05 00:00:00.000 3B 2014-08-01 00:00:00.000 2014-08-05 00:00:00.000 3C 2014-08-05 00:00:00.000 2014-08-05 00:00:00.000 1`Note that the original post said that the streakLength for symbol "A" should be "2" and that's actually incorrect according to the given requirements. The requirements said that streaks should continue through weekend NULL days (presumably, always NULL on the weekend) but streaks stop on NULLs during the week.This code can certainly be tweeked to allow streaks to run through holidays just by adding AND isHoliday = 0 to the appropriate place in the code that populates the work table.As a bit of a sidebar, the "Quirky Update" method in this code is a highly controversial but highly effect method that I've been using just about forever. It even beats LEAD/LAG in 2012 and up. Some good folks (Paul White and Tom Tompson on this site and Peter Larsson separately on another site) came up with the "Safety Counter" thing after 2005 came out and that's nice to have. Once you get one of these bad boys working correctly, it won't fail especially when the INDEX hint is present. There are areas in the code that simply must not be altered and I've marked those areas.Be advised that, in no way, does MS support or recommend this method. Don't let that scare you too much. I've got systems that still work using this method after more than 15 years of SQL Server CUs, SPs, and Rev changes.If that's not good enough, then I recommend the creation of a CLR to pull this task off although I think it might be slower (I haven't tried it because I haven't needed to).Some of the heavy hitters on this site also use this technique and understand it well. To wit, if you have a question on it, don't hesitate to ask. One or more of us can help in this area. --Jeff Moden"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column." Helpful Links:How to post code problemsHow to post performance problems
Post #1601367
 Posted Friday, August 8, 2014 11:13 PM
 SSC-Forever Group: General Forum Members Last Login: Yesterday @ 4:03 PM Points: 40,983, Visits: 38,278
 As another bit of a sidebar, we could also modify the code to identity the direction of the largest streak or produce 3 rows for each symbol to identify the largest negative, positive, and static (no change) streaks. --Jeff Moden"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column." Helpful Links:How to post code problemsHow to post performance problems
Post #1601368

 Permissions