This will do it. As usual, most of the details are where they belong... in comments in the code. 😉
The following code works with the test harness provided in the orginal post.
--===== If the work table already exists, drop it to make reruns in SSMS easier
IF OBJECT_ID('tempdb..#Work','U') IS NOT NULL
DROP TABLE #Work
;
--===== Create the work table. The given PK is quintessential.
CREATE TABLE #Work
(
symbol VARCHAR(30) NOT NULL
,tDate DATETIME NOT NULL
,ChangeType SMALLINT NOT NULL
,MyGroup INT NOT NULL DEFAULT 0
PRIMARY KEY CLUSTERED (symbol,tDate) --Absolutely critical. Don't even think of changing this.
)
;
--===== Populate the work table with the data that we need.
-- Notice the ChangeType column determines the "direction"
-- of the pctChg column.
INSERT INTO #Work
(symbol, tDate, ChangeType)
SELECT symbol, tDate, ChangeType = SIGN(ISNULL(pctChg,0))
FROM #Raw
WHERE isWeekEnd = 0
;
--===== Declare a set of obviously named variables to drive
-- the "Quirky Update"
DECLARE @PrevSymbol VARCHAR(30)
,@PrevChangeType SMALLINT
,@PrevMyGroup INT
,@SafetyCounter INT
;
--===== Preset a couple of those variables to a known condition.
SELECT @PrevMyGroup = 0
,@SafetyCounter = 1
;
--===== Using a safety counter to raise an error if this ever makes a mistake
-- (but it never will), scan the table and change/increment the MyGroup
-- counter when the symbol or ChangeType column changes from row to row
-- and do it all in the order of symbol ad tDate columns, just like the PK.
WITH
cteSafetyCounter AS
( --=== This cte exposes the columns we need to read or update and provides
-- the safety counter.
SELECT SafetyCounter = ROW_NUMBER() OVER (ORDER BY symbol,tDate)
,symbol
,tDate
,ChangeType
,MyGroup
FROM #Work WITH(TABLOCKX,INDEX(1)) --Absolutely critical. Don't even think of changing this.
)
--==== Now we'll calculate the MyGroup column just like we would in procedural code...
-- one row at a time using the "Quirky Update" as a "Pseudo Cursor".
UPDATE tgt
SET @PrevMyGroup = MyGroup =CASE
WHEN @SafetyCounter = SafetyCounter
THEN CASE
WHEN symbol = @PrevSymbol AND ChangeType = @PrevChangeType
THEN @PrevMyGroup
ELSE @PrevMyGroup + 1
END
ELSE 1/0 --Raises an error if something gets out of sequence
END
,@PrevSymbol = Symbol
,@PrevChangeType = ChangeType
,@SafetyCounter = @SafetyCounter + 1
FROM cteSafetyCounter tgt
OPTION (MAXDOP 1) --Absolutely critical. Don't even think of changing this.
;
--===== Now that we have the data marked by groups, the rest is easy to produce the report.
WITH
cteAggregate AS
( --=== This aggregates a count for each MyGroup in each symbol.
SELECT symbol
,streakStart = MIN(tDate)
,streakEnd = MAX(tDate)
,streakLength = COUNT(*)
FROM #Work
GROUP BY symbol,MyGroup
)
,
cteSortOrder AS
( --=== This numbers the groups in descending order by streakLength and uses streakStart as a tie-breaker
SELECT SortOrder = ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY streakLength DESC, streakStart DESC)
,*
FROM cteAggregate
) --=== This returns the desired info for the largest, latest streak for each symbol.
-- The SortOrder = 1 was created in descending order by steakLength and streakStart above.
SELECT symbol, streakStart, streakEnd, streakLength
FROM cteSortOrder
WHERE SortOrder = 1
ORDER BY symbol
;
That produces the following results...
symbol streakStart streakEnd streakLength
------ ----------------------- ----------------------- ------------
A 2014-08-01 00:00:00.000 2014-08-05 00:00:00.000 3
B 2014-08-01 00:00:00.000 2014-08-05 00:00:00.000 3
C 2014-08-05 00:00:00.000 2014-08-05 00:00:00.000 1
Note that the original post said that the streakLength for symbol "A" should be "2" and that's actually incorrect according to the given requirements. The requirements said that streaks should continue through weekend NULL days (presumably, always NULL on the weekend) but streaks stop on NULLs during the week.
This code can certainly be tweeked to allow streaks to run through holidays just by adding AND isHoliday = 0 to the appropriate place in the code that populates the work table.
As a bit of a sidebar, the "Quirky Update" method in this code is a highly controversial but highly effect method that I've been using just about forever. It even beats LEAD/LAG in 2012 and up. Some good folks (Paul White and Tom Tompson on this site and Peter Larsson separately on another site) came up with the "Safety Counter" thing after 2005 came out and that's nice to have. Once you get one of these bad boys working correctly, it won't fail especially when the INDEX hint is present. There are areas in the code that simply must not be altered and I've marked those areas.
Be advised that, in no way, does MS support or recommend this method. Don't let that scare you too much. I've got systems that still work using this method after more than 15 years of SQL Server CUs, SPs, and Rev changes.
If that's not good enough, then I recommend the creation of a CLR to pull this task off although I think it might be slower (I haven't tried it because I haven't needed to).
Some of the heavy hitters on this site also use this technique and understand it well. To wit, if you have a question on it, don't hesitate to ask. One or more of us can help in this area.
--Jeff Moden
Change is inevitable... Change for the better is not.