First I'll applogize for having edited this comment so much. I hope it didn't cause headaches for those who monitor. I thoroughly appreciate all of you.
I've found a way to solve a problem for which I can find no solution other than the Quirky Update method even in V-2017. Unless someone knows a solution that performs at least within an order of magnitude of the one below it seems the usefulness of Quirky Update will not disappear anytime soon.
It's a gaps and islands problem except I need to allow/ignore gaps smaller than a specific size and duplicates must be ok. For example 1,2,2,3,5,8,10 needs to collesce into 2 groups: 1-5 and 8-10. In that example I'm ignoring gaps of 1. So 2 will cause a split but typically I'd ignore bigger gaps. Below gaps as large as 9 are ignored but 10 will cause a new island to form.
If another method exists I'm totally interested. Also, given the validity check discussed earlier doesn't work in this case I'm also interested in any ideas on that!
DROP TABLE #t;
CREATE TABLE #t (CreateDate date not null,TufpID int not null,Cnt int not null, FuzzyGroup int NULL);
ALTER TABLE #t ADD CONSTRAINT PK_temp PRIMARY KEY CLUSTERED (CreateDate,TufpID);
-- The source table has an index (TufpID,CreateDate) which is backwards so
-- This takes 40 seconds to write 4.4M rows from a source of 70M rows.
-- I'm only showing this to give a better idea of the actual source data.
INSERT INTO #T
,Cnt = COUNT(*)
FROM SourceTable X
GROUP BY X.CreateDate
ORDER BY CreateDate,TufpID; -- SELECT TOP 100 * FROM #T ORDER BY CreateDate,TufpID;
-- This takes 3 seconds to update the 4.4M rows created above
DECLARE @FuzzFactor int = 10; -- Ignore gaps smaller than this.
DECLARE @Group int = -@FuzzFactor*2;
DECLARE @Prior int = -@FuzzFactor*2;
DECLARE @CDate date;
SET @Group = FuzzyGroup = CASE WHEN t.TufpID - @PRIOR < @FuzzFactor AND t.CreateDate = @CDate
THEN @Group ELSE t.TufpID END
,@CDate = CASE WHEN @CDate = t.CreateDate THEN @CDate ELSE t.CreateDate END
,@Prior = CASE WHEN @Prior = t.TufpID-1 THEN @Prior + 1 ELSE t.TufpID END
FROM #t t WITH (TABLOCKX) OPTION(MAXDOP 1); -- SELECT TOP 100 * FROM #t ORDER BY CreateDate,TufpID;
-- Sumarize fuzzy islands
WITH X AS (
,RowCnt = SUM(Cnt)
,TufpIDCnt = COUNT(*)
GROUP BY CreateDate,FuzzyGroup
), Y AS (
,RowCnt = SUM(RowCnt)
,Clusters = COUNT(*)
,MinTufps = MIN(TufpIDCnt)
,MaxTufps = MAX(TufpIDCnt)
,AvgTufps = AVG(TufpIDCnt)
,TufpIDCnt = SUM(TufpIDCnt)
GROUP BY CreateDate
,IdsPerCluster = TufpIDCnt/Clusters
,AvgAsPct = (AvgTufps*100)/TufpIDCnt
,MaxAsPct = (MaxTufps*100)/TufpIDCnt
WHERE RowCnt > 10000
ORDER BY RowCnt desc;
Also, I realize this particular example can't have duplicates but other more complex cases do.
Thanks again for the technique!!