• Itzik Ben-Gan wrote about this quite a while back but he took the post down and haven't seen it reposted. Might be around somewhere.

    Anyway, I'm assuming that some decent scalability is needed so here's a 10 million row table over a period of 10 years with 1 million subscribers each having about 10 subscriptions that last from 0 to 5 years. It takes about 63 seconds to build the rows and the 3 indexes this problem needs.

    --=====================================================================================================================

    -- Create a million row test table.

    --=====================================================================================================================

    --===== If the test table exists, drop it to make reruns in SSMS easier

    SET NOCOUNT ON;

    IF OBJECT_ID(N'tempdb..#TestTable') IS NOT NULL

    DROP TABLE #TestTable

    ;

    GO

    --===== Create and populate a 10 million row test table on the fly.

    -- 1,000,000 random IDs with ~10 random date spans of 0 to 5 years each

    WITH

    cteGenDates AS

    (

    SELECT TOP 10000000

    SubscriberID = ABS(CHECKSUM(NEWID()))%1000000+1

    ,StartDate = DATEADD(dd,ABS(CHECKSUM(NEWID()))%DATEDIFF(dd,'2010','2020'),'2010')

    ,Span = ABS(CHECKSUM(NEwID()))%(365*5)

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    )

    SELECT SubscriptionID = IDENTITY(INT,1,1)

    ,SubscriberID

    ,StartDate

    ,EndDate = DATEADD(dd,Span,StartDate)

    INTO #TestTable

    FROM cteGenDates

    ORDER BY StartDate --For a bit a realism

    ;

    --===== Create the expected/needed indexes

    ALTER TABLE #TestTable ADD CONSTRAINT PK_#TestTable PRIMARY KEY CLUSTERED (SubscriptionID);

    CREATE UNIQUE INDEX IX_Begin_Unique ON #TestTable(SubscriberID, StartDate, SubscriptionID);

    CREATE UNIQUE INDEX IX_End_Unique ON #TestTable(SubscriberID, EndDate , SubscriptionID);

    So much for any magic on my part. The rest is all from Itzik. This solves for all spans for all subscribers in the entire table in about 22 seconds and saves the results in another table.

    SET STATISTICS TIME,IO ON;

    --===== If the results table already exists, drop it to make reruns easier.

    IF OBJECT_ID('tempdb..#Results','U') IS NOT NULL

    DROP TABLE #Results

    ;

    --===== Solve the problem using Itzik's count up/count down method of grouping.

    WITH

    C1 AS

    (

    SELECT SubscriptionID

    ,SubscriberID

    ,TS = StartDate

    ,Type = +1

    ,E = NULL

    ,S = ROW_NUMBER() OVER (PARTITION BY SubscriberID ORDER BY StartDate, SubscriptionID)

    FROM #TestTable

    UNION ALL

    SELECT SubscriptionID

    ,SubscriberID

    ,TS = EndDate+1

    ,Type = -1

    ,E = ROW_NUMBER() OVER(PARTITION BY SubscriberID ORDER BY EndDate, SubscriptionID)

    ,S = NULL

    FROM #TestTable

    )

    ,C2 AS

    (

    SELECT c1.*

    ,SE = ROW_NUMBER() OVER (PARTITION BY SubscriberID ORDER BY TS, Type DESC, SubscriptionID)

    FROM C1 c1

    )

    ,C3 AS

    (

    SELECT SubscriberID

    ,TS

    ,GrpNum = FLOOR((ROW_NUMBER() OVER(PARTITION BY SubscriberID ORDER BY TS)-1)/2+1)

    FROM C2

    WHERE COALESCE(S-(SE-S)-1, (SE-E)-E) = 0

    )

    SELECT SubscriberID

    ,StartDate = MIN(TS)

    ,EndDate = MAX(TS-1)

    INTO #Results

    FROM C3

    GROUP BY SubscriberID, GrpNum

    ORDER BY SubscriberID, StartDate

    ;

    SET STATISTICS TIME,IO OFF;

    You can change the "3" in the following code to compare results between the two tables.

    --===== This displays the raw rows from subscriber #3 to compare with spans that follow.

    DECLARE @SubscriberID INT = 3 --<<<< Change the 3 to compare other subscribers.

    SELECT * FROM #TestTable WHERE SubscriberID = @SubscriberID ORDER BY SubscriberID,StartDate;

    SELECT * FROM #Results WHERE SubscriberID = @SubscriberID ORDER BY SubscriberID,StartDate;

    The really neat thing is that this code works in 2005 and up.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)