• sgmunson - Friday, June 23, 2017 10:07 AM

    Okay, now that I've had some time to mess with it, here's a working solution:
    USE LOCAL_DB;
    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.DailyVisits', N'U') IS NOT NULL
        BEGIN
        DROP TABLE dbo.DailyVisits;
        END;

    GO

    CREATE TABLE dbo.DailyVisits (
        dt date NOT NULL,
        visitor varchar(10) NOT NULL,
        CONSTRAINT PK_DailyVisits PRIMARY KEY(dt, visitor)
    );
    CREATE NONCLUSTERED INDEX IX_DailyVisits_visitor_dt ON dbo.DailyVisits
        (
        visitor ASC,
        dt ASC
        );

    INSERT INTO dbo.DailyVisits (dt, visitor)
        VALUES    ('20110601', 'A'),
                ('20110601', 'B'),
                ('20110601', 'C'),
                --
                ('20110602', 'A'),
                ('20110602', 'C'),
                --
                ('20110603', 'A'),
                ('20110603', 'D'),
                --
                --
                ('20110607', 'A'),
                ('20110607', 'D'),
                --
                ('20110608', 'D'),
                ('20110608', 'E'),
                ('20110608', 'F');

    SELECT *
    FROM dbo.DailyVisits;

    --Solution:
    DECLARE @Start AS date, @End AS date;
    SELECT @Start = MIN(dt), @End = MAX(dt)
    FROM dbo.DailyVisits;

    WITH CALENDAR AS (

            SELECT @Start AS dt, CAST(DATEADD(day, -1, @Start) AS date) AS prevday
            UNION ALL
            SELECT CAST(DATEADD(day, 1, dt) AS date) AS dt, CAST(DATEADD(day, 1, prevday) AS date) AS prevday
            FROM CALENDAR
            WHERE dt < @End
    ),
        MERGED_DATA AS (

            SELECT C.dt, C.prevday, DV.visitor
            FROM CALENDAR AS C
                LEFT OUTER JOIN dbo.DailyVisits AS DV
                    ON C.dt = DV.dt
    )
    SELECT MD.dt,
        COUNT(DISTINCT MD.visitor) AS NumVisits,
        ISNULL(SUM(CASE WHEN MD.visitor IS NOT NULL AND M2.visitor IS NULL THEN 1 ELSE 0 END), 0) AS Added,
        --CASE WHEN MD.visitor IS NOT NULL AND M2.visitor IS NULL THEN 1 ELSE 0 END AS Added,
        MAX(R.Removed) AS Removed,
        --R.Removed,
        ISNULL(SUM(CASE WHEN MD.visitor = M2.visitor THEN 1 ELSE 0 END), 0) AS Remained
        --CASE WHEN MD.visitor = M2.visitor THEN 1 ELSE 0 END AS Remained
    FROM MERGED_DATA AS MD
        LEFT OUTER JOIN MERGED_DATA AS M2
            ON MD.prevday = M2.dt
            AND MD.visitor = M2.visitor
        OUTER APPLY (
            SELECT COUNT(MD2.visitor) AS Removed
            FROM MERGED_DATA AS MD2
            WHERE MD2.dt = MD.prevday
                AND MD2.visitor NOT IN (
                    SELECT MD3.visitor
                    FROM MERGED_DATA AS MD3
                    WHERE MD3.dt = MD2.dt
                        AND MD3.visitor <> MD.visitor
                    )
            ) AS R
    GROUP BY MD.dt
    ORDER BY MD.dt;

    DROP TABLE dbo.DailyVisits;
    /*    -- RESULTS SHOULD BE
      dt        #visits   #added   #removed #remained
    2011-06-01     3         3        0        0
    2011-06-02     2         0        1        2
    2011-06-03     2         1        1        1
    2011-06-04     0         0        2        0
    2011-06-05     0         0        0        0
    2011-06-06     0         0        0        0
    2011-06-07     2         2        0        0
    2011-06-08     3         2        1        1
    */

    Steve,
    Try it with a bit more test data and see where the two Triangular Joins come into play.  Here's the test harness.  Turn on the Actual Execution Plan and look for the big arrows. 


    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.DailyVisits', N'U') IS NOT NULL
        BEGIN
        DROP TABLE dbo.DailyVisits;
        END;

    GO

    CREATE TABLE dbo.DailyVisits (
        dt date NOT NULL,
        visitor varchar(10) NOT NULL,
        CONSTRAINT PK_DailyVisits PRIMARY KEY(dt, visitor)
    );
    CREATE NONCLUSTERED INDEX IX_DailyVisits_visitor_dt ON dbo.DailyVisits
        (
        visitor ASC,
        dt ASC
        );

    --===== 10,000 visitors over a 5 year period
     INSERT INTO dbo.DailyVisits
            (dt, visitor)
     SELECT TOP (10000)
             dt      = DATEADD(dd,ABS(CHECKSUM(NEWID())%DATEDIFF(dd,'2011','2017')),'2011') --2017 is exclusive here
            ,visitor = RIGHT(NEWID(),7)
       FROM      sys.all_columns ac1
      CROSS JOIN sys.all_columns ac2
    ;

    For more information on the devastating effect that Triangular Joins can have even on such small row counts, please see the following article.
    http://www.sqlservercentral.com/articles/T-SQL/61539/

    Also, because you're using an rCTE that counts, you also have to add a MAXRECURSION option if you want to be able to process more than 100 days.  I wouldn't use the rCTE for this.  Please see the following article on why you should never use one to create a sequence/count.
    http://www.sqlservercentral.com/articles/T-SQL/74118/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)