Most efficient Running Total with COUNT(DISITNCT #)

  • Hello All,

    I'm trying to fine the most efficient way of doing a running total of COUNT(DISTINCT ID's). I need to run it as of year, as of the month during a year. I looked at WINDOWS functions but can't do a COUNT(DISTINCT). I have some small test data. But i'll be running this over around 15 million row table. I have complete schema control as well. but here is the bare basics of the data. Any help or pointing in the right direction is very appreciated. 

    CREATE TABLE #Test ( VisitID INT, VisitDate DATE )
    INSERT INTO #Test (VisitID, VisitDate) VALUES
    (295591    ,'2015-10-02'),
    (1101206    ,'2014-04-01'),
    (1104675    ,'2014-04-11'),
    (1116085    ,'2014-05-14'),
    (291111    ,'2015-09-20'),
    (1108068    ,'2014-04-22'),
    (1116085    ,'2014-05-14'),
    (291111    ,'2015-09-20')

    SELECT * FROM #Test

    DROP TABLE #Test

    ***SQL born on date Spring 2013:-)

  • Not entirely certain what you are after but here is an example that 

    1. Finds the first visit for each ID
    2. Aggregates the number of visits to Year-Month
    3. Does a running total for Year-Month

    😎

    IF OBJECT_ID(N'tempdb..#Test') IS NOT NULL DROP TABLE #Test;
    CREATE TABLE #Test ( VisitID INT, VisitDate DATE )
    INSERT INTO #Test (VisitID, VisitDate) VALUES
    (295591  ,'2015-10-02'),
    (1101206  ,'2014-04-01'),
    (1104675  ,'2014-04-11'),
    (1116085  ,'2014-05-14'),
    (291111  ,'2015-09-20'),
    (1108068  ,'2014-04-22'),
    (1116085  ,'2014-05-14'),
    (291111  ,'2015-09-20');

    CREATE NONCLUSTERED INDEX NCLIDX_#TEST_VISITID_VISITDATE ON #Test (VisitID ASC, VisitDate ASC);

    ;WITH FIRST_VISITS AS
    (
      SELECT
       T.VisitID
       ,MIN(T.VisitDate) AS FIRST_VISIT
      FROM #Test T
      GROUP BY T.VisitID
    )
    ,VISIT_BY_YEAR_MONTH AS
    (
      SELECT
       YEAR(FV.FIRST_VISIT)  AS V_YEAR
      ,MONTH(FV.FIRST_VISIT)  AS V_MONTH
      ,COUNT(FV.VisitID)   AS V_COUNT
      FROM  FIRST_VISITS  FV
      GROUP BY YEAR(FV.FIRST_VISIT)
        ,MONTH(FV.FIRST_VISIT)
    )
    SELECT
      VYM.V_YEAR
     ,VYM.V_MONTH
     ,VYM.V_COUNT
     ,SUM(VYM.V_COUNT) OVER
       (
        ORDER BY VYM.V_YEAR
           ,VYM.V_MONTH
        ROWS BETWEEN UNBOUNDED PRECEDING
          AND  CURRENT ROW
       ) AS RT_VISIT_COUNT
    FROM  VISIT_BY_YEAR_MONTH  VYM;

    Output

    V_YEAR      V_MONTH     V_COUNT     RT_VISIT_COUNT
    ----------- ----------- ----------- --------------
    2014        4           3           3
    2014        5           1           4
    2015        9           1           5
    2015        10          1           6

  • Hi Eirikur,

    Thanks again for your input, those results are very close to what I need. Except the Year counts would start over vs be continous

    So the end total for 2014 would be 4 and the end total for 2015 would. The months are dead on. I have used the quirky update method before but i'm not sure how i can get that to work for COUNT(DISTINCT).

    ***SQL born on date Spring 2013:-)

  • This might help you, restarts the count on a new year. Subject X from year n-1 will be counted again on year n.
    😎

    IF OBJECT_ID(N'tempdb..#Test') IS NOT NULL DROP TABLE #Test;
    CREATE TABLE #Test ( VisitID INT, VisitDate DATE )
    INSERT INTO #Test (VisitID, VisitDate) VALUES
    (295591  ,'2015-10-02'),
    (1101206  ,'2014-04-01'),
    (1104675  ,'2014-04-11'),
    (1116085  ,'2014-05-14'),
    (291111  ,'2015-09-20'),
    (1108068  ,'2014-04-22'),
    (1116085  ,'2014-05-14'),
    (291111  ,'2015-09-20');

    CREATE NONCLUSTERED INDEX NCLIDX_#TEST_VISITID_VISITDATE ON #Test (VisitID ASC, VisitDate ASC);

    ;WITH FIRST_VISITS AS
    (
      SELECT
       T.VisitID
       ,YEAR(T.VisitDate) AS V_YEAR
       ,MIN(T.VisitDate) AS FIRST_VISIT
      FROM #Test T
      GROUP BY T.VisitID
        ,YEAR(T.VisitDate)
    )
    ,VISIT_BY_YEAR_MONTH AS
    (
      SELECT
       FV.V_YEAR      AS V_YEAR
      ,MONTH(FV.FIRST_VISIT)  AS V_MONTH
      ,COUNT(FV.VisitID)   AS V_COUNT
      FROM  FIRST_VISITS  FV
      GROUP BY FV.V_YEAR
        ,MONTH(FV.FIRST_VISIT)
    )
    SELECT
      VYM.V_YEAR
     ,VYM.V_MONTH
     ,VYM.V_COUNT
     ,SUM(VYM.V_COUNT) OVER
       (
        PARTITION BY VYM.V_YEAR
        ORDER BY VYM.V_YEAR
           ,VYM.V_MONTH
        ROWS BETWEEN UNBOUNDED PRECEDING
          AND  CURRENT ROW
       ) AS RT_VISIT_COUNT
    FROM  VISIT_BY_YEAR_MONTH  VYM;

    Output
    V_YEAR      V_MONTH     V_COUNT     RT_VISIT_COUNT
    ----------- ----------- ----------- --------------
    2014        4           3           3
    2014        5           1           4
    2015        9           1           1
    2015        10          1           2

  • Awesome going to give it a try !. Thanks !

    ***SQL born on date Spring 2013:-)

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply