Obtaining history totals on a monthly basis with only one table read?

  • Here's an interesting little problem that I'm trying to solve ...

    Basically, I have a table of registrations and subscribers. What I want is to have a report, with the data presented by month. For each month, there should be a record with the amount of subscribers we had up to that date.

    So, as an example set of data,

    DECLARE @TestTable TABLE

    (

    ID INT PRIMARY KEY IDENTITY,

    DateCreated SMALLDATETIME,

    Subscriber BIT

    )

    INSERT INTO @TestTable(DateCreated, Subscriber)

    VALUES('2009-01-01', 1)

    INSERT INTO @TestTable(DateCreated, Subscriber)

    VALUES('2009-02-01', 1)

    INSERT INTO @TestTable(DateCreated, Subscriber)

    VALUES('2009-02-02', 1)

    INSERT INTO @TestTable(DateCreated, Subscriber)

    VALUES('2009-04-05', 1)

    INSERT INTO @TestTable(DateCreated, Subscriber)

    VALUES('2009-05-05', 0)

    INSERT INTO @TestTable(DateCreated, Subscriber)

    VALUES('2009-05-07', 1)

    INSERT INTO @TestTable(DateCreated, Subscriber)

    VALUES('2009-05-09', 1)

    INSERT INTO @TestTable(DateCreated, Subscriber)

    VALUES('2009-10-01', 0)

    INSERT INTO @TestTable(DateCreated, Subscriber)

    VALUES('2009-11-01', 1)

    INSERT INTO @TestTable(DateCreated, Subscriber)

    VALUES('2009-12-01', 1)

    IDDateCreatedSubscriber

    12009-01-01 00:00:001

    22009-02-01 00:00:001

    32009-02-02 00:00:001

    42009-04-05 00:00:001

    52009-05-05 00:00:000

    62009-05-07 00:00:001

    72009-05-09 00:00:001

    82009-10-01 00:00:000

    92009-11-01 00:00:001

    102009-12-01 00:00:001

    And, the result set should be:

    YearMonthCount

    2009January1

    2009Februray3

    2009March3

    2009April4

    2009May6

    2009June6

    2009July6

    2009August6

    2009September6

    2009October6

    2009November7

    2009December8

    Now, here's the issue I'm trying to solve. I know this would be "easy" to do by just running a "SELECT COUNT(*) FROM TABLE WHERE dateCreated < @CurrentDate AND Subscriber = 1" for each month and inserting a record into a temp table. But what I'm trying to accomplish is to avoid having to query the entire table several times, especially since I'm potentially dealing with a non-indexed field. What I want to do is to basically run the query once, get all the data, and then do some sort of join or aggregate for each month that's requested. something like "SELECT SUM(CASE WHEN dateCreated < @CurrentDate THEN 1 ELSE 0) FROM TABLE GROUP BY dateCreated", except only showing the months that I'm interested in. One solution that I've considered is making a temp table, dumping all the records from my raw table into that temp table, then indexing the temp table accordingly and re-querying it repeatedly for each month, but I'd still like to find a solution that would involve only querying the table once, in whatever form I choose.

  • I assume you have something like a calendar table...

    If not, I've included a calendar (CTE_calendar) in the code below.

    Also, please note that I've used a temp table with a PK constraint to store the intermediate result.

    The reason is as simple as it is complicated:

    Jeff Moden's article [/url] shows how the "quirky update" works (at least in the attached SQL code).

    DECLARE @TestTable TABLE

    (

    ID INT PRIMARY KEY IDENTITY,

    DateCreated SMALLDATETIME,

    Subscriber BIT

    )

    INSERT INTO @TestTable (DateCreated, Subscriber)

    VALUES ('2009-01-01', 1)

    INSERT INTO @TestTable (DateCreated, Subscriber)

    VALUES ('2009-02-01', 1)

    INSERT INTO @TestTable (DateCreated, Subscriber)

    VALUES ('2009-02-02', 1)

    INSERT INTO @TestTable (DateCreated, Subscriber)

    VALUES ('2009-04-05', 1)

    INSERT INTO @TestTable (DateCreated, Subscriber)

    VALUES ('2009-05-05', 0)

    INSERT INTO @TestTable (DateCreated, Subscriber)

    VALUES ('2009-05-07', 1)

    INSERT INTO @TestTable (DateCreated, Subscriber)

    VALUES ('2009-05-09', 1)

    INSERT INTO @TestTable (DateCreated, Subscriber)

    VALUES ('2009-10-01', 0)

    INSERT INTO @TestTable (DateCreated, Subscriber)

    VALUES ('2009-11-01', 1)

    INSERT INTO @TestTable (DateCreated, Subscriber)

    VALUES ('2009-12-01', 1)

    SET NOCOUNT ON

    -- Declare intermedite table to hold the results and to perform the running total on

    CREATE TABLE #test2 (ID INT IDENTITY(1,1), datevalue DATETIME, cnt INT, rn_total INT,

    CONSTRAINT PK_#test2_ID PRIMARY KEY (ID)

    )

    ; WITH

    CTE_calendar AS -- build calendar table, assuming it doesn't exist

    (

    SELECT DATEADD(mm, RowNum-1,CONVERT(DateTime, '20090101', 112))AS datevalue

    FROM

    (

    SELECT row_number() OVER (ORDER BY object_id) AS RowNum FROM sys.all_objects

    ) RowTable

    WHERE RowNum <= DATEDIFF(Month,'20090101','20091231') + 1

    ),

    CTE_monthlyData AS --build monthly total of rows with [subscriber] = 1

    (

    SELECT

    DATEADD(mm, DATEDIFF(mm, 0, CAST(DateCreated AS DATETIME)), 0) AS CREATED,

    COUNT(*) AS CNT

    FROM @TestTable TT

    WHERE Subscriber = 1

    GROUP BY DATEADD(mm, DATEDIFF(mm, 0, CAST(DateCreated AS DATETIME)), 0)

    )

    INSERT INTO #test2 (datevalue, CNT, rn_total) -- fill intermdiate table

    SELECT datevalue, ISNULL(CNT,0),0

    FROM CTE_calendar

    LEFT OUTER JOIN CTE_monthlyData ON datevalue = CREATED

    ORDER BY datevalue

    DECLARE @run INT-- used for adding the running total

    SET @run = 0

    UPDATE #test2-- calculate the running total

    SET @run = rn_total = CNT + @run

    FROM #test2 WITH (INDEX(PK_#test2_ID),TABLOCKX)

    -- print requested result

    SELECT

    YEAR(datevalue) AS Year,

    DATENAME(MONTH, datevalue) AS Month,

    rn_total AS COUNT

    FROM #test2

    -- cleanup

    DROP TABLE #test2

    /*result set:

    YearMonthCOUNT

    2009January1

    2009February3

    2009March3

    2009April4

    2009May6

    2009June6

    2009July6

    2009August6

    2009September6

    2009October6

    2009November7

    2009December8

    */



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • So, let me see if I understand this right.

    What you're essentially doing is, getting a total of the subscribers on a monthly basis, and then, using the results stored into a temp table, you're getting a running total of the sum of the monthly data, and updating the temp table "sum" field with that running total.

    Creative. Very creative. I like it, thanks!

  • kramaswamy (8/17/2009)


    So, let me see if I understand this right.

    What you're essentially doing is, getting a total of the subscribers on a monthly basis, and then, using the results stored into a temp table, you're getting a running total of the sum of the monthly data, and updating the temp table "sum" field with that running total.

    Creative. Very creative. I like it, thanks!

    Youre understanding is 100% correct. Glad you like the solution! 🙂

    But I have to admit that the only creative part from my side was to apply a solution I learned here at SCC (based on Jeff's article I mentioned in my previous post). So, if there'd be any creativity award for this solution, it belongs to Jeff. 😉



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • A few more things to add:

    #1) I forgot to thank you for the way you've provided sample data and that you took the time to dive into the solution I proposed in order to understand what's going on! Good job!

    #2) You've mentioned that you're "potentially dealing with a non-indexed field". I assume you're talking about the subscriber column. What's the reason for not having it as included column (preferred on an index having the DateCreated column as first index column)?

    If the subscriber column can either be 0 or 1 then you could use SUM(subscriber) instead of COUNT(*) and the WHERE clause... - assuming you can change the data type from BIT to say TINYINT (mabe with a constraint to only allow 0,1 and NULL)

    #3) The CAST part of my code might not be required, when I look at the code "from a distance"...

    #4) If you're having a calendar table with separate month column then you wouldn't even need the date conversion function. This should also help performance...

    Here's an alternative solution considering #3 and #4 (assuming same sample data as before):

    SET NOCOUNT ON

    -- Create intermedite table to hold the results and to perform the running total on

    CREATE TABLE #test2 (ID INT IDENTITY(1,1), MonthValue DATETIME, cnt INT, rn_total INT,

    CONSTRAINT PK_#test2_ID PRIMARY KEY NONCLUSTERED (ID)

    )

    ; WITH

    CTE_calendar AS -- build calendar table, assuming it doesn't exist

    (

    SELECT

    CONVERT(DateTime, '20090101', 112) + RowNum - 1 AS datevalue,

    DATEADD(mm, DATEDIFF(mm, 0, CONVERT(DateTime, '20090101', 112) + RowNum - 1), 0) AS MonthValue

    FROM

    (

    SELECT row_number() OVER (ORDER BY object_id) AS RowNum FROM sys.all_objects

    ) RowTable

    WHERE RowNum <= DATEDIFF(dd,'20090101','20091231') + 1

    )

    INSERT INTO #test2 (MonthValue, CNT, rn_total) -- fill intermdiate table

    SELECT MonthValue, SUM(ISNULL(CAST(Subscriber AS INT),0)),0-- CAST only required if data type change from BIT to integer type is not an option.

    FROM CTE_calendar

    LEFT OUTER JOIN @TestTable TestTable ON datevalue = DateCreated

    GROUP BY MonthValue

    ORDER BY MonthValue

    DECLARE @run INT-- used for adding the running total

    SET @run = 0

    UPDATE #test2-- calculate the running total

    SET @run = rn_total = CNT + @run

    FROM #test2 WITH (INDEX(PK_#test2_ID),TABLOCKX)

    -- print requested result

    SELECT

    YEAR(MonthValue) AS Year,

    DATENAME(MONTH, MonthValue) AS Month,

    rn_total AS COUNT

    FROM #test2

    -- cleanup

    DROP TABLE #test2



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Yeah, having tried to help some other people out here with incomplete data to work with, I know how annoying that can be, so I do what I can 😛

    The reason why I mentioned potentially non-indexed columns is because I didn't give a complete example case, the actual situation involves some more intricate querying with joins and stuff, but which wasn't really applicable to the problem at hand.

    I've got the date part handled by creating a year and month temp table based on the values which we need to use for the query, so I'm pretty much set now. Thanks alot for the help~

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply