Calculate totals in each category by day

  • I need to calculates for each of a category by day, and I need to carry over those totals where a day isn't present. Just to clarify this is not a running total - This is "how many people were in __ status by date?"

    Here is some sample data:

    CREATE TABLE #StCg (StDt DATE NOT NULL, StID VARCHAR(25) NOT NULL, NwStStg VARCHAR(75) NOT NULL)

    INSERT INTO #StCg(StDt, StID, NwStStg) VALUES

    ('2013-08-01', 'a', 'New'),

    ('2013-08-03', 'a', 'Call'),

    ('2013-08-07', 'a', 'Start'),

    ('2013-08-02', 'b', 'New'),

    ('2013-08-07', 'b', 'Call'),

    ('2013-08-04', 'c', 'New'),

    ('2013-08-06', 'c', 'Canc')

    SELECT *

    FROM #StCg

    Here is the output with an extra column to explain things further. I put the output on an Excel sheet to make it easier to see.

  • I'm sorry, but I need to be quick; I'm already past my bedtime.

    You need a table of numbers - or in this case a table of dates to fill up the gaps. Once you have it, you can left-join from the table.

    Here is an introduction to the topic:

    http://www.sommarskog.se/arrays-in-sql-2005.html#numbersasconcept. You should only read down to the next header.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Thanks for the reply! The article was helpful. However, I need to do more than left join to a row of sequential dates. I need to carry over the total number of IDs for a given status for each day.

    I wonder if the best way to handle this might be to create another column that is the end date for the status? For example, in my sample data above, "a" was in "Call" on 8/3, 8/4, 8/5, and 8/6 but that status ended on 8/7 when "a" went to "Start". I'm not sure how to do this or if a better way exists. But if this could be done I could somehow pass go through each date counting the NwStStg for those with a StChg date >= the date and this new column which is the end of status date <= (or null) for the given date.

    Again, I'm not sure if this is a good idea. And if it is - I'm not sure the best way to pull it off 🙂

  • Interesting and challenging problem! Thanks for a great start to the morning.

    My solution, for which I will not make any claims of trying to optimize it for performance, is:

    WITH SampleData (StDt, StID, NwStStg) AS (

    SELECT StDt, StID, NwStStg

    FROM (VALUES

    ('2013-08-01', 'a', 'New'),

    ('2013-08-03', 'a', 'Call'),

    ('2013-08-07', 'a', 'Start'),

    ('2013-08-02', 'b', 'New'),

    ('2013-08-07', 'b', 'Call'),

    ('2013-08-04', 'c', 'New'),

    ('2013-08-06', 'c', 'Canc'),

    ('2013-08-08', 'a', 'Canc'),

    ('2013-08-08', 'b', 'Canc')

    ) a(StDt, StID, NwStStg))

    ,Calendar (d) AS (

    SELECT TOP (

    (

    SELECT 1+DATEDIFF(day, MIN(StDt), MAX(StDt))

    FROM SampleData

    ))

    DATEADD(day,

    -1+ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    (SELECT MIN(StDT) FROM SampleData))

    FROM sys.all_columns)

    SELECT StDt, NwStStg, InStatus

    FROM (

    SELECT StDt

    ,[New]=COUNT(CASE NwStStg WHEN 'New' THEN 1 END)

    ,[Start]=COUNT(CASE NwStStg WHEN 'Start' THEN 1 END)

    ,[Call]=COUNT(CASE NwStStg WHEN 'Call' THEN 1 END)

    ,[Canc]=COUNT(CASE NwStStg WHEN 'Canc' THEN 1 END)

    FROM (

    SELECT StDt=d, StID

    ,NwStStg=(

    SELECT TOP 1 NwStStg

    FROM SampleData c

    WHERE c.StID = b.StID AND c.StDt <= d

    ORDER BY c.StDt DESC

    )

    FROM Calendar a

    CROSS APPLY (SELECT DISTINCT StID FROM SampleData) b

    ) a

    GROUP BY StDt

    ) a

    CROSS APPLY (

    VALUES ('New', [New]),('Start', [Start]),('Call', [Call])) b (NwStStg, InStatus)

    ORDER BY StDt, CASE NwStStg WHEN 'New' THEN 1 WHEN 'Call' THEN 2 ELSE 3 END

    Explanation:

    - SampleData CTE is just your sample data put into a CTE rather than a temporary table.

    - Calendar CTE is an in line calendar table that limits the returned results to dates within the range of your data. This is the equivalent to what Erland suggested for a tallytable. Rows are exploded using CROSS APPLY at this point so there's one for each StID.

    - The innermost query, uses a correlated sub-query to pick off the latest status of each StID at or prior to a given date.

    - The middle-level query constructs a cross tab of those statuses as columns (at that point, CANC is not really needed but I left it in), counting the occurrences by StID within status.

    - The outermost query groups and UNPIVOTs using the CROSS APPLY VALUES approach described in the article in my signature links.

    Please let me know how it works for you. 🙂


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I was hoping you might show up 🙂 I'll give it a test run first thing! Thanks(!), and I'll write back soon.

  • DataAnalyst011 (9/11/2013)


    I was hoping you might show up 🙂 I'll give it a test run first thing! Thanks(!), and I'll write back soon.

    Heh, heh. Flattery will get you everywhere.

    Note that I added a case to the SampleData CTE in my prior post with something I thought might trip up my query (it didn't) or might also add a bit of challenge to anyone else that comes along and wants to give this a try.

    I hope, based on Erland's description I didn't over complicate the thing.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • It was bugging me that I did a PIVOT using a crosstab and then did an UNPIVOT. That just somehow seemed unnecessary. Sure enough, that led to a simplification as follows:

    WITH SampleData (StDt, StID, NwStStg) AS (

    SELECT StDt, StID, NwStStg

    FROM (VALUES

    ('2013-08-01', 'a', 'New'),

    ('2013-08-03', 'a', 'Call'),

    ('2013-08-07', 'a', 'Start'),

    ('2013-08-02', 'b', 'New'),

    ('2013-08-07', 'b', 'Call'),

    ('2013-08-04', 'c', 'New'),

    ('2013-08-06', 'c', 'Canc'),

    ('2013-08-08', 'a', 'Canc'),

    ('2013-08-08', 'b', 'Canc')

    ) a(StDt, StID, NwStStg))

    ,Calendar (d) AS (

    SELECT TOP (

    (

    SELECT 1+DATEDIFF(day, MIN(StDt), MAX(StDt))

    FROM SampleData

    ))

    DATEADD(day,

    -1+ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    (SELECT MIN(StDT) FROM SampleData))

    FROM sys.all_columns)

    SELECT StDt, NwStStg, InStatus=ISNULL(SUM(InStatus), 0)

    FROM (

    SELECT StDt=d, StID, c.NwStStg

    ,InStatus=

    CASE

    WHEN c.NwStStg=

    (

    SELECT TOP 1 NwStStg

    FROM SampleData d

    WHERE d.StID = b.StID AND d.StDt <= d

    ORDER BY d.StDt DESC

    ) THEN 1 END

    FROM Calendar a

    CROSS JOIN (SELECT DISTINCT StID FROM SampleData) b

    CROSS JOIN (SELECT 'New' UNION ALL SELECT 'Start' UNION ALL SELECT 'Call') c (NwStStg)

    ) a

    GROUP BY StDt, NwStStg

    ORDER BY StDt, CASE NwStStg WHEN 'New' THEN 1 WHEN 'Call' THEN 2 ELSE 3 END;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Hi Dwain, these work just as needed! Thank you so much for the help (and for the explanation). And re:flattery - You've bailed me out two or three times. I'm just giving credit 🙂

  • Thanks for letting me know.

    I assume the second one, being simpler, should be faster. If you can confirm that, it would be interesting too.

    Have a good day.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply