How to get monthly YTD data

  • To Anjali regarding an Oracle version:

    Going back at least to 9i, Oracle has Analytic functions that are designed for just this sort of thing. Here's how you could do it:

    create table MyUser

    (

    UserId int not null primary key,

    UserName varchar2(100),

    UserCreationDate date

    );

    insert into MyUser values (1, 'Adam', to_date('2011-09-30', 'YYYY-MM-DD'));

    insert into MyUser values (2, 'Yvette', to_date('2011-10-01', 'YYYY-MM-DD'));

    insert into MyUser values (3, 'Joe', to_date('2011-11-17', 'YYYY-MM-DD'));

    insert into MyUser values (4, 'Sue', to_date('2011-11-18', 'YYYY-MM-DD'));

    insert into MyUser values (5, 'Anuj', to_date('2011-11-24', 'YYYY-MM-DD'));

    insert into MyUser values (6, 'Nick', to_date('2011-12-25', 'YYYY-MM-DD'));

    insert into MyUser values (7, 'Henrika', to_date('2012-01-15', 'YYYY-MM-DD'));

    select sum(count(*)) over (partition by extract(year from UserCreationDate)

    order by extract(year from UserCreationDate), extract(month from UserCreationDate)

    range unbounded preceding )

    as UsersYTD,

    to_char(UserCreationDate, 'MON-YYYY') period,

    extract(month from UserCreationDate) monthval

    from MyUser

    group by to_char(UserCreationDate, 'MON-YYYY'),

    extract(month from UserCreationDate),

    extract(year from UserCreationDate)

    order by extract(year from UserCreationDate), extract(month from UserCreationDate);

    If you only need part of the year (as in the final example in the article), make the query above a nested query and put the "where" clause on the outside:

    select * from (

    {select query above}

    ) x

    where x.monthval >= 11

    order by yearval, monthval;

    Good luck!

    Mike

  • TheSQLGuru (1/24/2012)


    1) as others have stated that triangular join (xx <= yy) is gonna kill you performance wise as number of rows increases.

    2) cursors are still the fastest supported way to do running totals in sql server. Jeff Moden and others have come up with the "quirky update" to do these as well, although not officially supported by microsoft.

    1) Has anybody posted a better method when you're doing trending analysis? say I have a start date, and and end date on a dimension table in a DW and want to pick up each of the calendar dates between them to show that a customer was licensing 500 GB of disk from 1/1/2012 until 1/15/2012, and then went up to 750 GB on 1/15/2012? I briefly considered storing the values for each day, but the storage and performance get out of hand that way too.

    2) The "quirky update" is in my local copy of books online for SQL Server 2008R2.

    Syntax

    [ WITH <common_table_expression> [...n] ]

    UPDATE

    [ TOP ( expression ) [ PERCENT ] ]

    { <object> | rowset_function_limited

    [ WITH ( <Table_Hint_Limited> [ ...n ] ) ]

    }

    SET

    { column_name = { expression | DEFAULT | NULL }

    | { udt_column_name.{ { property_name = expression

    | field_name = expression }

    | method_name ( argument [ ,...n ] )

    }

    }

    | column_name { .WRITE ( expression , @Offset , @Length ) }

    | @variable = expression

    | @variable = column = expression

    | column_name { += | -= | *= | /= | %= | &= | ^= | |= } expression

    | @variable { += | -= | *= | /= | %= | &= | ^= | |= } expression

    | @variable = column { += | -= | *= | /= | %= | &= | ^= | |= } expression

    } [ ,...n ]

    [ <OUTPUT Clause> ]

    Heh it's in books online now too.

    http://technet.microsoft.com/en-us/library/ms177523.aspx



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • I was surprised this didn't cover ROLLUP, CUBE, or GROUPING SETS myself.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • I find a calendar table with pre-defined columns for month, fiscal month, etc., and other columns like YYYY-MM is invaluable for aggregating and sorting data like this.

    Joining the source table to the calendar table lets you take advantage of those pre-defined columns.

  • This is a great way to get a running total. I would caution people you are trying to use this to get sales for your company. Everyone I have ever worked for would not want this approach because it would hide the true monthly sales, ie with this approach you would have to take the difference between the previous month sales and the current month sales to get the actual month sales. If you graphed the sales based off of that query the graph would always look the same, steadily increasing. It is hard to make business decisions off of a report like that. Hope that made sense. :hehe:

  • Christian Bahnsen (1/24/2012)


    I find a calendar table with pre-defined columns for month, fiscal month, etc., and other columns like YYYY-MM is invaluable for aggregating and sorting data like this.

    Joining the source table to the calendar table lets you take advantage of those pre-defined columns.

    Right, that's what I do... my source table has a begin and and end date (it's a type 2 scd) and I join it to the calendar table where effective_date <= calendar_date and inactive_date > calendar_date

    but it does create a triangular join. Other than creating millions of fact records into a table each month, I don't know of a way to avoid the triangle when dealing with trends.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Also, using a date table allows you to show months that have no corresponding values. In the example, if there were no users for a particular month, that month would not be on the report.

  • Christian Bahnsen (1/24/2012)


    I find a calendar table with pre-defined columns for month, fiscal month, etc., and other columns like YYYY-MM is invaluable for aggregating and sorting data like this.

    Joining the source table to the calendar table lets you take advantage of those pre-defined columns.

    I do the exact same thing, it has saved me a lot of time just joining to a table like this.

  • The other simple way is:

    SELECT RIGHT(CONVERT(VARCHAR(11), GETDATE(), 106), 8)

    This will give you the same format.

  • mtassin (1/24/2012)


    TheSQLGuru (1/24/2012)


    1) as others have stated that triangular join (xx <= yy) is gonna kill you performance wise as number of rows increases.

    2) cursors are still the fastest supported way to do running totals in sql server. Jeff Moden and others have come up with the "quirky update" to do these as well, although not officially supported by microsoft.

    1) Has anybody posted a better method when you're doing trending analysis? say I have a start date, and and end date on a dimension table in a DW and want to pick up each of the calendar dates between them to show that a customer was licensing 500 GB of disk from 1/1/2012 until 1/15/2012, and then went up to 750 GB on 1/15/2012? I briefly considered storing the values for each day, but the storage and performance get out of hand that way too.

    2) The "quirky update" is in my local copy of books online for SQL Server 2008R2.

    Syntax

    [ WITH <common_table_expression> [...n] ]

    UPDATE

    [ TOP ( expression ) [ PERCENT ] ]

    { <object> | rowset_function_limited

    [ WITH ( <Table_Hint_Limited> [ ...n ] ) ]

    }

    SET

    { column_name = { expression | DEFAULT | NULL }

    | { udt_column_name.{ { property_name = expression

    | field_name = expression }

    | method_name ( argument [ ,...n ] )

    }

    }

    | column_name { .WRITE ( expression , @Offset , @Length ) }

    | @variable = expression

    | @variable = column = expression

    | column_name { += | -= | *= | /= | %= | &= | ^= | |= } expression

    | @variable { += | -= | *= | /= | %= | &= | ^= | |= } expression

    | @variable = column { += | -= | *= | /= | %= | &= | ^= | |= } expression

    } [ ,...n ]

    [ <OUTPUT Clause> ]

    Heh it's in books online now too.

    http://technet.microsoft.com/en-us/library/ms177523.aspx

    Incorrect. The SYNTAX of the "quirky update" is allowed, and I never said it wasn't. It is the REQUIREMENT that data processing take place in a particular order that isn't supported. There are all kinds of requirements to "ensure" this (even some 'error checking' now to see if it did NOT happen in the required order) but that is NOT the same thing. I note that over the years several "unknown-at-the-time" oopsies have been found that could cause invalid data, which caused proponents to find new rules to guard against those issues. There is no reason to believe that some other "oh, didn't think of that" issue won't arise in the future, or that Microsoft won't simply change the way the engine processes data to void the current rule set for the "quirky update".

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • 🙂

  • ceasor (1/24/2012)


    Nice try (I know how time-costly to create an article).

    I dislike your approach. OLAP service (SSAS) is designed to solve such task. Use calculated Dimension or Time intelligence for that.

    It s better to create some kind of 'month key' (201101,201102,201103,etc.) and group by integer instead of strings. So many functions in where clause - indexes will not be used, try to filter data by datetime range first. Why Cross join? - isnt none-equivalent join enough? CTE? - are you sure it is performed only once?

    It s better to give all transformations to SSIS

    And SSRS 2008 has Running Sum built-in function if you need it.

    It s not good habit to solve everything by T-SQL. Sql Server is powerful tool

    1) the vast majority of SQL Server shops have ZERO experience or skills with SSAS. Your first recommendation is thus useless for most entities.

    2) outside of true dataware housing constructs integer date keys are not used nor necessarily appropriate, so this point is also not helpful.

    3) Indexes in where clause do not necessarily prevent use of indexes. They MAY void the use of index SEEKS however, but that isn't the same thing.

    4) SSIS - really?? Again most shops don't have experience with that, and secondly why would you use that to simply view output from a query or feed that output to a web page or other app?

    5) SSRS: yet again you recommend a product that a) isn't used by most shops and b) doesn't allow for simple viewing of query output (without all the overhead of setting it up, running it,etc). Oh, and have you ever tried to shove 5M rows of data out of sql server and through SSRS's "running sum built-in function". Please try that and let me know how the performance went.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • for the basic example, why not just do something like

    SELECT usercount, Period, monthval

    FROM (

    SELECT MONTH( UserCreationDate ) AS monthval,

    DATENAME( MONTH, UserCreationDate ) + ' ' + DATENAME( YEAR, UserCreationDate ) AS Period

    FROM MyUser

    WHERE YEAR( UserCreationDate ) = YEAR( GETDATE() )

    GROUP BY MONTH( UserCreationDate ),

    DATENAME( MONTH, UserCreationDate ) + ' ' + DATENAME( YEAR, UserCreationDate )

    ) AS Months

    CROSS APPLY(

    SELECT COUNT(*)

    FROM MyUser M

    WHERE MONTH(M.UserCreationDate) <= Months.monthval

    AND YEAR( UserCreationDate ) = YEAR( GETDATE() )

    ) AS YTD( usercount )

    ORDER BY monthval

    instead of using a CTE?

  • I don't know about you, but I try to avoid functions in WHERE clauses like the plague

  • TheSQLGuru (1/24/2012)


    1) the vast majority of SQL Server shops have ZERO experience or skills with SSAS. Your first recommendation is thus useless for most entities.

    I've got 5 years of experience.... but most days I still feel like I have ZERO skills. 🙂 Get AS to do what you want, is generally a PITA.

    4) SSIS - really?? Again most shops don't have experience with that, and secondly why would you use that to simply view output from a query or feed that output to a web page or other app?

    Is this actually true? You're exciting me about my job prospects.... I figured by now, most shops had a need to import/export data from SQL server... SSIS seems to be the best solution for that, and it's relatively easy to use.

    5) SSRS: yet again you recommend a product that a) isn't used by most shops and b) doesn't allow for simple viewing of query output (without all the overhead of setting it up, running it,etc). Oh, and have you ever tried to shove 5M rows of data out of sql server and through SSRS's "running sum built-in function". Please try that and let me know how the performance went.

    And downright sucks.... Try using a Report Model from SSRS to do anything interesting.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

Viewing 15 posts - 16 through 30 (of 42 total)

You must be logged in to reply to this topic. Login to reply