Calendar Table Function

  • We use a date dimension which has any date we'd want to get other info from and join on the date. If I wanted to do this in memory using window functions I'd do something like this:

    declare @start date= '20150101';

    declare @end date = '20151230';

    with cte as

    (

    select @start [date]

    union all

    select dateadd(day, 1, [date])

    from cte

    where [date] <= @end

    )

    select [date]

    , FIRST_VALUE([date]) over(partition by year([date]), month([date]) order by [date]) FDOM

    , FIRST_VALUE([date]) over(partition by year([date]), month([date]) order by [date] desc) LDOM

    from cte

    order by [date]

    OPTION (MAXRECURSION 0);

    That being said, the built-in function EOMONTH(date) will get you the last date in the month too in SQL2012+

    Cheers!

  • dietztm (1/25/2016)


    We use a date dimension which has any date we'd want to get other info from and join on the date. If I wanted to do this in memory using window functions I'd do something like this:

    declare @start date= '20150101';

    declare @end date = '20151230';

    with cte as

    (

    select @start [date]

    union all

    select dateadd(day, 1, [date])

    from cte

    where [date] <= @end

    )

    select [date]

    , FIRST_VALUE([date]) over(partition by year([date]), month([date]) order by [date]) FDOM

    , FIRST_VALUE([date]) over(partition by year([date]), month([date]) order by [date] desc) LDOM

    from cte

    order by [date]

    OPTION (MAXRECURSION 0);

    That being said, the built-in function EOMONTH(date) will get you the last date in the month too in SQL2012+

    Cheers!

    That would be as slow (or slower) than using a WHILE loop because of the recursive CTE (rCTE) that "counts up". Please see the following article for why not to use an rCTE for such things as well as 3 very high performance alternatives.

    [font="Arial Black"]Hidden RBAR: Counting with Recursive CTE's[/font][/url]

    And, just in case, please don't justify it by saying it's for a small amount of rows or will be used infrequently. I have a slow database full of such justifications and all of it is going to need to be fixed to get performance back to where it belongs. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hello guys.

    Another awesome solution for calendar table from Sean Smith (great thanks Sean, we used it intensively): https://github.com/ktaranov/sqlserver-kit/blob/master/Scripts/Date%20Calendar%20Cross-Reference%20Table.sql

  • Hi Jeff thanks for your response.

    That recursive CTE I posted was only to generate some test data to demonstrate window functions for the LDOM and FDOM. I chose that method as a short, easy to understand piece of code.

    In our production user-facing data warehouse extracts, we generally don't use recursive CTEs or procedural code, but its a judgement call, not a blanket prohibition. There is a balancing act between agility and performance. We consider both but tend to optimize for agility since staff time is more expensive than hardware in our shop.

    Cheers!

  • dietztm (1/26/2016)


    Hi Jeff thanks for your response.

    That recursive CTE I posted was only to generate some test data to demonstrate window functions for the LDOM and FDOM. I chose that method as a short, easy to understand piece of code.

    In our production user-facing data warehouse extracts, we generally don't use recursive CTEs or procedural code, but its a judgement call, not a blanket prohibition. There is a balancing act between agility and performance. We consider both but tend to optimize for agility since staff time is more expensive than hardware in our shop.

    Cheers!

    Appreciate the feedback. Thanks. Just so you know, though, I currently have a database where supposed "agility" has crushed performance and there's no single problem. It's all a problem because everyone kept saying that staff time is so much more expensive than hardware. When it was moved to the new hardware, it didn't help for most of it.

    Also, performance is truly in the code. You can't buy hardware that runs 60 to 1000 times faster but you can write code that does. With the exception of SSDs (which still aren't a panacea of performance) you'd be lucky if you could buy hardware that actually ran twice as fast(MPPs are good for something less than 30X and that's after you fix the code to run on such a box).

    The real problem is when such "agile" shortcuts are taken all the time and most of the code is performance challenged. There's no simple fix there. You have to address all the code. It's much better to have the staff learn how to do it right the first time all the time. You'll also find that writing it right usually takes less time to write and certainly less time than you will fixing problems.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff - Sorry to hear you're having performance issues with your database.

    Our problems are the opposite of yours. We have a lot of pressure for quick turnaround and a volatile data model, hence the need for agility (proper agility, not the lack of standards that some people conflate with "agility") - but no performance issues. A lot of problems can be prevented or alleviated by using the appropriate architecture.

    YMMV.

  • dietztm (1/27/2016)


    Jeff - Sorry to hear you're having performance issues with your database.

    Our problems are the opposite of yours. We have a lot of pressure for quick turnaround and a volatile data model, hence the need for agility (proper agility, not the lack of standards that some people conflate with "agility") - but no performance issues. A lot of problems can be prevented or alleviated by using the appropriate architecture.

    YMMV.

    That's precisely correct. A good architecture is a wonderful thing to have as well as proper standards. I've implemented the standards for all new and reworked code. The problem is that dozens of people had their way with this system for years before I got here and a couple of the good ones that we asked to stay said that no one had to answer to anyone during development. It really shows and it's hurting both our "agility" and our performance.

    Glad you have all that sussed from the git but, if you're using rCTEs that count, then that's a least one bit of code that has a performance problem that you're not aware of. It's kind of like when I got here when 15 second screen returns were explained away as "they're getting a lot of data". We've come a long way from then but it's been tough because of all the little things that have been collectively been killing things when you add them all up.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Nice script, thanks.

  • Nice work, well done sir. A few things to note:

    First a couple of ticky-tacky things that aren't a huge deal but worth mentioning...

    You may want to consider using a derived CTE tally table instead of the sys.all_objects for a few reasons. First, you add SCHEMABINDING or make the function deterministic. Also, according to this Paul White article, system tables will make it tougher to get a parallel query plan. Also, you are generating reads using that table (1 for every 10 rows).

    Second, you may want to include a TOP statement in the tally table (tt) subquery as it will help your cardinality estimates in the execution plan.

    Something like this (holiday table excluded since I don't have one set up) would not have any of these shortcomings:

    CREATE FUNCTION dbo.CALENDAR2

    (

    @StartDate DATETIME

    , @EndDate DATETIME

    )

    RETURNS TABLE

    AS

    RETURN

    WITH

    E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1),

    iTally(RID) AS

    (

    SELECT TOP (DATEDIFF(DAY,@StartDate,@EndDate)+1) ROW_NUMBER() OVER (ORDER BY (SELECT 1))

    FROM E1 a, E1 b, E1 c, E1 d

    )

    SELECT tt.RID

    , DATEADD(DAY,tt.RID-1,@StartDate) AS [Date]

    , DATEPART(quarter,DATEADD(DAY,tt.RID-1,@StartDate)) AS [Quarter]

    , DATEPART(dayofyear,DATEADD(DAY,tt.RID-1,@StartDate)) AS [DayofYear]

    , DATEPART(WEEK,DATEADD(DAY,tt.RID-1,@StartDate)) AS [WeekofYear]

    , DATEPART(YEAR,DATEADD(DAY,tt.RID-1,@StartDate)) AS [Year]

    , DATEPART(MONTH,DATEADD(DAY,tt.RID-1,@StartDate)) AS [Month]

    , DATEPART(DAY,DATEADD(DAY,tt.RID-1,@StartDate)) AS [Day]

    , DATEPART(weekday,DATEADD(DAY,tt.RID-1,@StartDate)) AS [Weekday]

    , DATENAME(MONTH,DATEADD(DAY,tt.RID-1,@StartDate)) AS [MonthName]

    , DATENAME(weekday,DATEADD(DAY,tt.RID-1,@StartDate)) AS [WeekdayName]

    , (RIGHT(

    REPLICATE('0',(4)) +

    CONVERT([VARCHAR],DATEPART(YEAR,DATEADD(DAY,tt.RID-1,@StartDate)),0)

    ,(4)

    )+

    RIGHT(

    REPLICATE('0',(2)) +

    CONVERT([VARCHAR],DATEPART(MONTH,DATEADD(DAY,tt.RID-1,@StartDate)),0)

    ,(2)

    )

    ) AS [Vintage]

    FROM iTally tt

    WHERE DATEADD(DAY,tt.RID-1,@StartDate) <= @EndDate;

    GO

    Now, regarding the examples. Getting the the last day of the month can be done by simply getting the first day of the next month and subtracting a day. Note these examples and run with "show actual execution plan turned on":

    -- Get the last day of the month

    SET STATISTICS IO ON;

    PRINT 'Old way:';

    SELECT DISTINCT MAX([Date]) OVER (PARTITION BY [Year],[MonthName]) AS [Date]

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    ORDER BY [Date];

    -- 3X tables spools and 2 sorts

    PRINT CHAR(10)+'better way:';

    SELECT [Date] = DATEADD(DAY,-1,[Date])

    FROM dbo.itvf_CALENDAR('02/01/2014','1/1/2015')

    WHERE Day = 1

    -- more stress on the FILTER Operator

    SET STATISTICS IO OFF;

    You'll see that the "better way" produces a better plan, with 1/4th the scan counts and 35 reads instead of 814 reads for 12 rows (no reads or scans if you lose the system table).

    Lastly, ditto the last Thursday of the month:

    SET STATISTICS IO ON;

    PRINT 'Old way:';

    SELECT DISTINCT MAX([Date]) OVER (PARTITION BY [Year],[MonthName],[WeekdayName]) AS [Date]

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    WHERE [WeekdayName] = 'Thursday'

    ORDER BY [Date]

    PRINT CHAR(10)+'better way:';

    SELECT MAX([Date])

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    WHERE [Weekday] = 5

    GROUP BY [Month]

    SET STATISTICS IO OFF;

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (3/29/2016)

    Lastly, ditto the last Thursday of the month:

    SET STATISTICS IO ON;

    PRINT 'Old way:';

    SELECT DISTINCT MAX([Date]) OVER (PARTITION BY [Year],[MonthName],[WeekdayName]) AS [Date]

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    WHERE [WeekdayName] = 'Thursday'

    ORDER BY [Date]

    PRINT CHAR(10)+'better way:';

    SELECT MAX([Date])

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    WHERE [Weekday] = 5

    GROUP BY [Month]

    SET STATISTICS IO OFF;

    One certainly doesn't need any calendar table at all just to get the last given day of a month(s). Just a standard tally table, to enumerate the months, and a simple mathematical calc to get the day:

    DECLARE @start_date date

    DECLARE @end_date date

    SET @start_date = '20140101'

    SET @end_date = '20141231'

    SET STATISTICS IO ON;

    SELECT last_day_of_month, DATEADD(DAY, -DATEDIFF(DAY, Thursday, last_day_of_month) % 7, last_day_of_month) AS last_thurs_of_month

    FROM dbo.tally t

    CROSS APPLY (

    SELECT DATEADD(DAY, -1, DATEADD(MONTH, t.N + 1, @start_date)) AS last_day_of_month,

    3 AS Thursday --0 for Mon, 1 for Tue, etc..

    ) AS assign_alias_names

    WHERE t.N BETWEEN 0 AND DATEDIFF(MONTH, @start_date, @end_date)

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who give you the shirt off *someone else's* back.

  • ScottPletcher (3/30/2016)


    Alan.B (3/29/2016)

    Lastly, ditto the last Thursday of the month:

    SET STATISTICS IO ON;

    PRINT 'Old way:';

    SELECT DISTINCT MAX([Date]) OVER (PARTITION BY [Year],[MonthName],[WeekdayName]) AS [Date]

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    WHERE [WeekdayName] = 'Thursday'

    ORDER BY [Date]

    PRINT CHAR(10)+'better way:';

    SELECT MAX([Date])

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    WHERE [Weekday] = 5

    GROUP BY [Month]

    SET STATISTICS IO OFF;

    One certainly doesn't need any calendar table at all just to get the last given day of a month(s). Just a standard tally table, to enumerate the months, and a simple mathematical calc to get the day:

    His function is using a tally table so you and I are basically doing the same thing more or less. I was demonstrating a better way to get the last Thursday of the month than the example he included in his article (which uses the Calendar iTVF). His method uses DISTINCT MAX([Date]) OVER (PARTITION BY... whereas mine gets the job done with a simple GROUP BY and therefore no worktable, a fraction of the reads, etc...

    On a separate note I used to use a tally table for these kinds of date calculations but have found that a nicely indexed calendar table (or dim_date table) outperforms a tally table.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (3/30/2016)


    ScottPletcher (3/30/2016)


    Alan.B (3/29/2016)

    Lastly, ditto the last Thursday of the month:

    SET STATISTICS IO ON;

    PRINT 'Old way:';

    SELECT DISTINCT MAX([Date]) OVER (PARTITION BY [Year],[MonthName],[WeekdayName]) AS [Date]

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    WHERE [WeekdayName] = 'Thursday'

    ORDER BY [Date]

    PRINT CHAR(10)+'better way:';

    SELECT MAX([Date])

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    WHERE [Weekday] = 5

    GROUP BY [Month]

    SET STATISTICS IO OFF;

    One certainly doesn't need any calendar table at all just to get the last given day of a month(s). Just a standard tally table, to enumerate the months, and a simple mathematical calc to get the day:

    His function is using a tally table so you and I are basically doing the same thing more or less. I was demonstrating a better way to get the last Thursday of the month than the example he included in his article (which uses the Calendar iTVF). His method uses DISTINCT MAX([Date]) OVER (PARTITION BY... whereas mine gets the job done with a simple GROUP BY and therefore no worktable, a fraction of the reads, etc...

    On a separate note I used to use a tally table for these kinds of date calculations but have found that a nicely indexed calendar table (or dim_date table) outperforms a tally table.

    Huh? How can I/O outperform NO I/O?? It's just arithmetic. The tally is used only for the number of months, NOT in the date/day of week calc at all.

    Edit: I am using a physical tally table only because my filter at work does not allow "UNION ALL" in code (it assumes it's "sql injection" and rejects it(!)). A tally cte, esp. for a tiny number such as the number of months, would also be zero I/O.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who give you the shirt off *someone else's* back.

  • ScottPletcher (3/30/2016)


    Alan.B (3/30/2016)


    ScottPletcher (3/30/2016)


    Alan.B (3/29/2016)

    Lastly, ditto the last Thursday of the month:

    SET STATISTICS IO ON;

    PRINT 'Old way:';

    SELECT DISTINCT MAX([Date]) OVER (PARTITION BY [Year],[MonthName],[WeekdayName]) AS [Date]

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    WHERE [WeekdayName] = 'Thursday'

    ORDER BY [Date]

    PRINT CHAR(10)+'better way:';

    SELECT MAX([Date])

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    WHERE [Weekday] = 5

    GROUP BY [Month]

    SET STATISTICS IO OFF;

    One certainly doesn't need any calendar table at all just to get the last given day of a month(s). Just a standard tally table, to enumerate the months, and a simple mathematical calc to get the day:

    His function is using a tally table so you and I are basically doing the same thing more or less. I was demonstrating a better way to get the last Thursday of the month than the example he included in his article (which uses the Calendar iTVF). His method uses DISTINCT MAX([Date]) OVER (PARTITION BY... whereas mine gets the job done with a simple GROUP BY and therefore no worktable, a fraction of the reads, etc...

    On a separate note I used to use a tally table for these kinds of date calculations but have found that a nicely indexed calendar table (or dim_date table) outperforms a tally table.

    Huh? How can I/O outperform NO I/O?? It's just arithmetic. The tally is used only for the number of months, NOT in the date/day of week calc at all.

    Edit: I am using a physical tally table only because my filter at work does not allow "UNION ALL" in code (it assumes it's "sql injection" and rejects it(!)). A tally cte, esp. for a tiny number such as the number of months, would also be zero I/O.

    Busy work day... I'll put together a performance test a little later tonight and you can tell if you think I'm missing something.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (3/30/2016)


    ScottPletcher (3/30/2016)


    Alan.B (3/30/2016)


    ScottPletcher (3/30/2016)


    Alan.B (3/29/2016)

    Lastly, ditto the last Thursday of the month:

    SET STATISTICS IO ON;

    PRINT 'Old way:';

    SELECT DISTINCT MAX([Date]) OVER (PARTITION BY [Year],[MonthName],[WeekdayName]) AS [Date]

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    WHERE [WeekdayName] = 'Thursday'

    ORDER BY [Date]

    PRINT CHAR(10)+'better way:';

    SELECT MAX([Date])

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    WHERE [Weekday] = 5

    GROUP BY [Month]

    SET STATISTICS IO OFF;

    One certainly doesn't need any calendar table at all just to get the last given day of a month(s). Just a standard tally table, to enumerate the months, and a simple mathematical calc to get the day:

    His function is using a tally table so you and I are basically doing the same thing more or less. I was demonstrating a better way to get the last Thursday of the month than the example he included in his article (which uses the Calendar iTVF). His method uses DISTINCT MAX([Date]) OVER (PARTITION BY... whereas mine gets the job done with a simple GROUP BY and therefore no worktable, a fraction of the reads, etc...

    On a separate note I used to use a tally table for these kinds of date calculations but have found that a nicely indexed calendar table (or dim_date table) outperforms a tally table.

    Huh? How can I/O outperform NO I/O?? It's just arithmetic. The tally is used only for the number of months, NOT in the date/day of week calc at all.

    Edit: I am using a physical tally table only because my filter at work does not allow "UNION ALL" in code (it assumes it's "sql injection" and rejects it(!)). A tally cte, esp. for a tiny number such as the number of months, would also be zero I/O.

    Busy work day... I'll put together a performance test a little later tonight and you can tell if you think I'm missing something.

    Argh! I had a longer reply and accidentally refreshed (CTRL+R is cool in SSMS, terrible in google Chrome). Anyhow, here's a couple tests I did with a bunch of dates. Not realistic to go back to the 1800's but I wanted enough rows to work with. The reads BTW is a red herring too because the calendar table generates virtually 0 reads. I have some better tests somewhere but could not find them so I through this together real quick. Note the superior calendar table execution plan (filter - free, all the action from non-clustered index seeks)

    USE tempdb

    GO

    /****************************************************************************************

    (1) Create simplified calendar table

    ****************************************************************************************/

    IF OBJECT_ID('dbo.calendar') IS NOT NULL DROP TABLE dbo.calendar;

    CREATE TABLE dbo.calendar

    (

    DateTxt date primary key,

    YearNbr smallint NOT NULL,

    MonthNbr tinyint NOT NULL,

    DayOfMonthNbr tinyint NOT NULL,

    DayOfWeekNbr tinyint NOT NULL

    );

    -- On a real calendar table you want this index present with the required INCLUDE columns

    CREATE UNIQUE NONCLUSTERED INDEX uq_xxx

    ON dbo.calendar(YearNbr, MonthNbr, DateTxt);

    CREATE NONCLUSTERED INDEX nc_xxx ON dbo.calendar (DayOfWeekNbr) INCLUDE (DateTxt,YearNbr,MonthNbr);

    CREATE NONCLUSTERED INDEX nc_xxx2 ON dbo.calendar (DayOfMonthNbr) INCLUDE (DateTxt);

    -- Populate the calendar table

    WITH dates(DateTxt) AS

    (

    SELECT TOP (DATEDIFF(DAY,'18000101','22000101'))

    CAST(DATEADD(DAY,ROW_NUMBER() OVER (ORDER BY (SELECT 1))-1,'18000102') AS date)

    FROM sys.all_columns a, sys.all_columns b

    )

    INSERT dbo.calendar

    SELECT DateTxt, DATEPART(YEAR,DateTxt), DATEPART(MONTH,DateTxt), DATEPART(DAY,DateTxt), DATEPART(WEEKDAY,DateTxt)

    FROM dates;

    GO

    /****************************************************************************************

    (2) Create Tally Table function (using Jeff Moden's "fnTally")

    ****************************************************************************************/

    CREATE FUNCTION dbo.fnTally

    /**********************************************************************************************************************

    Purpose:

    Return a column of BIGINTs from @ZeroOrOne up to and including @MaxN with a max value of 1 Trillion.

    As a performance note, it takes about 00:02:10 (hh:mm:ss) to generate 1 Billion numbers to a throw-away variable.

    Usage:

    --===== Syntax example (Returns BIGINT)

    SELECT t.N

    FROM dbo.fnTally(@ZeroOrOne,@MaxN) t

    ;

    Notes:

    1. Based on Itzik Ben-Gan's cascading CTE (cCTE) method for creating a "readless" Tally Table source of BIGINTs.

    Refer to the following URLs for how it works and introduction for how it replaces certain loops.

    http://www.sqlservercentral.com/articles/T-SQL/62867/

    http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers

    2. To start a sequence at 0, @ZeroOrOne must be 0 or NULL. Any other value that's convertable to the BIT data-type

    will cause the sequence to start at 1.

    3. If @ZeroOrOne = 1 and @MaxN = 0, no rows will be returned.

    5. If @MaxN is negative or NULL, a "TOP" error will be returned.

    6. @MaxN must be a positive number from >= the value of @ZeroOrOne up to and including 1 Billion. If a larger

    number is used, the function will silently truncate after 1 Billion. If you actually need a sequence with

    that many values, you should consider using a different tool. Wink

    7. There will be a substantial reduction in performance if "N" is sorted in descending order. If a descending

    sort is required, use code similar to the following. Performance will decrease by about 27% but it's still

    very fast especially compared with just doing a simple descending sort on "N", which is about 20 times slower.

    If @ZeroOrOne is a 0, in this case, remove the "+1" from the code.

    DECLARE @MaxN BIGINT;

    SELECT @MaxN = 1000;

    SELECT DescendingN = @MaxN-N+1

    FROM dbo.fnTally(1,@MaxN);

    8. There is no performance penalty for sorting "N" in ascending order because the output is explicity sorted by

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    Revision History:

    Rev 00 - Unknown - Jeff Moden

    - Initial creation with error handling for @MaxN.

    Rev 01 - 09 Feb 2013 - Jeff Moden

    - Modified to start at 0 or 1.

    Rev 02 - 16 May 2013 - Jeff Moden

    - Removed error handling for @MaxN because of exceptional cases.

    Rev 03 - 22 Apr 2015 - Jeff Moden

    - Modify to handle 1 Trillion rows for experimental purposes.

    **********************************************************************************************************************/

    (@ZeroOrOne BIT, @MaxN BIGINT)

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN WITH

    E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1) --10E1 or 10 rows

    , E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d) --10E4 or 10 Thousand rows

    ,E12(N) AS (SELECT 1 FROM E4 a, E4 b, E4 c) --10E12 or 1 Trillion rows

    SELECT N = 0 WHERE ISNULL(@ZeroOrOne,0)= 0 --Conditionally start at 0.

    UNION ALL

    SELECT TOP(@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E12 -- Values from 1 to @MaxN

    ;

    GO

    --SELECT * FROM dbo.calendar

    /****************************************************************************************

    (3) Perf test1, just generate a bunch of dates

    ****************************************************************************************/

    SET NOCOUNT ON;

    PRINT 'Using calendar table:';

    GO

    DECLARE @x date, @st datetime = getdate();

    SELECT @x = DateTxt FROM dbo.calendar WHERE DateTxt BETWEEN '18000101' AND '22000101' ;

    PRINT DATEDIFF(MS,@st,getdate());

    GO 6

    PRINT 'Using tally function:';

    GO

    DECLARE @x date, @st datetime = getdate();

    SELECT @x = CAST(DATEADD(DAY,N,'18000101') AS date)

    FROM dbo.fnTally(0,DATEDIFF(DAY,'18000101','22000101')-1)

    PRINT DATEDIFF(MS,@st,getdate());

    GO 6

    /*

    Using calendar table:

    Beginning execution loop

    60

    36

    33

    36

    33

    33

    Batch execution completed 6 times.

    Using tally function:

    Beginning execution loop

    53

    53

    50

    50

    48

    50

    Batch execution completed 6 times.

    */

    /****************************************************************************************

    (4) Perf test2, last day and last thursday

    ****************************************************************************************/

    SET STATISTICS IO ON;

    GO

    DECLARE @start_date date = '18000101', @end_date date = '22000101', @st datetime = getdate(), @x1 date, @x2 date;

    SELECT @x1 = DATEADD(DAY,-1,DateTxt), @x2 = DATEADD(DAY, -DATEDIFF(DAY, 3, DATEADD(DAY,-1,DateTxt)) % 7,DATEADD(DAY,-1,DateTxt))

    FROM dbo.calendar

    WHERE DayOfMonthNbr = 1

    AND DateTxt BETWEEN DATEADD(MONTH,1,@start_date) AND DATEADD(MONTH,1,@end_date);

    PRINT DATEDIFF(MS,@st,getdate());

    SET STATISTICS IO OFF;

    GO 6

    DECLARE @start_date date = '18000101', @end_date date = '22000101', @st datetime = getdate(), @x1 date, @x2 date;

    SELECT @x1 = last_day_of_month, @x2 = DATEADD(DAY, -DATEDIFF(DAY, Thursday, last_day_of_month) % 7, last_day_of_month)

    FROM fnTally(0,DATEDIFF(MONTH, @start_date, @end_date)) t

    CROSS APPLY

    (

    SELECT DATEADD(DAY, -1, DATEADD(MONTH, t.N + 1, @start_date)) AS last_day_of_month,

    3 AS Thursday --0 for Mon, 1 for Tue, etc..

    ) AS assign_alias_names

    WHERE t.N BETWEEN 0 AND DATEDIFF(MONTH, @start_date, @end_date)

    PRINT DATEDIFF(MS,@st,getdate());

    GO 6

    /*

    Beginning execution loop

    Table 'calendar'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    120

    16

    10

    10

    13

    10

    Batch execution completed 6 times.

    Beginning execution loop

    23

    23

    23

    20

    20

    20

    Batch execution completed 6 times.

    */

    Agian my experience has been that calendar table has outperformed a permanent or virtual tally table. This after lots of testing.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (3/30/2016)


    Alan.B (3/30/2016)


    ScottPletcher (3/30/2016)


    Alan.B (3/30/2016)


    ScottPletcher (3/30/2016)


    Alan.B (3/29/2016)

    Lastly, ditto the last Thursday of the month:

    SET STATISTICS IO ON;

    PRINT 'Old way:';

    SELECT DISTINCT MAX([Date]) OVER (PARTITION BY [Year],[MonthName],[WeekdayName]) AS [Date]

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    WHERE [WeekdayName] = 'Thursday'

    ORDER BY [Date]

    PRINT CHAR(10)+'better way:';

    SELECT MAX([Date])

    FROM dbo.itvf_CALENDAR('01/01/2014','12/31/2014')

    WHERE [Weekday] = 5

    GROUP BY [Month]

    SET STATISTICS IO OFF;

    One certainly doesn't need any calendar table at all just to get the last given day of a month(s). Just a standard tally table, to enumerate the months, and a simple mathematical calc to get the day:

    His function is using a tally table so you and I are basically doing the same thing more or less. I was demonstrating a better way to get the last Thursday of the month than the example he included in his article (which uses the Calendar iTVF). His method uses DISTINCT MAX([Date]) OVER (PARTITION BY... whereas mine gets the job done with a simple GROUP BY and therefore no worktable, a fraction of the reads, etc...

    On a separate note I used to use a tally table for these kinds of date calculations but have found that a nicely indexed calendar table (or dim_date table) outperforms a tally table.

    Huh? How can I/O outperform NO I/O?? It's just arithmetic. The tally is used only for the number of months, NOT in the date/day of week calc at all.

    Edit: I am using a physical tally table only because my filter at work does not allow "UNION ALL" in code (it assumes it's "sql injection" and rejects it(!)). A tally cte, esp. for a tiny number such as the number of months, would also be zero I/O.

    Busy work day... I'll put together a performance test a little later tonight and you can tell if you think I'm missing something.

    Argh! I had a longer reply and accidentally refreshed (CTRL+R is cool in SSMS, terrible in google Chrome). Anyhow, here's a couple tests I did with a bunch of dates. Not realistic to go back to the 1800's but I wanted enough rows to work with. The reads BTW is a red herring too because the calendar table generates virtually 0 reads. I have some better tests somewhere but could not find them so I through this together real quick. Note the superior calendar table execution plan (filter - free, all the action from non-clustered index seeks)

    USE tempdb

    GO

    /****************************************************************************************

    (1) Create simplified calendar table

    ****************************************************************************************/

    IF OBJECT_ID('dbo.calendar') IS NOT NULL DROP TABLE dbo.calendar;

    CREATE TABLE dbo.calendar

    (

    DateTxt date primary key,

    YearNbr smallint NOT NULL,

    MonthNbr tinyint NOT NULL,

    DayOfMonthNbr tinyint NOT NULL,

    DayOfWeekNbr tinyint NOT NULL

    );

    -- On a real calendar table you want this index present with the required INCLUDE columns

    CREATE UNIQUE NONCLUSTERED INDEX uq_xxx

    ON dbo.calendar(YearNbr, MonthNbr, DateTxt);

    CREATE NONCLUSTERED INDEX nc_xxx ON dbo.calendar (DayOfWeekNbr) INCLUDE (DateTxt,YearNbr,MonthNbr);

    CREATE NONCLUSTERED INDEX nc_xxx2 ON dbo.calendar (DayOfMonthNbr) INCLUDE (DateTxt);

    -- Populate the calendar table

    WITH dates(DateTxt) AS

    (

    SELECT TOP (DATEDIFF(DAY,'18000101','22000101'))

    CAST(DATEADD(DAY,ROW_NUMBER() OVER (ORDER BY (SELECT 1))-1,'18000102') AS date)

    FROM sys.all_columns a, sys.all_columns b

    )

    INSERT dbo.calendar

    SELECT DateTxt, DATEPART(YEAR,DateTxt), DATEPART(MONTH,DateTxt), DATEPART(DAY,DateTxt), DATEPART(WEEKDAY,DateTxt)

    FROM dates;

    GO

    /****************************************************************************************

    (2) Create Tally Table function (using Jeff Moden's "fnTally")

    ****************************************************************************************/

    CREATE FUNCTION dbo.fnTally

    /**********************************************************************************************************************

    Purpose:

    Return a column of BIGINTs from @ZeroOrOne up to and including @MaxN with a max value of 1 Trillion.

    As a performance note, it takes about 00:02:10 (hh:mm:ss) to generate 1 Billion numbers to a throw-away variable.

    Usage:

    --===== Syntax example (Returns BIGINT)

    SELECT t.N

    FROM dbo.fnTally(@ZeroOrOne,@MaxN) t

    ;

    Notes:

    1. Based on Itzik Ben-Gan's cascading CTE (cCTE) method for creating a "readless" Tally Table source of BIGINTs.

    Refer to the following URLs for how it works and introduction for how it replaces certain loops.

    http://www.sqlservercentral.com/articles/T-SQL/62867/

    http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers

    2. To start a sequence at 0, @ZeroOrOne must be 0 or NULL. Any other value that's convertable to the BIT data-type

    will cause the sequence to start at 1.

    3. If @ZeroOrOne = 1 and @MaxN = 0, no rows will be returned.

    5. If @MaxN is negative or NULL, a "TOP" error will be returned.

    6. @MaxN must be a positive number from >= the value of @ZeroOrOne up to and including 1 Billion. If a larger

    number is used, the function will silently truncate after 1 Billion. If you actually need a sequence with

    that many values, you should consider using a different tool. Wink

    7. There will be a substantial reduction in performance if "N" is sorted in descending order. If a descending

    sort is required, use code similar to the following. Performance will decrease by about 27% but it's still

    very fast especially compared with just doing a simple descending sort on "N", which is about 20 times slower.

    If @ZeroOrOne is a 0, in this case, remove the "+1" from the code.

    DECLARE @MaxN BIGINT;

    SELECT @MaxN = 1000;

    SELECT DescendingN = @MaxN-N+1

    FROM dbo.fnTally(1,@MaxN);

    8. There is no performance penalty for sorting "N" in ascending order because the output is explicity sorted by

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    Revision History:

    Rev 00 - Unknown - Jeff Moden

    - Initial creation with error handling for @MaxN.

    Rev 01 - 09 Feb 2013 - Jeff Moden

    - Modified to start at 0 or 1.

    Rev 02 - 16 May 2013 - Jeff Moden

    - Removed error handling for @MaxN because of exceptional cases.

    Rev 03 - 22 Apr 2015 - Jeff Moden

    - Modify to handle 1 Trillion rows for experimental purposes.

    **********************************************************************************************************************/

    (@ZeroOrOne BIT, @MaxN BIGINT)

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN WITH

    E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1) --10E1 or 10 rows

    , E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d) --10E4 or 10 Thousand rows

    ,E12(N) AS (SELECT 1 FROM E4 a, E4 b, E4 c) --10E12 or 1 Trillion rows

    SELECT N = 0 WHERE ISNULL(@ZeroOrOne,0)= 0 --Conditionally start at 0.

    UNION ALL

    SELECT TOP(@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E12 -- Values from 1 to @MaxN

    ;

    GO

    --SELECT * FROM dbo.calendar

    /****************************************************************************************

    (3) Perf test1, just generate a bunch of dates

    ****************************************************************************************/

    SET NOCOUNT ON;

    PRINT 'Using calendar table:';

    GO

    DECLARE @x date, @st datetime = getdate();

    SELECT @x = DateTxt FROM dbo.calendar WHERE DateTxt BETWEEN '18000101' AND '22000101' ;

    PRINT DATEDIFF(MS,@st,getdate());

    GO 6

    PRINT 'Using tally function:';

    GO

    DECLARE @x date, @st datetime = getdate();

    SELECT @x = CAST(DATEADD(DAY,N,'18000101') AS date)

    FROM dbo.fnTally(0,DATEDIFF(DAY,'18000101','22000101')-1)

    PRINT DATEDIFF(MS,@st,getdate());

    GO 6

    /*

    Using calendar table:

    Beginning execution loop

    60

    36

    33

    36

    33

    33

    Batch execution completed 6 times.

    Using tally function:

    Beginning execution loop

    53

    53

    50

    50

    48

    50

    Batch execution completed 6 times.

    */

    /****************************************************************************************

    (4) Perf test2, last day and last thursday

    ****************************************************************************************/

    SET STATISTICS IO ON;

    GO

    DECLARE @start_date date = '18000101', @end_date date = '22000101', @st datetime = getdate(), @x1 date, @x2 date;

    SELECT @x1 = DATEADD(DAY,-1,DateTxt), @x2 = DATEADD(DAY, -DATEDIFF(DAY, 3, DATEADD(DAY,-1,DateTxt)) % 7,DATEADD(DAY,-1,DateTxt))

    FROM dbo.calendar

    WHERE DayOfMonthNbr = 1

    AND DateTxt BETWEEN DATEADD(MONTH,1,@start_date) AND DATEADD(MONTH,1,@end_date);

    PRINT DATEDIFF(MS,@st,getdate());

    SET STATISTICS IO OFF;

    GO 6

    DECLARE @start_date date = '18000101', @end_date date = '22000101', @st datetime = getdate(), @x1 date, @x2 date;

    SELECT @x1 = last_day_of_month, @x2 = DATEADD(DAY, -DATEDIFF(DAY, Thursday, last_day_of_month) % 7, last_day_of_month)

    FROM fnTally(0,DATEDIFF(MONTH, @start_date, @end_date)) t

    CROSS APPLY

    (

    SELECT DATEADD(DAY, -1, DATEADD(MONTH, t.N + 1, @start_date)) AS last_day_of_month,

    3 AS Thursday --0 for Mon, 1 for Tue, etc..

    ) AS assign_alias_names

    WHERE t.N BETWEEN 0 AND DATEDIFF(MONTH, @start_date, @end_date)

    PRINT DATEDIFF(MS,@st,getdate());

    GO 6

    /*

    Beginning execution loop

    Table 'calendar'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    120

    16

    10

    10

    13

    10

    Batch execution completed 6 times.

    Beginning execution loop

    23

    23

    23

    20

    20

    20

    Batch execution completed 6 times.

    */

    Agian my experience has been that calendar table has outperformed a permanent or virtual tally table. This after lots of testing.

    Recursion is known to be a poorly-performing method of generating numbers. Yeah, a crippled function probably will perform worse. Use an in-line cte and the performance is vastly better. Or even a properly clustered physical tally table.

    Also, most people don't have custom nonclustered indexes on their calendar table, "because it's so small it doesn't matter anyway". I don't have any objection per se to adding them, it's just not commonly done.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who give you the shirt off *someone else's* back.

Viewing 15 posts - 16 through 30 (of 39 total)

You must be logged in to reply to this topic. Login to reply