Dealing with custom date formats in T-SQL

  • Formatting dates in the presentation layer is best, because how the application wants dates formatted and how reporting wants dates formatted will be different, even if you get them to initially agree on a standard.

    Still, formatting dates within the SELECT statement is tolerable and sometimes necessary.

    However, containing date/time values in VARCHAR columns is perhaps one of the worst database anti-patterns ever. Not only does it require more effort to insure that all applications INSERT using a consistent format, but VARCHAR "dates" consume more storage space, and depending on the format, it often requires the use non-indexable conversion expressions in the WHERE clause.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (3/23/2012)


    Formatting dates in the presentation layer is best, because how the application wants dates formatted and how reporting wants dates formatted will be different, even if you get them to initially agree on a standard.

    Still, formatting dates within the SELECT statement is tolerable and sometimes necessary.

    However, containing date/time values in VARCHAR columns is perhaps one of the worst database anti-patterns ever. Not only does it require more effort to insure that all applications INSERT using a consistent format, but VARCHAR "dates" consume more storage space, and depending on the format, it often requires the use non-indexable conversion expressions in the WHERE clause.

    I completely agree.

    Not to mention the inability to sort properly, perform date/time specific calculations, check against invalid values (february 29 on non-leap years etc.)... and I could go on forever!

    Another date/time anti-pattern that I saw implemented in some shops is a separate column for each date part: one (int!!!!!!!) column for the year, one column for the month etc... Fits well with a DWH calendar table, but is a horrible choice for OLTP.

    -- Gianluca Sartori

  • Gianluca Sartori (3/23/2012)


    Eric M Russell (3/23/2012)


    Formatting dates in the presentation layer is best, because how the application wants dates formatted and how reporting wants dates formatted will be different, even if you get them to initially agree on a standard.

    Still, formatting dates within the SELECT statement is tolerable and sometimes necessary.

    However, containing date/time values in VARCHAR columns is perhaps one of the worst database anti-patterns ever. Not only does it require more effort to insure that all applications INSERT using a consistent format, but VARCHAR "dates" consume more storage space, and depending on the format, it often requires the use non-indexable conversion expressions in the WHERE clause.

    I completely agree.

    Not to mention the inability to sort properly, perform date/time specific calculations, check against invalid values (february 29 on non-leap years etc.)... and I could go on forever!

    Another date/time anti-pattern that I saw implemented in some shops is a separate column for each date part: one (int!!!!!!!) column for the year, one column for the month etc... Fits well with a DWH calendar table, but is a horrible choice for OLTP.

    When confronted with a VARCHAR date, it is still possible for us to impose our will upon it at the DDL level. The following check constraint not only requires the value to be coded in the format YYYYMMDD, but it will require that the value be a valid calendar date. For example, it rejects an insert of February 29, 2011 but accepts February 29, 2012.

    create table foo

    (

    foo_date varchar(30) not null

    constraint ck_foo_date_yyyymmdd

    check (foo_date = convert(char(8),cast(foo_date as datetime),112))

    );

    insert into foo (foo_date) values ('2011/02/28');

    Error 547: The INSERT statement conflicted with the CHECK constraint "ck_foo_date_yyyymmdd".

    :satisfied:

    insert into foo (foo_date) values ('20110229');

    Error 242: The conversion of a varchar data type to a datetime data type resulted in an out-of-range value.

    :satisfied:

    insert into foo (foo_date) values ('20120229');

    (1 row(s) affected)

    😎

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Very nice article. I was especially interested where you showed that CLR methods performed much better. I've heard this about CLR performing better in most cases.,

  • gclausen (3/23/2012)


    Very nice article. I was especially interested where you showed that CLR methods performed much better. I've heard this about CLR performing better in most cases.,

    Well, it depends on the task you have to accomplish.

    When it comes down to complex string manipulations, CLR is nearly always the best choice.

    The same can't be said for data manipulation.

    -- Gianluca Sartori

  • Agree - it just depends on the task at hand.

  • Very nice article. It provides a set of useful solutions to a common issue, while also highlighting the fact that some common "mantras" (i.e, Inline TVFs are more efficient than scalar UDFs) are not always true.

    I'd have liked to see some charts showing how the different solutions scale from 1 to millions of rows, but that's really not something I feel should be required of a short article like this, and you've provided enough information for me to test this on my own if I want to take the time.

    Finally one question: did you do your testing on a preview or RTM version of SQL Server 2012? If it's a preview version, there might be some debugging code slowing down the new function.

  • sknox (3/23/2012)


    Very nice article. It provides a set of useful solutions to a common issue, while also highlighting the fact that some common "mantras" (i.e, Inline TVFs are more efficient than scalar UDFs) are not always true.

    I'd have liked to see some charts showing how the different solutions scale from 1 to millions of rows, but that's really not something I feel should be required of a short article like this, and you've provided enough information for me to test this on my own if I want to take the time.

    Finally one question: did you do your testing on a preview or RTM version of SQL Server 2012? If it's a preview version, there might be some debugging code slowing down the new function.

    Ha! Nice catch.

    No, I tested the code with CTP3 and RC0, because RTM was not ready yet at the time.

    As far as the chart is concerned, 20 seconds of Excel produced this:

    -- Gianluca Sartori

  • Excellent functions. I've already tested them on my own server (2008) and added them to my toolbox of functions. In case you are interested, I added a tiny modification to both functions to handle Day Name Suffixes such as "1st," "2nd," "3rd," and "4th." This code piece can just be added right at the end of the "formattedDate" SELECT statement.

    WHEN 'x' THEN

    CASE

    WHEN CAST(DAY(@date) AS INT) IN (1,21,31) THEN N'st'

    WHEN CAST(DAY(@date) AS INT) IN (2,22) THEN N'nd'

    WHEN CAST(DAY(@date) AS INT) IN (3,23) THEN N'rd'

    ELSE N'th'

    END

    Now a mask like this can be used 'DDDD, MMMM DDx, YYYY' to produce 'Friday, March 23rd, 2012'.

    Also, the difference between 'hh' and 'h' wasn't clear and didn't seem to produce different results no matter what dates I used for input. Since the 'MM' vs 'M', 'DD' vs 'D', 'HH' vs 'H' all produced the month or day with or without a leading zero respectively, I would assume the same would be the case for 'hh' and 'h'. So I modified the code as:

    WHEN 'h' THEN

    CASE WHEN LEFT(

    RIGHT('00' + CAST(

    CASE DATEPART(hour,@date)

    WHEN 12 THEN 12

    ELSE DATEPART(hour,@date) % 12

    END AS NVARCHAR(2)),2),1) = 0 THEN

    RIGHT('00' + CAST(

    CASE DATEPART(hour,@date)

    WHEN 12 THEN 12

    ELSE DATEPART(hour,@date) % 12

    END AS NVARCHAR(2)),1)

    ELSE

    RIGHT('00' + CAST(

    CASE DATEPART(hour,@date)

    WHEN 12 THEN 12

    ELSE DATEPART(hour,@date) % 12

    END AS NVARCHAR(2)),2)

    END

    Perhaps there is a more efficient way of doing this, but I basically just took the code for 'hh' formatting and turned it into a case statement.

    Now, to be thorough, I also added to the masking function of the ParseString function to allow for filtering out Day Name suffixes:

    Added to the 'allowedTokens'

    UNION ALL

    SELECT

    'x'

    ,'DAYSUFFIX'

    Added to the 'pivotedTokens'

    ,daysuffix_value = ''

    and

    ,[DAYSUFFIX]

    Added to the 'verifiedTokens' pivot list

    ,[DAYSUFFIX]

    WITHOUT these changes this: 'October 26th 2011', 'MMMM DD YYYY' will cause an error

    WITH these changes this: 'October 26th 2011', 'MMMM DDx YYYY' parses the date successfully

     

  • charles.southey (3/23/2012)


    Yes - it's relevant to the article and provides a possible solution for those grappling with the problem under discussion that they may be looking for. We would never post something that is not directly relevant. Not everyone wants a commercial solution - but I think it's worth knowing that it's there if you do. BTW the product is perpetually free for use on Developer Edition.

    OK, Charles, since you're here with your potential solution... how does your product stack up against the author's custom CLR function and the rest?

    When doing exactly the task(s) described in the article, how long does it take?

    I'm sure there are other potential comparisons (flexibility, ease of use, etc), and if you discussed those here I'd consider it an advertisement for sure.

    But speed is really a comparison that would add to this conversation, IMHO, without being an "ad". The perf disparity between FORMAT() and the custom function is very interesting and has made me very curious. Is there some general pattern here, what can we learn?

    Thanks in advance,

    >L<

  • Lisa Slater Nicholls (3/23/2012)


    charles.southey (3/23/2012)


    Yes - it's relevant to the article and provides a possible solution for those grappling with the problem under discussion that they may be looking for. We would never post something that is not directly relevant. Not everyone wants a commercial solution - but I think it's worth knowing that it's there if you do. BTW the product is perpetually free for use on Developer Edition.

    OK, Charles, since you're here with your potential solution... how does your product stack up against the author's custom CLR function and the rest?

    When doing exactly the task(s) described in the article, how long does it take?

    I'm sure there are other potential comparisons (flexibility, ease of use, etc), and if you discussed those here I'd consider it an advertisement for sure.

    But speed is really a comparison that would add to this conversation, IMHO, without being an "ad". The perf disparity between FORMAT() and the custom function is very interesting and has made me very curious. Is there some general pattern here, what can we learn?

    Thanks in advance,

    >L<

    Oh My! 🙂 There's another "me" out there! I was going to ask Charles for his performance stats on 10,000, 100,000, and 1 million rows of dates and even offer to build the test table for him. Well done, Lisa!

    The only bad part about Charles doing the testing is having the proverbial mouse guard the cheese. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Gianluca, my ol' and trusted friend,

    Very nice and well written article with lots of well fomatted, easy to read, documented code. Nicely done.

    Shifting gears, I have a couple of questions, please.

    1. What is the configuration and speed of the computer you did the testing on?

    2. Which edition and version of SQL Server and Windows did you run your tests on?

    3. When you did your duration testing, where did you have the result set return to? A table? Grid mode on the screen? A throw-away variable? or ???

    4. When you did your duration testing, how did you measure the duration? STATISTICS TIME ON? SQL Profiler? or ???

    5. When you did your duration testing, what was the configuration of your test table? Single column of dates? Indexes or not?

    6. Is there any chance of you attaching your test data generator and your test harness to this thread so I can make sure that my testing is done the same way you did it?

    7. Since I don't even know how to spell C#, would it be possible for you to script the CLR assembly out in a "Paul White" fashion so that I could install it on my machine for a little testing of my own?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The only bad part about Charles doing the testing is having the proverbial mouse guard the cheese.

    Actually, Jeff, in this case, I look at it as "put up or shut up" ;-).

    Regardless of the author's original computer specs (although that made me curious too, thanks for asking hiim), I'd like to see the vendor run the code in this article, on any machine or set of machines, plus his own version, and provide all stats on a level playing field.

    Here's a case where we *know* the test was not vendor-designed. If Charles runs exactly that code, plus his, in like conditions (and I invite you to add conditions: start sql server up cold? anything?), we should really learn something.

    >L<

  • Really good article, I like it very much. So please don't take the following as suggesting that it isn't, because that would be utterly wrong.

    There are a couple of things that perhaps should have been mentioned (to warn people that they need to be considered when relevant, not to offer solutions in the article because covering everyting would make the article too long and complicated and anyway most people don't need to worry about either of these things).

    1) Everything is varchar. But I may want to be able to handle languages that don't work with varchar (256 is quite a small number, there are non-alphabetic languages that have to be coped with and 256 characters is sometimes far too few). I don't think there's any need to cope with more than one language at a time, but if there were that would make it even harder to fit into varchar.

    2) None of your functions will work if you are outside the Gregorian calendar. We can use T-SQL cast/convert to get Kuwaiti Hijri, as for example

    select convert(varchar(32),CONVERT(datetime2,'2012-03-23 21:00:00.000'),131)

    using SQL Server's capability to handle Kuwaiti dates (but I wouldn't bet on it delivering correct dates in the future, as one form or another of fundamentalist may succede in forcing a change in Kuwaiti dates). But this will not work for Saudi format, or Iranian format, or.... (the year numbers will be different hardly ever, the month numbers will be different a little of the time, the day numbers are likely to be different a lot of the time). So the argument that built in date handling by CONVERT solves the Hijri problem doesn't work, because only Kuwaiti dates can be handled. Islamic Hijri isn't the only calendar in current use whose months and years don't match up with the Gregorian calendar, even the Julian calendar would be a trivial example if anyone actually used it any more.

    So people need to watch out for alphabets which don't have 256 or fewer characters and for calendars which have different (possibly variable) year lengths and different (possibly variable) month lengths from our Western culture.

    Tom

  • Steven Willis (3/23/2012)


    Excellent functions. I've already tested them on my own server (2008) and added them to my toolbox of functions. In case you are interested, I added a tiny modification to both functions to handle Day Name Suffixes such as "1st," "2nd," "3rd," and "4th." This code piece can just be added right at the end of the "formattedDate" SELECT statement.

    ...

    Now a mask like this can be used 'DDDD, MMMM DDx, YYYY' to produce 'Friday, March 23rd, 2012'.

    Thank you, Steven. This could be a valuable addition to the code.

    However, I tried to stick to the date format placeholders you can find on this page on MSDN. I tried parsing a date containing the day name suffix with the built-in PARSE function and it did not work.

    Also, the difference between 'hh' and 'h' wasn't clear and didn't seem to produce different results no matter what dates I used for input. Since the 'MM' vs 'M', 'DD' vs 'D', 'HH' vs 'H' all produced the month or day with or without a leading zero respectively, I would assume the same would be the case for 'hh' and 'h'. So I modified the code as:

    WHEN 'h' THEN

    CASE WHEN LEFT(

    RIGHT('00' + CAST(

    CASE DATEPART(hour,@date)

    WHEN 12 THEN 12

    ELSE DATEPART(hour,@date) % 12

    END AS NVARCHAR(2)),2),1) = 0 THEN

    RIGHT('00' + CAST(

    CASE DATEPART(hour,@date)

    WHEN 12 THEN 12

    ELSE DATEPART(hour,@date) % 12

    END AS NVARCHAR(2)),1)

    ELSE

    RIGHT('00' + CAST(

    CASE DATEPART(hour,@date)

    WHEN 12 THEN 12

    ELSE DATEPART(hour,@date) % 12

    END AS NVARCHAR(2)),2)

    END

    Perhaps there is a more efficient way of doing this, but I basically just took the code for 'hh' formatting and turned it into a case statement.

     

    Nice catch, thank you.

    -- Gianluca Sartori

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply