SQLServerCentral Article

Dealing with custom date formats in T-SQL

,

One of the questions I see asked over and over on the forums is “How do I format this datetime column with this particular format string?”. Or, even more often: “How do I parse a date from this character column?”.

The canned answers people usually get are along the lines of “Don’t format dates in T-SQL” and “Don’t store dates in character columns”, with slight variations. While I heartedly agree with those answers, I must admit that I had to deal with some cases where the conversion had to be performed in the database and I will probably need that again someday.

SQL Server offers different possibilities to accomplish the task: in this article I will try to discuss and compare some of them and hopefully, help you decide which one best suits your needs.

The best method

Don’t format dates: you don’t need to and you don’t have to.

SQL Server is not a presentation software, it is a relational database engine, which stores dates in their internal binary representation and returns values to external applications using the client software’s binary representation.

In turn, the applications that retrieve temporal data from SQL Server treat dates in their own internal format and must use a formatting function to convert the internal representation in a human readable format.

Humans can read strings, but applications can handle temporal data only when it is stored in the internal representation of the programming language. This is exactly why you don’t want to store dates in a varchar column; you don’t want to feed applications with dates stored as strings because that makes the information impossible to compare, sort, and treat properly with date functions.

An application that receives a date in its string representation has to parse it to convert it back to a date: it is like using MSPaint as a word processor and extracting the text with an OCR software!

However, in some particular cases, a formatting function in T-SQL can come handy, especially when the database and application layers are collapsed. A typical example is database mail, where the sp_send_dbmail stored procedure can accept a query parameter to extract some data and attach it to the message: in this case, there is no way to control how a date column gets formatted and the only option is choosing the appropriate format in advance.

The same considerations can be applied to parsing dates from a string: the best place to do this is your application, so you’d better save yourself the headaches that doing it in SQLServer brings.

The built-in method (so far)

As an evidence that converting from string to date and back is not an heretic crime, T-SQL comes with the built-in CONVERT function, which can handle the most common date and time formats.

If you are lucky enough to find the desired format in the Date and Time styles table showing in BOL, you can use the “style” parameter to obtain the formatted string.

SELECT CONVERT(varchar(30), GETDATE(), 113) AS EuropeanDate

EuropeanDate

------------------------------

24 Oct 2011 18:18:42:700

Sadly enough, the styles table contains the most common formats in western countries, but doesn’t offer a ready-made option for all the locales around the world. If the desired format is not in the list, you’re out of luck: implementing a custom format can turn into a nightmare and make your code bulky and unreadable.

Some custom formats can be obtained combining substrings of different CONVERT styles. For instance, to obtain the current date and time using the Italian format, you could issue:

SELECT
      CONVERT(varchar(30), GETDATE(), 103) + ' ' +
      CONVERT(varchar(30), GETDATE(), 108) AS ItalianDate

ItalianDate

-------------------

24/10/2011 18:18:42

In my opinion, this method has many shortcomings, starting from the fact that it needs an ugly verbose syntax that could be very difficult to understand and remember.

For the same reasons as above, I’m not a big fan of concatenating the single date parts together:

SELECT
      RIGHT('00'   + CAST(DATEPART(day,    GETDATE()) AS varchar(2)),2) + '/' +
      RIGHT('00'   + CAST(DATEPART(month,  GETDATE()) AS varchar(2)),2) + '/' +
      RIGHT('0000' + CAST(DATEPART(year,   GETDATE()) AS varchar(4)),4) + ' ' +
      RIGHT('00'   + CAST(DATEPART(hour,   GETDATE()) AS varchar(2)),2) + ':' +
      RIGHT('00'   + CAST(DATEPART(minute, GETDATE()) AS varchar(2)),2) + ':' +
      RIGHT('00'   + CAST(DATEPART(second, GETDATE()) AS varchar(2)),2) AS ItalianDate

ItalianDate

-------------------

24/10/2011 18:23:18

Another possible method could be using a scalar UDF to output the formatted date, as I showed in this post. Needless to say that scalar UDFs can become very slow (they are called once for each input row) when dealing with large data sets.

The same applies to parsing dates from a string with a custom format, keeping in mind that parsing is much more complicated than just formatting.

The built-in method (in the near future)

Microsoft must have felt the pain of his users and decided to include new formatting and parsing functions in the next coming version of SQL Server (SQL Server 2012).

The FORMAT function accepts numeric and temporal data and formats the input value into a human-readable representation, using either the default format for the specified locale or a custom format string.

For instance, to output the current date in Italian format, you can use:

SELECT FORMAT(GETDATE(),'dd/MM/yyyy') AS ItalianDate
--or
SELECT FORMAT(GETDATE(),'d', 'it-IT') AS ItalianDate

ItalianDate

------------

24/10/2011

Similarly, the PARSE function accepts a string parameter and does its best to convert it to various data types, including date and time subtypes. In this case, you are bound to using the recognized locales and cannot use a custom format, which is a shame in my opinion.

For instance, you can use the PARSE function to convert the date used in the previous example:

SELECT PARSE('24/10/2011' AS datetime USING 'it-IT') AS ItalianDate

Unsurprisingly, both FORMAT and PARSE functions rely on the .NET Framework CLR and will not work when the CLR is not installed. For the same reason, the first time you call any of these functions, you have to wait for the CLR runtime to load.

The CLR method

With a few rows of C#, it‘s easy enough to put together a function to convert a date using a custom format. I’m not a C# guru and I hope this code is clear enough to make the point.

//Scalar CLR function: dateFormat
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString dateFormat(DateTime value, String formatString)
{
    return new SqlString(value.ToString(formatString));
}

//struct used by the FillRowMethod
private struct ReturnValues
{
    public SqlString Value;
}

//Called by FillRow
private static void FillValues(object obj, out SqlString TheValue)
{
    ReturnValues ReturnVals = (ReturnValues)obj;
    TheValue = ReturnVals.Value;
}
//Table-valued CLR function: dateFormatITVF
[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.None,
    IsDeterministic = true,
    IsPrecise = true,
    SystemDataAccess = SystemDataAccessKind.None,
    FillRowMethodName = "FillValues",
    TableDefinition = "formattedDate nvarchar(50)")]
public static System.Collections.IEnumerable TVF_Streaming(DateTime value, String formatString)
{
    ReturnValues Vals = new ReturnValues();
    Vals.Value = new SqlString(value.ToString(formatString));
    yield return Vals;
}

This code creates a CLR assembly with two functions: the first one is a scalar UDF and the second one is a ITVF and both rely on the “toString” built-in function to format the input date.

Let’s see how these CLR functions can be used:

--Scalar CLR UDF
SELECT dbo.dateFormat(GETDATE(), 'yyyy-MM-dd') AS formattedDate

formattedDate

-------------

2011-10-25

--Table-valued CLR UDF
SELECT result AS formattedDate
FROM dbo.dateFormatITVF(GETDATE(), 'yyyy-MM-dd')

formattedDate

-------------

2011-10-25

Using a custom CLR assembly could be the fastest way to accomplish the same tasks as format and parse functions in SQL Server versions prior to 2012. However, some shops don’t allow the execution of custom CLR code, which would obviously make this method non applicable.

Two ITVFs to format dates

From a performance standpoint, inline table-valued functions (ITVFs) are usually a better choice than scalar UDFs, but they bring in some problematic limitations. For instance, an ITVF can only contain a single SELECT statement, wrapped in a RETURN command, which takes looping and branching constructs out of the available tools list.

Actually, you can still perform a loop using a recursive CTE, which is one of the accepted constructs in ITVFs. Keeping this in mind, I came up with this code:

-- =============================================
-- Author:      Gianluca Sartori - @spaghettidba
-- Create date: 2011-10-14
-- Description: Formats a date using the supplied
--              format string
-- =============================================
CREATE FUNCTION [dbo].[formatDateWithReplace](@date AS datetime, @format_string varchar(50))
RETURNS TABLE
AS
RETURN (
    -- ================================================
    -- create the allowedTokens list
    -- ================================================
    WITH allowedTokens (id, code, value) AS (
        SELECT id,
            code COLLATE Latin1_General_CS_AS,
            value
        FROM (
                      SELECT  1, 'YYYY', RIGHT('0000' + CAST(YEAR(@date) AS varchar(4)),4)
            UNION ALL SELECT  2, 'YY',   RIGHT('00' + CAST(YEAR(@date) AS varchar(4)),2)
            UNION ALL SELECT  3, 'Y',    CAST(CAST(RIGHT('00' + CAST(YEAR(@date) AS varchar(4)),2) AS int) AS varchar(2))
            UNION ALL SELECT  4, 'MMMM', CHAR(1)
            UNION ALL SELECT  5, 'MM',   RIGHT('00' + CAST(MONTH(@date) AS varchar(2)),2)
            UNION ALL SELECT  6, 'M',    CAST(MONTH(@date) AS varchar(2))
            UNION ALL SELECT  7, 'DDDD', CHAR(2)
            UNION ALL SELECT  8, 'DD',   RIGHT('00' + CAST(DAY(@date) AS varchar(2)),2)
            UNION ALL SELECT  9, 'D',    CAST(DAY(@date) AS varchar(2))
            UNION ALL SELECT 10, 'HH',   RIGHT('00' + CAST(DATEPART(hour,@date) AS varchar(2)),2)
            UNION ALL SELECT 11, 'H',    CAST(DATEPART(hour,@date) AS varchar(2))
            UNION ALL SELECT 12, 'hh',   RIGHT('00' + CAST(DATEPART(hour, @date) - (12 * CEILING((DATEPART(hour, @date) - 12)*.1)) AS varchar(2)),2)
            UNION ALL SELECT 13, 'h',    CAST(DATEPART(hour, @date) - (12 * CEILING((DATEPART(hour, @date) - 12)*.1)) AS varchar(2))
            UNION ALL SELECT 14, 'mm',   RIGHT('00' + CAST(DATEPART(minute,@date) AS varchar(2)),2)
            UNION ALL SELECT 15, 'm',    CAST(DATEPART(minute,@date) AS varchar(2))
            UNION ALL SELECT 16, 'ss',   RIGHT('00' + CAST(DATEPART(second,@date) AS varchar(2)),2)
            UNION ALL SELECT 17, 's',    CAST(DATEPART(second,@date) AS varchar(2))
            UNION ALL SELECT 18, 'fff',  RIGHT('000' + CAST(DATEPART(millisecond,@date) AS varchar(3)),3)
            UNION ALL SELECT 19, 'f',    CAST(DATEPART(millisecond,@date) AS varchar(3))
            UNION ALL SELECT 20, 'tt',   CASE WHEN DATEPART(hour,@date) >= 12 THEN 'PM' ELSE 'AM' END
            UNION ALL SELECT 21, 't',    CASE WHEN DATEPART(hour,@date) >= 12 THEN 'P' ELSE 'A' END
            UNION ALL SELECT 22, CHAR(1),DATENAME(month, @date)
            UNION ALL SELECT 23, CHAR(2),DATENAME(weekday, @date)
        ) AS susbst (id, code, value)
    ),
    -- ================================================
    -- create the list of values to replace in the input
    -- ================================================
    substitutions (id, code, value, maxval) AS (
        SELECT ROW_NUMBER() OVER (ORDER BY id, set_id), code, value, COUNT(*) OVER ()
        FROM (
            SELECT 0 AS set_id, id, code, value
            FROM allowedTokens
        ) AS src
    ),
    -- ================================================
    -- make the format string case-sensitive
    -- ================================================
    formatStrings (formatString) AS (
        SELECT @format_string COLLATE Latin1_General_CS_AS
    ),
    -- ================================================
    -- build a recursive CTE to replace tokens one at a time
    -- ================================================
    recursiveReplace AS (
        SELECT s.id,
            REPLACE(
                f.formatString,
                s.code,
                s.value) AS formattedDate,
                s.maxval
        FROM formatStrings AS f
        INNER JOIN substitutions AS s
            ON s.id = 1
        UNION ALL
        SELECT s.id,
            REPLACE(
                r.formattedDate,
                s.code,
                s.value) AS formattedDate,
                s.maxval
        FROM recursiveReplace AS r
        INNER JOIN substitutions AS s
            ON s.id = r.id + 1
    )
    -- ================================================
    -- selects the formatted string from the last row
    -- in the recursive CTE
    -- ================================================
    SELECT formattedDate
    FROM recursiveReplace
    WHERE id = maxval
)

The code is very simple and I don’t think it needs to be discussed beyond the description given in the inline comments. Using this function, you can format a datetime value with something like this:

SELECT *
FROM (
      VALUES (GETDATE(), 'YYYY-MM-DD'),
             (GETDATE(), 'DD/MM/YYYY'),
             (GETDATE(), 'MMMM DD YYYY'),
             (GETDATE(), 'YYYY-MM-DD HH:mm:ss'),
             (GETDATE(), 'YYYY-MM-DD hh:mm:ss tt')
) AS testValues (input_date, format_string)
CROSS APPLY dbo.formatDateWithReplace(input_date, format_string)

The output of the above query looks like this:

input_date              format_string          formattedDate

----------------------- ---------------------- ----------------------

2011-10-26 17:34:38.943 YYYY-MM-DD             2011-10-26

2011-10-26 17:34:38.943 DD/MM/YYYY             26/10/2011

2011-10-26 17:34:38.943 MMMM DD YYYY           October 26 2011

2011-10-26 17:34:38.943 YYYY-MM-DD HH:mm:ss    2011-10-26 17:34:38

2011-10-26 17:34:38.943 YYYY-MM-DD hh:mm:ss tt 2011-10-26 05:34:38 PM

However, while perfectly working, I don’t find this code completely satisfactory. Recursive CTEs are not the optimal choice when it comes down to performance and, while “declarative”, loops are always loops.

Wait a minute: who needs a loop when you can have a Tally table? Once joined to a tally table, parsing a format string to locate date parts placeholders turns into a very simple “islands” problem. Let’s take another stab:

-- =============================================
-- Author:      Gianluca Sartori - @spaghettidba
-- Create date: 2011-10-14
-- Description: Formats a date using the supplied
--              format string
-- =============================================
CREATE FUNCTION [dbo].[formatDateWithIslands](@date AS datetime,  @format_string varchar(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN (
    WITH
    -- ================================================
    -- Build 4 rows
    -- ================================================
    fourRows (N) AS (
        SELECT 1
        UNION ALL
        SELECT 2
        UNION ALL
        SELECT 3
        UNION ALL
        SELECT 4
    ),
    -- ================================================
    -- build a small 50 rows tally table
    -- ================================================
    cteTally (N) AS (
        SELECT TOP(50) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM fourRows AS A
        CROSS JOIN fourRows AS B
        CROSS JOIN fourRows AS C
        ORDER BY 1
    ),
    -- ================================================
    -- join the format string to the tally table
    -- and add some rankings to identify the "islands"
    -- ================================================
    tokenizedString AS (
        SELECT N, C,
            groupId = DENSE_RANK() OVER(ORDER BY C, _groupId)
        FROM (
            SELECT N,
                SUBSTRING(@format_string COLLATE Latin1_General_CS_AS, N, 1) AS C,
                _groupId =
                    N - ROW_NUMBER() OVER (
                        PARTITION BY SUBSTRING(@format_string COLLATE Latin1_General_CS_AS, N, 1)
                        ORDER BY N
                    )
            FROM cteTally
            WHERE N <= LEN(@format_string)
        ) AS fs
    )
      SELECT formattedDate = (
            SELECT
                  CASE REPLICATE(MIN(C),COUNT(*))
                        WHEN 'YYYY' THEN RIGHT('0000' + CAST(YEAR(@date) AS nvarchar(4)),4)
                        WHEN 'YY'   THEN RIGHT('00' + CAST(YEAR(@date) AS nvarchar(4)),2)
                        WHEN 'Y'    THEN CAST(CAST(RIGHT('00' + CAST(YEAR(@date) AS nvarchar(4)),2) AS int) AS nvarchar(2))
                        WHEN 'MMMM' THEN DATENAME(month, @date)
                        WHEN 'MM'   THEN RIGHT('00' + CAST(MONTH(@date) AS nvarchar(2)),2)
                        WHEN 'M'    THEN CAST(MONTH(@date) AS nvarchar(2))
                        WHEN 'DDDD' THEN DATENAME(weekday, @date)
                        WHEN 'DD'   THEN RIGHT('00' + CAST(DAY(@date) AS nvarchar(2)),2)
                        WHEN 'D'    THEN CAST(DAY(@date) AS nvarchar(2))
                        WHEN 'HH'   THEN RIGHT('00' + CAST(DATEPART(hour,@date) AS nvarchar(2)),2)
                        WHEN 'H'    THEN CAST(DATEPART(hour,@date) AS nvarchar(2))
                        WHEN 'hh'   THEN RIGHT('00' + CAST(CASE DATEPART(hour, @date) WHEN 12 THEN 12 ELSE DATEPART(hour, @date) % 12 END AS nvarchar(2)),2)
                        WHEN 'h'    THEN CAST(DATEPART(hour, @date) - (12 * CEILING((DATEPART(hour, @date) - 12)*.1)) AS nvarchar(2))
                        WHEN 'mm'   THEN RIGHT('00' + CAST(DATEPART(minute,@date) AS nvarchar(2)),2)
                        WHEN 'm'    THEN CAST(DATEPART(minute,@date) AS nvarchar(2))
                        WHEN 'ss'   THEN RIGHT('00' + CAST(DATEPART(second,@date) AS nvarchar(2)),2)
                        WHEN 's'    THEN CAST(DATEPART(second,@date) AS nvarchar(2))
                        WHEN 'fff'  THEN RIGHT('000' + CAST(DATEPART(millisecond,@date) AS nvarchar(3)),3)
                        WHEN 'f'    THEN CAST(DATEPART(millisecond,@date) AS nvarchar(3))
                        WHEN 'tt'   THEN CASE WHEN DATEPART(hour,@date) >= 12 THEN N'PM' ELSE N'AM' END
                        WHEN 't'    THEN CASE WHEN DATEPART(hour,@date) >= 12 THEN N'P' ELSE N'A' END
                        ELSE MIN(C)
                  END
            FROM tokenizedString
            GROUP BY groupId
            ORDER BY MIN(N)
            FOR XML PATH(''), TYPE
      ).value('(./text())[1]','nvarchar(50)')
)

Ok, much better now. Let’s see how the function can be used:

SELECT *
FROM (
      VALUES (GETDATE(), 'YYYY-MM-DD'),
             (GETDATE(), 'DD/MM/YYYY'),
             (GETDATE(), 'MMMM DD YYYY'),
             (GETDATE(), 'YYYY-MM-DD HH:mm:ss'),
             (GETDATE(), 'YYYY-MM-DD hh:mm:ss tt')
) AS testValues (input_date, format_string)
CROSS APPLY dbo.formatDateWithIslands(input_date, format_string)

Which produces this output:

input_date              format_string          formattedDate

----------------------- ---------------------- ----------------------

2011-10-26 17:46:52.013 YYYY-MM-DD             2011-10-26

2011-10-26 17:46:52.013 DD/MM/YYYY             26/10/2011

2011-10-26 17:46:52.013 MMMM DD YYYY           October 26 2011

2011-10-26 17:46:52.013 YYYY-MM-DD HH:mm:ss    2011-10-26 17:46:52

2011-10-26 17:46:52.013 YYYY-MM-DD hh:mm:ss tt 2011-10-26 05:46:52 PM

An ITVF to parse a date from a string

The “islands” method is particularly useful when dealing with the opposite problem: parsing a string into a date.

Things here become slightly more complicated, since the strings to parse are two now: the format string, which tells you the sequence of the date parts, and the input string, which contains the actual values. In this case, the format string cannot be used to identify the start index of the date parts in the input string, but describes the order of the values inside the input string.

Since the date parts can be either separated by a delimiter or have a fixed length, the format string must be broken into groups (separated by a delimiter) and subgroups (positional fixed-length tokens).

For instance, though unusual, the format string could be formed this way:

         YYYY-MM-DD HHmmss

group    11112334556777777

subgroup 11111111111112233

This complicates things a bit and makes the code look like Phil Factor’s T-SQL poem, but it’s not an impossible deal.

-- =============================================
-- Author:      Gianluca Sartori - @spaghettidba
-- Create date: 2011-10-14
-- Description: Parses a date from its string
--              representation, using the supplied
--              format string.
-- =============================================
CREATE FUNCTION [dbo].[parseDate](@date AS varchar(50), @format_string varchar(50))
RETURNS TABLE
AS
RETURN (
    WITH
    -- ================================================
    -- Build 4 rows
    -- ================================================
    fourRows (N) AS (
        SELECT 1
        UNION ALL
        SELECT 2
        UNION ALL
        SELECT 3
        UNION ALL
        SELECT 4
    ),
    -- ================================================
    -- build a small 64 rows tally table
    -- ================================================
    cteTally (N) AS (
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM fourRows AS A
        CROSS JOIN fourRows AS B
        CROSS JOIN fourRows AS C
    ),
    allowedTokens (code, tokenGroup) AS (
        SELECT code COLLATE Latin1_General_CS_AS, value
        FROM (
                      SELECT 'YYYY', 'YEAR'
            UNION ALL SELECT 'YY',   'YEAR'
            UNION ALL SELECT 'Y',    'YEAR'
            UNION ALL SELECT 'MMMM', 'MONTH'
            UNION ALL SELECT 'MM',   'MONTH'
            UNION ALL SELECT 'M',    'MONTH'
            UNION ALL SELECT 'DD',   'DAY'
            UNION ALL SELECT 'D',    'DAY'
            UNION ALL SELECT 'HH',   'HOUR'
            UNION ALL SELECT 'H',    'HOUR'
            UNION ALL SELECT 'hh',   'HOUR'
            UNION ALL SELECT 'h',    'HOUR'
            UNION ALL SELECT 'mm',   'MINUTE'
            UNION ALL SELECT 'm',    'MINUTE'
            UNION ALL SELECT 'ss',   'SECOND'
            UNION ALL SELECT 's',    'SECOND'
            UNION ALL SELECT 'fff',  'MILLISECOND'
            UNION ALL SELECT 'f',    'MILLISECOND'
            UNION ALL SELECT 'tt',   'AMPM'
            UNION ALL SELECT 't',    'AMPM'
        ) AS susbst (code, value)
    ),
    allowedDelimiters (delimiter) AS (
                  SELECT ' '
        UNION ALL SELECT '-'
        UNION ALL SELECT '/'
        UNION ALL SELECT '.'
        UNION ALL SELECT ':'
        UNION ALL SELECT ','
    ),
    -- ================================================
    -- make the parameter case-sensitive
    -- ================================================
    formatStringCS (format_string) AS (
        SELECT @format_string COLLATE Latin1_General_CS_AS
    ),
    inputStringCS (input_string) AS (
        SELECT @date COLLATE Latin1_General_CS_AS
    ),
    -- ================================================
    -- join the format string to the tally table
    -- and add some rankings to identify the "islands"
    -- ================================================
    tokenizedFormatString AS (
        SELECT *,
            subGroupId =
                DENSE_RANK() OVER(
                    ORDER BY C, _subGroupId
                ),
            delimitedGroupId =
                DENSE_RANK() OVER(
                    ORDER BY isDelimiter, _delimitedGroupId
                )
        FROM (
            SELECT N,
                SUBSTRING(format_string, N, 1) AS C,
                _subGroupId =
                    N - ROW_NUMBER() OVER (
                        PARTITION BY SUBSTRING(format_string, N, 1)
                        ORDER BY N
                    ),
                _delimitedGroupId =
                    ROW_NUMBER() OVER (
                        ORDER BY N
                    ) -
                    ROW_NUMBER() OVER (
                        PARTITION BY CASE WHEN AD.delimiter IS NULL THEN 0 ELSE 1 END
                        ORDER BY N
                    ),
                isDelimiter = CASE WHEN AD.delimiter IS NULL THEN 0 ELSE 1 END
            FROM cteTally AS TALLY
            CROSS JOIN formatStringCS AS FMT
            LEFT JOIN allowedDelimiters AS AD
                ON delimiter = SUBSTRING(format_string, TALLY.N, 1)
            WHERE N <= LEN(@format_string)
        ) AS fs
    ),
    -- ================================================
    -- join the input string to the tally table
    -- and add some rankings to identify the "islands"
    -- ================================================
    tokenizedInputString AS (
        SELECT *,
            delimitedGroupId =
                DENSE_RANK() OVER(
                    ORDER BY G, _delimitedGroupId
                )
        FROM (
            SELECT N, C, G,
                _delimitedGroupId =
                        N - ROW_NUMBER() OVER (
                            PARTITION BY G
                            ORDER BY N
                        )
            FROM (
                SELECT N,
                    SUBSTRING(input_string, N, 1) AS C,
                    CASE WHEN AD.delimiter IS NULL THEN 0 ELSE 1 END AS G
                FROM cteTally AS TALLY
                CROSS JOIN inputStringCS AS INSTR
                LEFT JOIN allowedDelimiters AS AD
                    ON delimiter = SUBSTRING(input_string, TALLY.N, 1)
                WHERE N <= LEN(input_string)
            ) AS in_str
        ) AS in_str
    ),
    -- ================================================
    -- Aggregate groups on format string
    -- ================================================
    groupedFormatString AS (
        SELECT *,
            delimitedGroupStart =
                MIN(subGroupStart) OVER(
                    PARTITION BY delimitedGroupId
                ),
            subGroupCount =
                COUNT(*) OVER(
                    PARTITION BY delimitedGroupId
                )
        FROM (
            SELECT delimitedGroupId,
                subGroupId,
                subGroupLen = COUNT(*),
                subGroupStart = MIN(N),
                C = MIN(C)
            FROM tokenizedFormatString
            GROUP BY delimitedGroupId, subGroupId
        ) AS src
    ),
    -- ================================================
    -- Aggregate groups on input string
    -- ================================================
    groupedInputString AS (
        SELECT delimitedGroupId,
            groupLen = COUNT(*),
            delimitedGroupStart = MIN(N),
            G = MIN(G)
        FROM tokenizedInputString
        GROUP BY delimitedGroupId
    ),
    -- ================================================
    -- Repeat ranking for format string
    -- ================================================
    rankedFormatString AS (
        SELECT *,
            delimitedGroupRank =
                DENSE_RANK() OVER (
                    ORDER BY delimitedGroupStart
                ) ,
            subGroupRank =
                DENSE_RANK() OVER (
                    PARTITION BY delimitedGroupId
                    ORDER BY subGroupStart
                )
        FROM groupedFormatString AS GFS
        LEFT JOIN allowedTokens AS AT
            ON REPLICATE(GFS.C,GFS.subGroupLen) = AT.code
    ),
    -- ================================================
    -- Repeat ranking for input string
    -- ================================================
    rankedInputString AS (
        SELECT *,
            delimitedGroupRank =
                DENSE_RANK() OVER (
                    ORDER BY delimitedGroupStart
                )
        FROM groupedInputString AS GFS
    ),
    -- ================================================
    -- Discover values in the input string
    -- ================================================
    valuedInputString AS (
        SELECT RFS.tokenGroup, dateValues.value
        FROM rankedInputString AS RIS
        INNER JOIN rankedFormatString AS RFS
            ON  RIS.delimitedGroupRank = RFS.delimitedGroupRank
        CROSS APPLY (
            SELECT
                CASE subGroupCount
                    WHEN 1 THEN
                        SUBSTRING(@date, RIS.delimitedGroupStart, RIS.groupLen)
                    ELSE
                        SUBSTRING(@date,
                                  RIS.delimitedGroupStart +
                                      RFS.subGroupStart -
                                      RFS.delimitedGroupStart,
                                  RFS.subGroupLen)
                END
        ) AS dateValues (value)
        WHERE tokenGroup IS NOT NULL
    ),
    -- ================================================
    -- Verify tokens
    -- This is just to ensure that only one value per group
    -- is specified in the format string, disallowing ambiguity
    -- ================================================
    verifiedTokens AS (
        SELECT tokenGroup, value,
            patternVerify = CONVERT(datetime,
                CASE COUNT(*) OVER (PARTITION BY tokenGroup)
                    WHEN 1 THEN '2000-01-01 00:00:00.000' -- valid format
                    ELSE '2000-5 :00.000' -- random invalid format
                END,
                121)
        FROM valuedInputString AS VIS
    ),
    -- ================================================
    -- Collapse all tokens in a single line
    -- ================================================
    pivotedTokens AS (
        SELECT year_value      = LEFT('2000',4 - LEN(ISNULL([YEAR],''))) + ISNULL([YEAR],''),
            month_value        = CASE
                                     WHEN ISNUMERIC([MONTH]) = 1
                                         THEN RIGHT('00' + ISNULL([MONTH],''),2)
                                     ELSE [MONTH]
                                 END,
            day_value          = RIGHT('00'   + ISNULL([DAY],''),2),
            hour_value         = RIGHT('00'   + ISNULL(
                                       CASE
                                           WHEN CAST([HOUR] AS int) < 12 AND LEFT([AMPM],1) = 'P'
                                               THEN CAST(CAST([HOUR] AS int) + 12 AS varchar(2))
                                           ELSE [HOUR]
                                       END
                                       ,''),2),
            minute_value       = RIGHT('00'   + ISNULL([MINUTE],''),2),
            second_value       = RIGHT('00'   + ISNULL([SECOND],''),2),
            millisecond_value  = RIGHT('000'  + ISNULL([MILLISECOND],''),3)
        FROM (
            SELECT [YEAR], [MONTH],  [DAY],
                   [HOUR], [MINUTE], [SECOND],
                   [MILLISECOND],    [AMPM]
            FROM verifiedTokens AS VT
            PIVOT (
                MIN(value)
                FOR tokenGroup IN (
                    [YEAR], [MONTH],  [DAY],
                    [HOUR], [MINUTE], [SECOND],
                    [MILLISECOND],    [AMPM]
                )
            ) AS p
        ) AS src
    )
    -- ================================================
    -- Build the output date
    -- ================================================
    SELECT parsedDate =
        CASE
            WHEN ISNUMERIC(month_value) = 1 THEN
                CONVERT(datetime,
                       year_value + '-' + month_value  + '-' + day_value    + ' ' +
                       hour_value + ':' + minute_value + ':' + second_value + '.' + millisecond_value,
                       121)
            ELSE
                CONVERT(datetime,
                       month_value + ' ' + day_value    + ' ' + year_value   + ' ' +
                       hour_value  + ':' + minute_value + ':' + second_value + ':' + millisecond_value,
                       109)
        END
    FROM pivotedTokens
)

I must admit that the code is very long, but I also hope it is readable and clear enough. In the end, it was hard to write, so I guess it must be at least non-trivial to read.

The above function can be used to parse a datetime value out of a string formatted in any way. Taking the same values used in the previous example, a typical call to this function would look like this:

SELECT *
FROM (
    VALUES ( '2011-10-26'             ,'YYYY-MM-DD'            ),
           ( '26/10/2011'             ,'DD/MM/YYYY'            ),
           ( 'October 26 2011'        ,'MMMM DD YYYY'          ),
           ( '2011-10-26 17:46:52'    ,'YYYY-MM-DD HH:mm:ss'   ),
           ( '2011-10-26 05:46:52 PM' ,'YYYY-MM-DD hh:mm:ss tt')
) AS testValues (input_date, format_string)
CROSS APPLY dbo.parseDate(input_date, format_string)

The statement produces this output:

input_date             format_string          parsedDate

---------------------- ---------------------- -----------------------

2011-10-26             YYYY-MM-DD             2011-10-26 00:00:00.000

26/10/2011             DD/MM/YYYY             2011-10-26 00:00:00.000

October 26 2011        MMMM DD YYYY           2011-10-26 00:00:00.000

2011-10-26 17:46:52    YYYY-MM-DD HH:mm:ss    2011-10-26 17:46:52.000

2011-10-26 05:46:52 PM YYYY-MM-DD hh:mm:ss tt 2011-10-26 17:46:52.000

Performance considerations

As usual, performance tests are quite an eye-opener and things don’t always work the way you expect. The following table shows the average elapsed times for each method described in this article, based on a 10.000 rows input over 100 repeated runs.

Method

Elapsed Time MS

CLR scalar function

60

CLR ITVF

142

Built-in format function

445

T-SQL Scalar UDF

463

T-SQL Islands ITVF

1378

T-SQL Recursive CTE ITVF

30166

Turns out that the CLR scalar function outperforms even the built-in format function, which relies on CLR assemblies too. This is probably due to the fact that the format function can do much more than my straight “toString” call: that “more” seems to have a non-negligible cost.

While seeing the recursive CTE at the bottom of the classification is not surprising, I would have expected the “islands” ITVF to perform better than it did, but I suspect that there is still room for a small improvement tweaking the code slightly.

Conclusions

While performing date/string conversions on the database side is not a best practice, there are real-world scenarios that require this conversion to take place. Seasoned database experts will often disagree with this, arguing with good reason that presentation belongs to the application layer. However, you cannot always assume that everything around you has been designed and implemented properly and you do not always have the power or strength to enforce good modularity and good interface design. Often that means you will need to convert dates and times to strings in some defined format before passing them up to the application: pretending that it doesn’t happen or that the DBA should magically find a way of not doing the conversion in T-SQL, is just ignoring the truth.

Hopefully, this article described in enough detail all the options that SQL Server offers for performing the conversion with built-in functions, plus some custom functions that can be coded using either the CLR runtime or plain T-SQL.

Rate

4.76 (33)

You rated this post out of 5. Change rating

Share

Share

Rate

4.76 (33)

You rated this post out of 5. Change rating