Convert International Character-String Date Formats

  • We are receiving some international text file(s) with date(s) (and other fields) in languages other than english (dez, déc, etc). Does anyone know of a method to try and convert dates into a us-english format?

    Here is what was tried at the moment:

    DECLARE @Date varchar(20), @Date1 VARCHAR(20), @Date2 VARCHAR(20) 
    SET @Date = '25 déc 2020'
    SET @Date1 = '25 dez 2020'
    SET @Date2 = 'Dec 25 2020'

    SELECT @@LANGUAGE, @Date AS OrigValFR, @Date1 AS OrigValDE, @Date2 AS OrigValUS
    ,TRY_CONVERT(SMALLDATETIME, @Date) AS ConvertDateFR
    ,TRY_CONVERT(SMALLDATETIME, @Date1) AS ConvertDateDE
    ,TRY_CONVERT(SMALLDATETIME, @Date2) AS ConvertDateUS
    ,TRY_PARSE(@date AS SMALLDATETIME USING 'fr-FR') FrenchCult
    ,TRY_PARSE(@date AS SMALLDATETIME USING 'en-US') EnglishCult
    ,COALESCE(TRY_PARSE(@date AS SMALLDATETIME USING 'en-US'),TRY_PARSE(@date AS SMALLDATETIME USING 'fr-FR')) CoalesceCult


    SET LANGUAGE french --français;

    SELECT @@LANGUAGE, @Date AS OrigValFR, @Date1 AS OrigValDE, @Date2 AS OrigValUS
    ,TRY_CONVERT(SMALLDATETIME, @Date) AS ConvertDateFR
    ,TRY_CONVERT(SMALLDATETIME, @Date1) AS ConvertDateDE
    ,TRY_CONVERT(SMALLDATETIME, @Date2) AS ConvertDateUS

    SET LANGUAGE german;

    SELECT @@LANGUAGE, @Date AS OrigValFR, @Date1 AS OrigValDE, @Date2 AS OrigValUS
    ,TRY_CONVERT(SMALLDATETIME, @Date) AS ConvertDateFR
    ,TRY_CONVERT(SMALLDATETIME, @Date1) AS ConvertDateDE
    ,TRY_CONVERT(SMALLDATETIME, @Date2) AS ConvertDateUS

    SET LANGUAGE us_english;
    SELECT @@LANGUAGE;

    • This topic was modified 3 years, 2 months ago by  DB_Newbie2007.
    • This topic was modified 3 years, 2 months ago by  DB_Newbie2007.

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • If possible - go back to the sender and have them send the data using ISO standard date format of YYYY-MM-DD or YYYYMMDD instead of locale specific format.

    Barring that - you are going to need to do what you are doing through multiple passes identifying each format and converting the data.  Then you must follow up with a process to validate the conversion is working for all possible inputs - and hope new inputs are not utilized.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Either you to do it by CASTing/CONVERTing one language at a time, or have them include a language code in the file.

    You could, though, use CASE so that you could try all conversions to a single output column.  I don't know why the French abbreviated date doesn't work.

    ;WITH cte_test_dates AS (
    SELECT * FROM ( VALUES(CAST('25 déc 2020' AS nvarchar(25))), ('25 décembre 2020'),
    ('25 dez 2020'), ('Dec 25 2020') ) dates(date)
    )
    SELECT
    date,
    COALESCE( TRY_PARSE(date AS date USING 'en-US'), TRY_PARSE(date AS date USING 'de-DE'),
    TRY_PARSE(date AS date USING 'fr-FR') /*, ...*/ ) AS date_parsed_raw,
    CONVERT(varchar(25), COALESCE(TRY_PARSE(date AS date USING 'en-US'), TRY_PARSE(date AS date USING 'de-DE'),
    TRY_PARSE(date AS date USING 'fr-FR') /*, ...*/ ), 106) AS date_parsed_to_eng
    FROM cte_test_dates

     

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • The reason '25 déc 2020' does not convert is because the abbreviation for December in French is: 'déc.'  You can confirm this using: Select format(cast('2020-12-25' As date), 'dd MMM yyyy', 'fr-FR')

    This goes back to the original problem...you can attempt to convert all the possible values being sent, but as soon as a new value or format is introduced you need to change/update your code.  TRY_PARSE will fail if the format is not a known format - or the abbreviation is not correct - or the specified language isn't included.

    The best option is to have the senders use an ISO format for dates and avoid the issue of having to convert the data.

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Or just make sure they send a string that is valid for conversion in the language they are using!

    If I write '25 dek 2020' in English, then that's my error, and no one would be expected to be able to fix that blatant an error.

     

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Thanks for all of the replies.

    Unfortunately, we have little control over the format of  the date field, since it is coming from a large international vendor's invoicing system. In the 13 years I have been doing this, I have never encountered this problem before... but we *do* have confidence that the date field will contain a date!

    I found an interesting article about doing conversions...

    http://blog.sqlpositive.com/2010/03/using-convert-with-collate-to-strip-accents-from-unicode-strings/

    so taking an earlier query from Scott, here is my modification:

    ;WITH cte_test_dates AS (
    SELECT CONVERT(VARCHAR(25),date) COLLATE Cyrillic_General_CI_AI AS date FROM ( VALUES(CAST('25 déc 2020' AS nvarchar(25))), ('25 décembre 2020'),
    ('25 dez 2020'), ('Dec 25 2020') ) dates(date)
    )


    SELECT
    date,
    COALESCE( TRY_PARSE(date AS date USING 'en-US'), TRY_PARSE(date AS date USING 'de-DE'),
    TRY_PARSE(date AS date USING 'fr-FR') /*, ...*/ ) AS date_parsed_raw,
    CONVERT(varchar(25), COALESCE(TRY_PARSE(date AS date USING 'en-US'), TRY_PARSE(date AS date USING 'de-DE'),
    TRY_PARSE(date AS date USING 'fr-FR') /*, ...*/ ), 106) AS date_parsed_eng
    FROM cte_test_dates

     

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • Wow - a large international company that is not using ISO standards in their software or in their exported data?  That is truly concerning...

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I once was getting horribly flawed data from FedEx (missing closing quotes, missing delimiters, etc.; genuinely not expected, given their rep).  They would NEVER fix the data.  We had to write our own code to fix the data (and it was a complex process).  That's just how some companies are.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Oh - I know that and have run into the same kind of data issues with other large organizations.  However - you would think an international corporation that must deal with data from all over the world would be aware of and utilize ISO standards, especially when dealing with dates.

    Either they are storing the values in a string column - or they are converting the data on output.  My guess is they store the external value provided and convert that to a UTC date and time and store that in an actual datetime/datetime2/datetimeoffset data type in their system (assuming SQL Server - could be a different platform).

    Either way - expecting your customers to convert data from a file provided isn't good customer service, however you look at it.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Do records with different locale settings arrive within the same file?

    most likely not. Different ERP systems could send dates in different formats, but those formats would be consistent within all the data feeds from the same source.

    if it’s the case you may create a lookup table which would allocate locales to the source systems, and then the import process would identify where the file has come from and use the corresponding locale settings for parsing the data.

    _____________
    Code for TallyGenerator

  • If each individual file is not language specific (i.e. dates within the file are mixture of languages) my inclination would be to extract all the DISTINCT month names, ideally in combination with something from elsewhere in the record which would confirm the language, and see what I got.

    My hope would be that the list was succinct, and there was nothing ambiguous (e.g. "Jan" referred to January in one language, but June in another ... 🙁 )

    If that list looked OK I would consider just doing a REPLACE based on a table of substitutions, and then I could treat all the date conversions as US-English

  • DB_Newbie2007 wrote:

    Does anyone know of a method to try and convert dates into a us-english format?

    Heh... I didn't until an hour or so ago. 🙂

    This should help a bit.  Don't let the size scare you.  It's mostly comments.  The cool part is that you don't need to list every language with TRY_PARSE, although that might be faster.  That also means that this code is "self-healing" in that if MS introduces a new language, the code will auto-magically pick up on it.

    It would be really cool if you attached a file with a couple of thousand dates so that we can test for performance.

    Note that the usage example is in the "Usage Example:" comments section in the code where it belongs.  Also, take time to read the comments.  This function relies on another that you're going to have to go get and install before you install this function.

    Also note that this forum "ate" two of the months in the usage example because this forum can't apparently handle unicode.

    Last but not least, this is yet another reason why I'm po'd at MS for not including elemental ordinals in the return of the STRING_SPLIT() function.  What the hell were they thinking when they wrote that??? 🙁

     CREATE FUNCTION dbo.TranslateMonthDate
    /**********************************************************************************************************************
    Purpose:
    Given a string date with a 1 or 2 digit day of the month, a 4 digit year, and a non-numeric abbreviated month name
    in a supported language according to the sys.sysLanguages table, do a translation and return the date as a DATE
    datatype.
    -----------------------------------------------------------------------------------------------------------------------
    Usage Examples:
    NOTE: Use an NVARCHAR column to hold non-English date strings because some languages require it.
    ALSO BE ADVISED THAT STRING DATES THAT CANNOT BE TRANSLATED WILL SILENTLY DROP ROWS IF YOU USE CROSS APPLY
    INSTEAD OF THE OUTER APPLY DEMONSTRATED BELOW.

    --===== Create a test table. This is NOT a part of the solution.
    -- We're just setting up some test data here.
    DROP TABLE IF EXISTS #TestTable;
    SELECT v.StringDate
    INTO #TestTable
    FROM (VALUES
    (N'8 ??? 2020') --This particular language requires NVARCHAR for the month so use NVARCHAR for ALL dates.
    ,(N'9 Ara 2020')
    ,(N'18 ?e? 2020') --This particular language requires NVARCHAR for the month so use NVARCHAR for ALL dates.
    ,(N'19 joulu 2020')
    ,(N'20 Dic 2020')
    ,(N'21 déc 2020')
    ,(N'22 dez 2020')
    ,(N'Dec 23 2020') --Note the order of month, day, and year don't matter
    ,(N'2020 Dec 24') --Note the order of month, day, and year don't matter
    ,(N'2020 25 Dec') --Note the order of month, day, and year don't matter
    )v(StringDate)
    ;
    --===== Demo the usage against the test table.
    SELECT OriginalDateString = tst.StringDate
    ,mo.TranslatedDate
    FROM #TestTable tst
    OUTER APPLY dbo.TranslateMonthDate(tst.StringDate) mo
    ;
    -----------------------------------------------------------------------------------------------------------------------
    Programmer Notes:
    1. This function and the embedded dbo.DelimitedSplitN4K function are high performance iTVFs (inline Table Valued
    Functions). Tell your DBA not to worry. This runs a whole lot faster than Scalar or mTFV (multi-statment Table
    Valued Functions). Use this function as if it were a parameterized view in the FROM clause.
    2. The dbo.DelimitedSplitN4K is available in the "Resources:" section at the bottom of the article at the following
    URL: https://www.sqlservercentral.com/articles/tally-oh-an-improved-sql-8k-%e2%80%9ccsv-splitter%e2%80%9d-function
    -----------------------------------------------------------------------------------------------------------------------
    Revision History:
    Rev 01 - 12 Feb 2021 - Jeff Moden
    - Initial Creation (Advisory: Not fully unit tested but logically sound. Use with your eyes open.)
    - Original Problem: https://www.sqlservercentral.com/forums/topic/convert-international-character-date-formats
    **********************************************************************************************************************/
    --===== Define the I/O for this function
    (@pDateString NVARCHAR(40))
    RETURNS TABLE AS --Cannot use WITH SCHEMABINDING because of the inherent "COLLATE CATALOG_DEFAULT"
    RETURN WITH cteFindMonth AS
    (--==== Find the part of the given date string that is not numeric.
    -- This should be the "month" unless from a country that uses Roman Numerals for all the date parts.
    SELECT SearchMonth = split.Item
    FROM dbo.DelimitedSplitN4K(@pDateString,' ') split
    WHERE split.Item LIKE '%[^0-9]%' --Reject numeric parts leaving on the Alpha Month
    )
    ,cteTranslator AS
    (--==== Do a conditional split from the language table to find the correct month.
    SELECT TOP 1 --The month names are unique by month number so we only need to find the first one.
    MonthAbbv = sm.Item
    ,MonthTranslated = CHOOSE(sm.ItemNumber,'Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec')
    FROM sys.syslanguages sl
    CROSS JOIN cteFindMonth mo
    CROSS APPLY dbo.DelimitedSplitN4K(sl.shortmonths,N',') sm
    WHERE CHARINDEX(mo.SearchMonth,sl.shortmonths) > 0 --Split only those strings that contain the SearchMonth
    AND mo.SearchMonth = sm.Item --Optimization to search only one set of short month names.
    )--==== Replace the foreign language month abbreviation with the English month abbreviation and convert to a DATE.
    SELECT TranslatedDate = CONVERT(DATE,REPLACE(@pDateString,MonthAbbv,MonthTranslated))
    FROM cteTranslator
    ;
    GO

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Kristen-173977 wrote:

    If each individual file is not language specific (i.e. dates within the file are mixture of languages) my inclination would be to extract all the DISTINCT month names, ideally in combination with something from elsewhere in the record which would confirm the language, and see what I got.

    My hope would be that the list was succinct, and there was nothing ambiguous (e.g. "Jan" referred to January in one language, but June in another ... 🙁 )

    If that list looked OK I would consider just doing a REPLACE based on a table of substitutions, and then I could treat all the date conversions as US-English

    Such a list exists in SQL Server in the sys.syslanguages system compatibility view, which has not yet been replaced.  And, you're correct... there's no ambiguity for the short month names.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Such a list exists in SQL Server in the sys.syslanguages system compatibility view 

    Much better idea 🙂 That will also bring to light, and allow checking for, any month names, in the data, which don't conform to the System Month name List

    Does it need a check, for the future, that no new introductions are ambiguous? or, that the current Collation doesn't introduce a duplicate? (or perhaps use a Binary Collation)

    I didn't find any duplicates, but I don't know what the "sloppiest" collation is. I just tried Case / Accent insensitive. I did get duplicates using  a CAST to varchar - as you mentioned 🙂

  • Kristen-173977 wrote:

    Jeff Moden wrote:

    Such a list exists in SQL Server in the sys.syslanguages system compatibility view 

    Much better idea 🙂 That will also bring to light, and allow checking for, any month names, in the data, which don't conform to the System Month name List

    Does it need a check, for the future, that no new introductions are ambiguous? or, that the current Collation doesn't introduce a duplicate? (or perhaps use a Binary Collation)

    I didn't find any duplicates, but I don't know what the "sloppiest" collation is. I just tried Case / Accent insensitive. I did get duplicates using  a CAST to varchar - as you mentioned 🙂

    All great points, Kristen.

    When you say "duplicates" and to clarify for others reading this thread, I'm assuming that you mean that a given month name abbreviation would appear for more than one month number.  Like you, I checked for that in the sys.syslanguages compatibility view and found none but, I absolutely agree... although extremely unlikely, it's still possible that such a thing could happen and should probably be checked for after any upgrade to a new version of SQL Server.

    I also agree that Collation could easily end up being a problem that I had only partially considered when I wrote the code.  I think that binary collation might be a bit of a problem because people using, for example, the English language, can spell the abbreviation for December as "Dec", "DEC", or "dec".  You might have a perfectly good date in the date string that is not automatically convertible because of the default collation that was setup on the server during installation.

    I believe that the same collation problem will also exist for things like TRY_PARSE.

    There are three penultimate problems here...

    1. The various language-handling components in SQL Server can change, as they have in the past especially considering collation, and must be checked for during version changes at the very least.

    2. As you say, Collation IS a problem that must be considered and the code may need to have Collation hints added depending on the instance of the SQL Server.  It's such an important problem that SQL Server won't even allow WITH SCHEMABINDING in the function I wrote because of Collation.  Instead, it throws the following error if you try to use WITH SCHEMABINDING...

    Msg 12842, Level 16, State 4, Procedure TranslateMonthDate, Line 53 [Batch Start Line 0]

    The COLLATE CATALOG_DEFAULT clause cannot be used in a constraint, computed column, index filter expression, or any schema-bound object.

    3. As with any data even from known sources, any code that makes a translation of date strings needs to have checks built in.  For example, if the server is setup with a Collation that uses case sensitivity, the system must check for NULLs in the translated date column and maybe even provide an "errata listing" that is stored in a table for further action.  That was a large part of the reason why I specifically brought attention to using the function with OUTER APPLY instead of the usual CROSS APPLY in the documentation in the code... it helps provide NULLs where translations didn't occur.

    4. You really have to know the tools that you're working with.  As Jeffrey Williams stated in the first reply on this thread, the original problem could be avoided for dates if such data transmissions followed ISO standards for dates and times.  Even that's a problem in SQL Server, though. As Jeffrey points out, the two ISO standards for dates are YYYYMMDD and YYYY-MM-DD except that SQL Server silently uses the undocumented YYYY-DD-MM format instead of the latter format for many languages including but not limited to the French language.

    The ultimate bottom line is that you have to check the quality of any and all imported data, especially character based data and especially if it's multilingual in nature.  Shoot... I can't even get my car dealership to accurately quote the "out the door" price on lease buyout I'm trying to arrange via email.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply