Converting file name MMDDYYYY.txt to a Date Field

  • Hello,

    Converting file name MMDDYYYY.txt to a Date Field but CONVERT or CAST do not seem to work perhaps because I also have a REPLACE?  Can someone help me understand how to do this if it is possible?  See sample code below and many thanks in advance!

    CREATE TABLE #t (getFileName varchar(100))
    INSERT INTO #T (getFileName) VALUES ('09142023.txt')

    SELECT REPLACE(getFileName, '.txt','') FROM #t
  • Put the date into YYYYMMDD format before casting and it should work.

    DECLARE @x VARCHAR(100) = '09142023';

    SELECT @x
    ,SomeDate = CAST (CONCAT (RIGHT(@x, 4), LEFT(@x, 2), SUBSTRING (@x, 3, 2)) AS DATE);

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • SELECT CAST(SUBSTRING((getFileName, 5, 4) + 
    LEFT(getFileName, 4) AS date) AS FileNameDate
    FROM #t

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Make the filename use the international ISO format YYYYMMDD instead of that quaint american date format and you'll have no trouble at all. 🙂

  • kaj wrote:

    Make the filename use the international ISO format YYYYMMDD instead of that quaint american date format and you'll have no trouble at all. 🙂

    +1, this has the added benefit of being sortable in Windows File Explorer.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • ... but if you can't change the date format in the file name...

    This would also be a possible solution:

    DECLARE @x VARCHAR(100) = '09142023';

    SELECT @x as [@x]
    , CONVERT(date,STUFF(STUFF(@x,3,0,'/'),6,0,'/'),101) as SomeDate

    I use STUFF to insert a slash in the right places so that the string now conforms to the 101 style used by CONVERT.

  • Using the OP's original example:

    SELECT *
    , file_date = datefromparts(f.string_date % 10000, f.string_date / 1000000, f.string_date / 10000 % 100)
    FROM #t
    CROSS APPLY (VALUES (substring(getFileName, 1, charindex('.', getFileName, 1) - 1))) AS f(string_date);

    Assumptions: the filename is formatted as MMDDYYYY.txt.  Using CROSS APPLY to remove the extension it is then just math to parse out month, day and year.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Here's a quick question for you because we have to enforce such things where I work... do you want the conversion to fail if the date contains less than 8 characters?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Deleted.  Made a mistake and didn't check before I posted. 🙁

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • IF the MMDDYYYY formatted string date is guaranteed to always be 8 characters, the following works (most of the posted code doesn't consider the inclusion of the ".txt") according to the 'MMDDYYYY.txt' string format and, because it does not use CHARINDEX, it's more than twice as fast as all the other code that works against the OPs data so far (with the exception of Scott Pletcher's code, which I missed on the first pass).  It will produce an error if it's shorter and probably an error if it's longer but no guarantee there.  If you want errors to convert to NULLs, use TRY_CONVERT or TRY_CAST on systems that support it (2016+).

    DECLARE @BitBucket DATE;
    SET STATISTICS TIME ON;
    SELECT @BitBucket = CONVERT(DATE,SUBSTRING(getFileName,5,4)+SUBSTRING(getFileName,1,4)) --SWAPS MMDD and YYYY to make ISO date.
    FROM #T
    SET STATISTICS TIME OFF;
    GO 5

    Now, full disclosure says the time difference is only about 4.5 seconds faster on 10 million rows so ... who cares?

    Consider this... if you wrote everything on your server to run a little more than twice as fast, your server workload would run twice as fast.  The trouble is that few look at that big picture when they're writing their code and so their server is dying from the "SQL Death by a Million Cuts".

    If you'd like to test, here's some test code to build the 10 Million rows according to the spec the OP wrote.

    --=====================================================================================================================
    -- Presets (This creates 10 million rows in about 5 seconds on my laptop)
    --=====================================================================================================================
    --===== If the test table already exists, drop it to make reruns in SSMS easier.
    -- Using "old" method for pre-2012 folks.
    IF OBJECT_ID('TempDB.dbo.#T') IS NOT NULL
    DROP TABLE #T
    ;
    GO
    --===== Parameters to control the desired range of dates and and number of rows.
    DECLARE @LoDateLimit DATETIME = '2020' --Inclusive, same as 2020-01-01
    ,@HiDateLimit DATETIME = '2030' --Exclusive, same as 2030-01-01
    ,@RowCount INT = 10000000 --Currently, 10 million rows
    ;
    --===== Local constants to simplify formulas and increase performance.
    DECLARE @DayCount INT = DATEDIFF(dd,@LoDateLimit,@HiDateLimit)
    ;
    --=====================================================================================================================
    -- Create and populate the table (HEAP) on the fly using given parameters.
    --=====================================================================================================================
    SELECT TOP (@RowCount) --Needs to be hardocded for 2005 or less.
    getFileName = SUBSTRING(ca.ISODateTxt,5,8)+SUBSTRING(ca.ISODateTxt,1,4)+'.txt' --SWAPS YYYY and MMDD to make MMDDYYYY date.
    INTO #T
    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2
    CROSS APPLY (VALUES (CONVERT(CHAR(8),DATEADD(dd,ABS(CHECKSUM(NEWID())%@DayCount),@LoDateLimit),112)))ca(ISODateTxt)
    ;
    --===== Display a sample of the contents of the Temp Table for verification.
    SELECT TOP (100) * FROM #T

    After finding an old post of mine from 2018, if you want to scrape off slightly more than an extra 100 ms (a suggestion on my part but written by Eirikur Eiriksson), the following will to the trick.  It's similar to a lot of the other code but uses DATEFROMPARTS with explicit conversions to INT.  Here's that bit of code...

    DECLARE @BitBucket DATE;
    SET STATISTICS TIME ON;
    SELECT @BitBucket = DATEFROMPARTS(
    CONVERT(INT,SUBSTRING(getFileName,5,4),0)
    ,CONVERT(INT,SUBSTRING(getFileName,1,2),0)
    ,CONVERT(INT,SUBSTRING(getFileName,3,2),0)
    )
    FROM #T;
    SET STATISTICS TIME OFF;
    GO 5

    And, finally, you'd think that Integer Math would be the fastest but the LEFT(8) or SUBSTRING(1,8)  in the CROSS APPLY slows it down by about 600 ms.

    DECLARE @BitBucket DATE;
    SET STATISTICS TIME ON;
    SELECT @BitBucket = DATEFROMPARTS(ca.IntDate%10000, ca.IntDate/1000000, ca.IntDate/10000%100)
    FROM dbo.TestIntDateTime t
    CROSS APPLY (VALUES (CONVERT(INT,SUBSTRING(t.getFileName,1,8))))ca(IntDate)
    SET STATISTICS TIME OFF;
    GO 5

    • This reply was modified 7 months, 2 weeks ago by  Jeff Moden. Reason: Missed Scott Pletcher's code and added credit to him there

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • For the love of whatever you hold dear, stop using “MMDDYYYY” formatting!

    It doesn't sort chronology as text. You should be using “YYYYMMDD”.

  • Jeff Moden wrote:

    IF the MMDDYYYY formatted string date is guaranteed to always be 8 characters, the following works (most of the posted code doesn't consider the inclusion of the ".txt") according to the 'MMDDYYYY.txt' string format and, because it does not use CHARINDEX, it's more than twice as fast as all the other code that works against the OPs data so far.  It will produce an error if it's shorter and probably an error if it's longer but no guarantee there.  If you want errors to convert to NULLs, use TRY_CONVERT or TRY_CAST on systems that support it (2016+).

    DECLARE @BitBucket DATE;
    SET STATISTICS TIME ON;
    SELECT @BitBucket = CONVERT(DATE,SUBSTRING(getFileName,5,4)+SUBSTRING(getFileName,1,4)) --SWAPS MMDD and YYYY to make ISO date.
    FROM #T
    SET STATISTICS TIME OFF;
    GO 5

    Now, full disclosure says the time difference is only about 4.5 seconds faster on 10 million rows so ... who cares?

    Consider this... if you wrote everything on your server to run a little more than twice as fast, your server workload would run twice as fast.  The trouble is that few look at that big picture when they're writing their code and so their server is dying from the "SQL Death by a Million Cuts".

    If you'd like to test, here's some test code to build the 10 Million rows according to the spec the OP wrote.

    --=====================================================================================================================
    -- Presets (This creates 10 million rows in about 5 seconds on my laptop)
    --=====================================================================================================================
    --===== If the test table already exists, drop it to make reruns in SSMS easier.
    -- Using "old" method for pre-2012 folks.
    IF OBJECT_ID('TempDB.dbo.#T') IS NOT NULL
    DROP TABLE #T
    ;
    GO
    --===== Parameters to control the desired range of dates and and number of rows.
    DECLARE @LoDateLimit DATETIME = '2020' --Inclusive, same as 2020-01-01
    ,@HiDateLimit DATETIME = '2030' --Exclusive, same as 2030-01-01
    ,@RowCount INT = 10000000 --Currently, 10 million rows
    ;
    --===== Local constants to simplify formulas and increase performance.
    DECLARE @DayCount INT = DATEDIFF(dd,@LoDateLimit,@HiDateLimit)
    ;
    --=====================================================================================================================
    -- Create and populate the table (HEAP) on the fly using given parameters.
    --=====================================================================================================================
    SELECT TOP (@RowCount) --Needs to be hardocded for 2005 or less.
    getFileName = SUBSTRING(ca.ISODateTxt,5,8)+SUBSTRING(ca.ISODateTxt,1,4)+'.txt' --SWAPS YYYY and MMDD to make MMDDYYYY date.
    INTO #T
    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2
    CROSS APPLY (VALUES (CONVERT(CHAR(8),DATEADD(dd,ABS(CHECKSUM(NEWID())%@DayCount),@LoDateLimit),112)))ca(ISODateTxt)
    ;
    --===== Display a sample of the contents of the Temp Table for verification.
    SELECT TOP (100) * FROM #T

    After finding an old post of mine from 2018, if you want to scrape off slightly more than an extra 100 ms (a suggestion on my part but written by Eirikur Eiriksson), the following will to the trick.  It's similar to a lot of the other code but uses DATEFROMPARTS with explicit conversions to INT.  Here's that bit of code...

    DECLARE @BitBucket DATE;
    SET STATISTICS TIME ON;
    SELECT @BitBucket = DATEFROMPARTS(
    CONVERT(INT,SUBSTRING(getFileName,5,4),0)
    ,CONVERT(INT,SUBSTRING(getFileName,1,2),0)
    ,CONVERT(INT,SUBSTRING(getFileName,3,2),0)
    )
    FROM #T;
    SET STATISTICS TIME OFF;
    GO 5

    And, finally, you'd think that Integer Math would be the fastest but the LEFT(8) or SUBSTRING(1,8)  in the CROSS APPLY slows it down by about 600 ms.

    DECLARE @BitBucket DATE;
    SET STATISTICS TIME ON;
    SELECT @BitBucket = DATEFROMPARTS(ca.IntDate%10000, ca.IntDate/1000000, ca.IntDate/10000%100)
    FROM dbo.TestIntDateTime t
    CROSS APPLY (VALUES (CONVERT(INT,SUBSTRING(t.getFileName,1,8))))ca(IntDate)
    SET STATISTICS TIME OFF;
    GO 5

    The only difference I see between that code and mine is CONVERT rather than CAST ... I didn't realize CAST had that "big overhead".  (Presumably using LEFT rather than SUBSTRING couldn't have that much overhead.)

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    The only difference I see between that code and mine is CONVERT rather than CAST ... I didn't realize CAST had that "big overhead".  (Presumably using LEFT rather than SUBSTRING couldn't have that much overhead.)

    Crud.  My sincere apologies, Scott.  I missed your code.  You're code is faster that the other code, as well.  I've updated my response to point that out.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Many thanks all for all the replies!

  • rjjh78 wrote:

    Many thanks all for all the replies!

      Thanks for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply