Convert string MMDDYYYY to a date

  • First I will say I've found a couple different SQL solutions for this, I'm asking if there is a better way or what would be the most efficient way of doing this.  I'm also making an assumption that either of these ways would be better than using substring to parse each piece.
    My head is a little foggy today, could be the holidays coming up or this cold I got from the grand kids.

    The issue is I will be receiving files from an outside source, the file will be a pipe delimited text file.  The dates on this file will be coming in with the format of MMDDYYYY.  This does not convert in SSIS directly to a date, it will fail converting or casting directly as a date.

    Here is the 2 ways I found to convert this data to a date:
    DECLARE @dt AS CHAR(8);

    SET @dt = '07232018'

    --SELECT @dt, CAST(@Dt AS Date)  -- this fails
    SELECT @dt, CAST(STUFF(STUFF(@Dt, 3, 0, '/'), 6, 0, '/') AS DATE)  -- Sean Lange solution from other post
    SELECT @dt, CAST(RIGHT(RTRIM(@Dt), 4) + LEFT(LTRIM(@Dt), 4) AS DATE)  -- stole from another site

    I don't want to do this conversion in the SSIS package, unless someone can prove it would be the best solution.

    I will import this data from the text files into a 'work' table, each column defined as varchar(500), very little chance of failure on load.  Then I was going to use SQL to 'Transform' the data from the 'Work' table before inserting into the final tables.

    Of course my initial thought is to ask that they send the dates in a different format, but I don't know how open they will be to that.  Right now we are talking a few hundred rows per import, I wouldn't expect this to be more than a couple thousand per import, but you never now.  But I could be bringing in 60 to 70 files per import, I'm not sure, but I'm assuming I'll need to convert at least one date field on each.

    I was going to jump on this 5 year old post, but thought I would create a new one. https://www.sqlservercentral.com/Forums/Topic1437922-149-1.aspx

    Thanks in advance for your help.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • below86 - Thursday, December 20, 2018 2:47 PM

    First I will say I've found a couple different SQL solutions for this, I'm asking if there is a better way or what would be the most efficient way of doing this.  I'm also making an assumption that either of these ways would be better than using substring to parse each piece.
    My head is a little foggy today, could be the holidays coming up or this cold I got from the grand kids.

    The issue is I will be receiving files from an outside source, the file will be a pipe delimited text file.  The dates on this file will be coming in with the format of MMDDYYYY.  This does not convert in SSIS directly to a date, it will fail converting or casting directly as a date.

    Here is the 2 ways I found to convert this data to a date:
    DECLARE @dt AS CHAR(8);

    SET @dt = '07232018'

    --SELECT @dt, CAST(@Dt AS Date)  -- this fails
    SELECT @dt, CAST(STUFF(STUFF(@Dt, 3, 0, '/'), 6, 0, '/') AS DATE)  -- Sean Lange solution from other post
    SELECT @dt, CAST(RIGHT(RTRIM(@Dt), 4) + LEFT(LTRIM(@Dt), 4) AS DATE)  -- stole from another site

    I don't want to do this conversion in the SSIS package, unless someone can prove it would be the best solution.

    I will import this data from the text files into a 'work' table, each column defined as varchar(500), very little chance of failure on load.  Then I was going to use SQL to 'Transform' the data from the 'Work' table before inserting into the final tables.

    Of course my initial thought is to ask that they send the dates in a different format, but I don't know how open they will be to that.  Right now we are talking a few hundred rows per import, I wouldn't expect this to be more than a couple thousand per import, but you never now.  But I could be bringing in 60 to 70 files per import, I'm not sure, but I'm assuming I'll need to convert at least one date field on each.

    I was going to jump on this 5 year old post, but thought I would create a new one. https://www.sqlservercentral.com/Forums/Topic1437922-149-1.aspx

    Thanks in advance for your help.

    if the input dates are always 8 digits long (including leading zero) and on that format then I would use
    SELECT @dt, convert(date, RIGHT(@Dt, 4) + LEFT(@Dt, 4), 112)

  • Since you're using SQL Server 2012, DATEFROMPARTS with substrings to isolate the parts might be the fastest.

    Of course my initial thought is to ask that they send the dates in a different format, but I don't know how open they will be to that.


    The answer is always "No" unless you ask.  If they do say "No", then explain to them how it would help them seem much more intelligent to potential new customers because they'd be using the coveted ISO 8601 format that is recognized world wide. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • If you are using a work table for the import, why not just add a calculated column for the conversion?
    😎


    USE TEEST;
    GO
    SET NOCOUNT ON;

    DECLARE @dt AS VARCHAR(500) = '07232018';

    DECLARE @WORKTABLE TABLE
    (
      STR_DATE VARCHAR(500) NOT NULL
     ,CONV_DATE AS (CONVERT(DATE,CONCAT(RIGHT(STR_DATE,4),LEFT(STR_DATE,4)),112)) PERSISTED
    );

    INSERT INTO @WORKTABLE(STR_DATE) VALUES (@Dt);
    SELECT
      WT.STR_DATE
     ,WT.CONV_DATE
    FROM @WORKTABLE WT;

    Suggest that you get the format changed, which is a much better solution, recommend the ISO YYYYMMDD format.

  • Jeff Moden - Thursday, December 20, 2018 10:36 PM

    Since you're using SQL Server 2012, DATEFROMPARTS with substrings to isolate the parts might be the fastest.

    Spot on Jeff, DATEFROMPARTS is the fastest of the methods I've tested.
    😎

    Simple test harness

    USE TEEST;
    GO
    SET NOCOUNT ON;
    --/*
    DECLARE @SAMPLE_SIZE INT = 1000000;

    IF OBJECT_ID(N'dbo.TBL_NON_ISO_DATE_CONVERTION') IS NOT NULL DROP TABLE dbo.TBL_NON_ISO_DATE_CONVERTION;
    CREATE TABLE dbo.TBL_NON_ISO_DATE_CONVERTION
    (
      TTDC_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_NON_ISO_DATE_CONVERTION_TTDC_ID PRIMARY KEY CLUSTERED
     ,TTDC_CHAR CHAR(8) NOT NULL
    );

    ;WITH T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
    ,DATE_STRINGS AS
    (
      SELECT
       CONVERT(VARCHAR(8),DATEADD(DAY,ABS(CHECKSUM(NEWID())) % 100000,0),112) AS STR_DATE
      FROM  NUMS NM
    )
    INSERT INTO dbo.TBL_NON_ISO_DATE_CONVERTION(TTDC_CHAR)
    SELECT
      CONCAT(RIGHT(DS.STR_DATE,4),LEFT(DS.STR_DATE,4))
    FROM DATE_STRINGS DS
    -- */

    DECLARE @TIMER TABLE (T_TEXT VARCHAR(30) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT(SYSDATETIME()));
    DECLARE @INT_BUCKET  INT  = 0;
    DECLARE @TINYINT_BUCKET TINYINT = 0;
    DECLARE @DATE_BUCKET  DATE  = CONVERT(DATE,GETDATE(),0);
    DECLARE @CHAR_BUCKET  CHAR(8) = '';

    INSERT INTO @TIMER(T_TEXT) VALUES('DRY RUN');
    SELECT
      @CHAR_BUCKET  = TC.TTDC_CHAR
    FROM dbo.TBL_NON_ISO_DATE_CONVERTION TC;
    INSERT INTO @TIMER(T_TEXT) VALUES('DRY RUN');

    INSERT INTO @TIMER(T_TEXT) VALUES('CAST STUFF STUFF');
    SELECT
      @DATE_BUCKET  = CAST(STUFF(STUFF(TC.TTDC_CHAR, 3, 0, '/'), 6, 0, '/') AS DATE)
    FROM dbo.TBL_NON_ISO_DATE_CONVERTION TC;
    INSERT INTO @TIMER(T_TEXT) VALUES('CAST STUFF STUFF');

    INSERT INTO @TIMER(T_TEXT) VALUES('CAST RIGHT LEFT TRIM');
    SELECT
      @DATE_BUCKET  = CAST(RIGHT(RTRIM(TC.TTDC_CHAR), 4) + LEFT(LTRIM(TC.TTDC_CHAR), 4) AS DATE)
    FROM dbo.TBL_NON_ISO_DATE_CONVERTION TC;
    INSERT INTO @TIMER(T_TEXT) VALUES('CAST RIGHT LEFT TRIM');

    INSERT INTO @TIMER(T_TEXT) VALUES('CONCAT RIGHT LEFT');
    SELECT
      @DATE_BUCKET  = CONVERT(DATE,CONCAT(RIGHT(TC.TTDC_CHAR,4),LEFT(TC.TTDC_CHAR,4)),112)
    FROM dbo.TBL_NON_ISO_DATE_CONVERTION TC;
    INSERT INTO @TIMER(T_TEXT) VALUES('CONCAT RIGHT LEFT');

    INSERT INTO @TIMER(T_TEXT) VALUES('DATEFROMPARTS');
    SELECT
      @DATE_BUCKET  = DATEFROMPARTS(
       CONVERT(INT,SUBSTRING(TC.TTDC_CHAR,5,4),0)
       ,CONVERT(INT,SUBSTRING(TC.TTDC_CHAR,1,2),0)
       ,CONVERT(INT,SUBSTRING(TC.TTDC_CHAR,3,2),0)
     )
    FROM dbo.TBL_NON_ISO_DATE_CONVERTION TC;
    INSERT INTO @TIMER(T_TEXT) VALUES('DATEFROMPARTS');

    SELECT
      T.T_TEXT
     ,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION
    FROM @TIMER T
    GROUP BY T.T_TEXT
    ORDER BY DURATION;

    Results

    T_TEXT                DURATION
    --------------------- ----------
    DATEFROMPARTS             873144
    CONCAT RIGHT LEFT        1170835
    CAST RIGHT LEFT TRIM     1204396
    CAST STUFF STUFF         1224418
    DRY RUN                  4139857

  • Thanks Jeff and Eirikur.  Learned something new today, and I made a bad assumption that the substring would be the slowest solution.
    I have sent off an email asking that they change the date formats to the ISO 8601 format YYYYMMDD.  Cross our fingers that they will be able to change it.

    Ugh, I just saw they have at least one 'date' where they have created 2 separate fields, one for the date, one for the time.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • Eirikur Eiriksson - Friday, December 21, 2018 2:29 AM

    Jeff Moden - Thursday, December 20, 2018 10:36 PM

    Since you're using SQL Server 2012, DATEFROMPARTS with substrings to isolate the parts might be the fastest.

    Spot on Jeff, DATEFROMPARTS is the fastest of the methods I've tested.
    😎

    Thanks for the test harness and running the test, Eirikur.  I'm frequently on machines that are 2008 or less.

    Heh.. and from your test, it's not only faster but it's significantly faster.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • timings is going to depend hugely on the spec.
    on my laptop (I7-8550-U 1.8GHz - 16 GB ram) with SSD

    times are as follows
    T_TEXT                  DURATION
    DRY RUN                    73741
    DATEFROMPARTS             243135
    CONCAT RIGHT LEFT         266823
    CAST RIGHT LEFT TRIM      291183
    CAST STUFF STUFF          309771

  • frederico_fonseca - Friday, December 21, 2018 8:25 AM

    timings is going to depend hugely on the spec.
    on my laptop (I7-8550-U 1.8GHz - 16 GB ram) with SSD

    times are as follows
    T_TEXT                  DURATION
    DRY RUN                    73741
    DATEFROMPARTS             243135
    CONCAT RIGHT LEFT         266823
    CAST RIGHT LEFT TRIM      291183
    CAST STUFF STUFF          309771

    Agreed but DATEFROMPARTS still wins even there.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • The first time I ran Eirikur's code the 'Cast Right Left Trim' was the fastest, I didn't get a screen shot of that.
    I've ran it about 6 times now, all others have the Datefromparts as the fastest.  But the order of the others changes.
    Here is my last run:
    T_TEXT         DURATION
    ------------------------------ -----------
    DRY RUN        229015
    DATEFROMPARTS      378032
    CONCAT RIGHT LEFT     457026
    CAST RIGHT LEFT TRIM    482045
    CAST STUFF STUFF     961075

    Here is a screen shot of a prior run:

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • below86 - Thursday, December 20, 2018 2:47 PM

    First I will say I've found a couple different SQL solutions for this, I'm asking if there is a better way or what would be the most efficient way of doing this.  I'm also making an assumption that either of these ways would be better than using substring to parse each piece.
    My head is a little foggy today, could be the holidays coming up or this cold I got from the grand kids.

    The issue is I will be receiving files from an outside source, the file will be a pipe delimited text file.  The dates on this file will be coming in with the format of MMDDYYYY.  This does not convert in SSIS directly to a date, it will fail converting or casting directly as a date.

    Here is the 2 ways I found to convert this data to a date:
    DECLARE @dt AS CHAR(8);

    SET @dt = '07232018'

    --SELECT @dt, CAST(@Dt AS Date)  -- this fails
    SELECT @dt, CAST(STUFF(STUFF(@Dt, 3, 0, '/'), 6, 0, '/') AS DATE)  -- Sean Lange solution from other post
    SELECT @dt, CAST(RIGHT(RTRIM(@Dt), 4) + LEFT(LTRIM(@Dt), 4) AS DATE)  -- stole from another site

    I don't want to do this conversion in the SSIS package, unless someone can prove it would be the best solution.

    I will import this data from the text files into a 'work' table, each column defined as varchar(500), very little chance of failure on load.  Then I was going to use SQL to 'Transform' the data from the 'Work' table before inserting into the final tables.

    Of course my initial thought is to ask that they send the dates in a different format, but I don't know how open they will be to that.  Right now we are talking a few hundred rows per import, I wouldn't expect this to be more than a couple thousand per import, but you never now.  But I could be bringing in 60 to 70 files per import, I'm not sure, but I'm assuming I'll need to convert at least one date field on each.

    I was going to jump on this 5 year old post, but thought I would create a new one. https://www.sqlservercentral.com/Forums/Topic1437922-149-1.aspx

    Thanks in advance for your help.

    I am not sure why you wouldn't do this in SSIS - it can be done using a derived column transformation quite easily and insures that the data is in a standard format that SQL will process as a date automatically.

    In the connection manager for that field - define it as a string with 10 characters (then end result you will want).  Add the derived column transformation and add this date column - the expression will be set to replace the existing column (for example - Replace 'EffectiveDate') - the expression would be: SUBSTRING(EffectiveDate,1,4) + "-" + SUBSTRING(EffectiveDate,5,2) + "-" + SUBSTRING(EffectiveDate,7,2)

    The destination table could then be defined as a date or a string if you are getting bad data.  If the date fields can be NULL then you would need to expand the expression to check for blank and return a NULL or the substrings with a conditional check.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • You're already importing the source text files using SSIS anyhow, so I agree with others that you should do this transformation using a "derived column" or "import column transformation" task. That way it's a simple Extract->Transform->Load.

    There is no need for intermediate staging table step just to work around the client's formatting choices, which would make it Extract->Load->Transform->Extract->Load.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Jeffrey Williams 3188 - Friday, December 21, 2018 10:36 AM

    below86 - Thursday, December 20, 2018 2:47 PM

    First I will say I've found a couple different SQL solutions for this, I'm asking if there is a better way or what would be the most efficient way of doing this.  I'm also making an assumption that either of these ways would be better than using substring to parse each piece.
    My head is a little foggy today, could be the holidays coming up or this cold I got from the grand kids.

    The issue is I will be receiving files from an outside source, the file will be a pipe delimited text file.  The dates on this file will be coming in with the format of MMDDYYYY.  This does not convert in SSIS directly to a date, it will fail converting or casting directly as a date.

    Here is the 2 ways I found to convert this data to a date:
    DECLARE @dt AS CHAR(8);

    SET @dt = '07232018'

    --SELECT @dt, CAST(@Dt AS Date)  -- this fails
    SELECT @dt, CAST(STUFF(STUFF(@Dt, 3, 0, '/'), 6, 0, '/') AS DATE)  -- Sean Lange solution from other post
    SELECT @dt, CAST(RIGHT(RTRIM(@Dt), 4) + LEFT(LTRIM(@Dt), 4) AS DATE)  -- stole from another site

    I don't want to do this conversion in the SSIS package, unless someone can prove it would be the best solution.

    I will import this data from the text files into a 'work' table, each column defined as varchar(500), very little chance of failure on load.  Then I was going to use SQL to 'Transform' the data from the 'Work' table before inserting into the final tables.

    Of course my initial thought is to ask that they send the dates in a different format, but I don't know how open they will be to that.  Right now we are talking a few hundred rows per import, I wouldn't expect this to be more than a couple thousand per import, but you never now.  But I could be bringing in 60 to 70 files per import, I'm not sure, but I'm assuming I'll need to convert at least one date field on each.

    I was going to jump on this 5 year old post, but thought I would create a new one. https://www.sqlservercentral.com/Forums/Topic1437922-149-1.aspx

    Thanks in advance for your help.

    I am not sure why you wouldn't do this in SSIS - it can be done using a derived column transformation quite easily and insures that the data is in a standard format that SQL will process as a date automatically.

    In the connection manager for that field - define it as a string with 10 characters (then end result you will want).  Add the derived column transformation and add this date column - the expression will be set to replace the existing column (for example - Replace 'EffectiveDate') - the expression would be: SUBSTRING(EffectiveDate,1,4) + "-" + SUBSTRING(EffectiveDate,5,2) + "-" + SUBSTRING(EffectiveDate,7,2)

    The destination table could then be defined as a date or a string if you are getting bad data.  If the date fields can be NULL then you would need to expand the expression to check for blank and return a NULL or the substrings with a conditional check.

    Of course, there are those of us that don't use SSIS for any of this. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • This was removed by the editor as SPAM

  • Jeff Moden - Friday, December 21, 2018 2:06 PM

    Of course, there are those of us that don't use SSIS for any of this. 😀

    Sounds like heaven Jeff. 🙂  Got any openings?  Maybe I could just stop by for a day, or two, maybe a week to try and soak up some of that knowledge. 🙂  I don't mind using SSIS for Extract and Load, but using the tools in SSIS to do the Transform just isn't something I want to do.  Now I'll setup an 'Execute SQL task' and put all the SQL in it to do any transformation.
    Many years ago, at my prior job, we used a different ETL tool, we had a lot of Transformation built into the tool.. When we switched to SSIS it was a huge pain to get all of that put into SSIS.  Then several years later we had to upgrade to a newer version of SSIS and not all of the transformation converted over, another huge pain.  I guess that's why I'm hesitant to build any transformation in SSIS.  We are using 2008 SSIS now, there has been a project that's sitting out there for over 2 years for us to move to 2016 SSIS.  Who knows when that will happen and what pains will come with that.

    Thanks to everyone for taking the time to read and answer my question.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply