December 20, 2018 at 2:47 pm
First I will say I've found a couple different SQL solutions for this, I'm asking if there is a better way or what would be the most efficient way of doing this.  I'm also making an assumption that either of these ways would be better than using substring to parse each piece.
My head is a little foggy today, could be the holidays coming up or this cold I got from the grand kids.
The issue is I will be receiving files from an outside source, the file will be a pipe delimited text file. The dates on this file will be coming in with the format of MMDDYYYY. This does not convert in SSIS directly to a date, it will fail converting or casting directly as a date.
Here is the 2 ways I found to convert this data to a date:DECLARE @Dt AS CHAR(8);
SET @Dt = '07232018'
--SELECT @Dt, CAST(@Dt AS Date)  -- this fails
SELECT @Dt, CAST(STUFF(STUFF(@Dt, 3, 0, '/'), 6, 0, '/') AS DATE)  -- Sean Lange solution from other post
SELECT @Dt, CAST(RIGHT(RTRIM(@Dt), 4) + LEFT(LTRIM(@Dt), 4) AS DATE)  -- stole from another site
I don't want to do this conversion in the SSIS package, unless someone can prove it would be the best solution.
I will import this data from the text files into a 'work' table, each column defined as varchar(500), very little chance of failure on load. Then I was going to use SQL to 'Transform' the data from the 'Work' table before inserting into the final tables.
Of course my initial thought is to ask that they send the dates in a different format, but I don't know how open they will be to that. Right now we are talking a few hundred rows per import, I wouldn't expect this to be more than a couple thousand per import, but you never now. But I could be bringing in 60 to 70 files per import, I'm not sure, but I'm assuming I'll need to convert at least one date field on each.
I was going to jump on this 5 year old post, but thought I would create a new one. https://www.sqlservercentral.com/Forums/Topic1437922-149-1.aspx
Thanks in advance for your help.
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
December 20, 2018 at 4:34 pm
below86 - Thursday, December 20, 2018 2:47 PMFirst I will say I've found a couple different SQL solutions for this, I'm asking if there is a better way or what would be the most efficient way of doing this. I'm also making an assumption that either of these ways would be better than using substring to parse each piece.
My head is a little foggy today, could be the holidays coming up or this cold I got from the grand kids.The issue is I will be receiving files from an outside source, the file will be a pipe delimited text file. The dates on this file will be coming in with the format of MMDDYYYY. This does not convert in SSIS directly to a date, it will fail converting or casting directly as a date.
Here is the 2 ways I found to convert this data to a date:
DECLARE @Dt AS CHAR(8);SET @Dt = '07232018'
--SELECT @Dt, CAST(@Dt AS Date) -- this fails
SELECT @Dt, CAST(STUFF(STUFF(@Dt, 3, 0, '/'), 6, 0, '/') AS DATE) -- Sean Lange solution from other post
SELECT @Dt, CAST(RIGHT(RTRIM(@Dt), 4) + LEFT(LTRIM(@Dt), 4) AS DATE) -- stole from another siteI don't want to do this conversion in the SSIS package, unless someone can prove it would be the best solution.
I will import this data from the text files into a 'work' table, each column defined as varchar(500), very little chance of failure on load. Then I was going to use SQL to 'Transform' the data from the 'Work' table before inserting into the final tables.
Of course my initial thought is to ask that they send the dates in a different format, but I don't know how open they will be to that. Right now we are talking a few hundred rows per import, I wouldn't expect this to be more than a couple thousand per import, but you never now. But I could be bringing in 60 to 70 files per import, I'm not sure, but I'm assuming I'll need to convert at least one date field on each.
I was going to jump on this 5 year old post, but thought I would create a new one. https://www.sqlservercentral.com/Forums/Topic1437922-149-1.aspx
Thanks in advance for your help.
if the input dates are always 8 digits long (including leading zero) and on that format then I would use
SELECT  @Dt, convert(date, RIGHT(@Dt, 4) + LEFT(@Dt, 4), 112)
December 20, 2018 at 10:36 pm
Since you're using SQL Server 2012, DATEFROMPARTS with substrings to isolate the parts might be the fastest.
Of course my initial thought is to ask that they send the dates in a different format, but I don't know how open they will be to that.
The answer is always "No" unless you ask.  If they do say "No", then explain to them how it would help them seem much more intelligent to potential new customers because they'd be using the coveted ISO 8601 format that is recognized world wide. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
December 20, 2018 at 11:22 pm
If you are using a work table for the import, why not just add a calculated column for the conversion?
😎
USE TEEST;
GO
SET NOCOUNT ON;
DECLARE @Dt AS VARCHAR(500) = '07232018';
DECLARE @WORKTABLE TABLE
(
  STR_DATE VARCHAR(500) NOT NULL
 ,CONV_DATE AS (CONVERT(DATE,CONCAT(RIGHT(STR_DATE,4),LEFT(STR_DATE,4)),112)) PERSISTED
);
INSERT INTO @WORKTABLE(STR_DATE) VALUES (@Dt);
SELECT
  WT.STR_DATE
 ,WT.CONV_DATE
FROM @WORKTABLE WT;
Suggest that you get the format changed, which is a much better solution, recommend the ISO YYYYMMDD format.
December 21, 2018 at 2:29 am
Spot on Jeff, DATEFROMPARTS is the fastest of the methods I've tested.
😎
Simple test harness
USE TEEST;
GO
SET NOCOUNT ON;
 --/*
DECLARE @SAMPLE_SIZE INT = 1000000;
IF OBJECT_ID(N'dbo.TBL_NON_ISO_DATE_CONVERTION') IS NOT NULL DROP TABLE dbo.TBL_NON_ISO_DATE_CONVERTION;
CREATE TABLE dbo.TBL_NON_ISO_DATE_CONVERTION
(
  TTDC_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_NON_ISO_DATE_CONVERTION_TTDC_ID PRIMARY KEY CLUSTERED
 ,TTDC_CHAR CHAR(8) NOT NULL
);
;WITH T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N))
, NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
,DATE_STRINGS AS 
(
  SELECT 
   CONVERT(VARCHAR(8),DATEADD(DAY,ABS(CHECKSUM(NEWID())) % 100000,0),112) AS STR_DATE
  FROM  NUMS NM
)
 INSERT INTO dbo.TBL_NON_ISO_DATE_CONVERTION(TTDC_CHAR)
 SELECT
  CONCAT(RIGHT(DS.STR_DATE,4),LEFT(DS.STR_DATE,4))
 FROM DATE_STRINGS DS
-- */
DECLARE @TIMER TABLE (T_TEXT VARCHAR(30) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT(SYSDATETIME()));
DECLARE @INT_BUCKET  INT  = 0;
DECLARE @TINYINT_BUCKET TINYINT = 0;
DECLARE @DATE_BUCKET  DATE  = CONVERT(DATE,GETDATE(),0);
DECLARE @CHAR_BUCKET  CHAR(8) = '';
INSERT INTO @TIMER(T_TEXT) VALUES('DRY RUN');
SELECT
  @CHAR_BUCKET  = TC.TTDC_CHAR
FROM dbo.TBL_NON_ISO_DATE_CONVERTION TC;
INSERT INTO @TIMER(T_TEXT) VALUES('DRY RUN');
INSERT INTO @TIMER(T_TEXT) VALUES('CAST STUFF STUFF');
SELECT
  @DATE_BUCKET  = CAST(STUFF(STUFF(TC.TTDC_CHAR, 3, 0, '/'), 6, 0, '/') AS DATE) 
FROM dbo.TBL_NON_ISO_DATE_CONVERTION TC;
INSERT INTO @TIMER(T_TEXT) VALUES('CAST STUFF STUFF');
INSERT INTO @TIMER(T_TEXT) VALUES('CAST RIGHT LEFT TRIM');
SELECT
  @DATE_BUCKET  = CAST(RIGHT(RTRIM(TC.TTDC_CHAR), 4) + LEFT(LTRIM(TC.TTDC_CHAR), 4) AS DATE) 
FROM dbo.TBL_NON_ISO_DATE_CONVERTION TC;
INSERT INTO @TIMER(T_TEXT) VALUES('CAST RIGHT LEFT TRIM');
INSERT INTO @TIMER(T_TEXT) VALUES('CONCAT RIGHT LEFT');
SELECT
  @DATE_BUCKET  = CONVERT(DATE,CONCAT(RIGHT(TC.TTDC_CHAR,4),LEFT(TC.TTDC_CHAR,4)),112)
FROM dbo.TBL_NON_ISO_DATE_CONVERTION TC;
INSERT INTO @TIMER(T_TEXT) VALUES('CONCAT RIGHT LEFT');
INSERT INTO @TIMER(T_TEXT) VALUES('DATEFROMPARTS');
SELECT
  @DATE_BUCKET  = DATEFROMPARTS(
   CONVERT(INT,SUBSTRING(TC.TTDC_CHAR,5,4),0)
   ,CONVERT(INT,SUBSTRING(TC.TTDC_CHAR,1,2),0)
   ,CONVERT(INT,SUBSTRING(TC.TTDC_CHAR,3,2),0)
 )
FROM dbo.TBL_NON_ISO_DATE_CONVERTION TC;
INSERT INTO @TIMER(T_TEXT) VALUES('DATEFROMPARTS');
SELECT
  T.T_TEXT
 ,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION
FROM @TIMER T
GROUP BY T.T_TEXT
ORDER BY DURATION;
Results
T_TEXT                DURATION
--------------------- ----------
DATEFROMPARTS             873144
CONCAT RIGHT LEFT        1170835
CAST RIGHT LEFT TRIM     1204396
CAST STUFF STUFF         1224418
DRY RUN                  4139857
December 21, 2018 at 7:30 am
Thanks Jeff and Eirikur.  Learned something new today, and I made a bad assumption that the substring would be the slowest solution.
I have sent off an email asking that they change the date formats to the ISO 8601 format YYYYMMDD.  Cross our fingers that they will be able to change it.
Ugh, I just saw they have at least one 'date' where they have created 2 separate fields, one for the date, one for the time.
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
December 21, 2018 at 7:53 am
Jeff Moden - Thursday, December 20, 2018 10:36 PMSince you're using SQL Server 2012, DATEFROMPARTS with substrings to isolate the parts might be the fastest.
Spot on Jeff, DATEFROMPARTS is the fastest of the methods I've tested.
😎
Thanks for the test harness and running the test, Eirikur. I'm frequently on machines that are 2008 or less.
Heh.. and from your test, it's not only faster but it's significantly faster.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 21, 2018 at 8:25 am
timings is going to depend hugely on the spec.
on my laptop (I7-8550-U 1.8GHz - 16 GB ram) with SSD
times are as followsT_TEXT                  DURATION
DRY RUN                    73741
DATEFROMPARTS             243135
CONCAT RIGHT LEFT         266823
CAST RIGHT LEFT TRIM      291183
CAST STUFF STUFF          309771
December 21, 2018 at 8:43 am
frederico_fonseca - Friday, December 21, 2018 8:25 AMtimings is going to depend hugely on the spec.
on my laptop (I7-8550-U 1.8GHz - 16 GB ram) with SSDtimes are as follows
T_TEXT DURATION
DRY RUN 73741
DATEFROMPARTS 243135
CONCAT RIGHT LEFT 266823
CAST RIGHT LEFT TRIM 291183
CAST STUFF STUFF 309771
Agreed but DATEFROMPARTS still wins even there.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 21, 2018 at 9:22 am
The first time I ran Eirikur's code the 'Cast Right Left Trim' was the fastest, I didn't get a screen shot of that.
I've ran it about 6 times now, all others have the Datefromparts as the fastest.  But the order of the others changes.
Here is my last run:T_TEXT         DURATION
------------------------------ -----------
DRY RUN        229015
DATEFROMPARTS      378032
CONCAT RIGHT LEFT     457026
CAST RIGHT LEFT TRIM    482045
CAST STUFF STUFF     961075
Here is a screen shot of a prior run:
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
December 21, 2018 at 10:36 am
below86 - Thursday, December 20, 2018 2:47 PMFirst I will say I've found a couple different SQL solutions for this, I'm asking if there is a better way or what would be the most efficient way of doing this. I'm also making an assumption that either of these ways would be better than using substring to parse each piece.
My head is a little foggy today, could be the holidays coming up or this cold I got from the grand kids.The issue is I will be receiving files from an outside source, the file will be a pipe delimited text file. The dates on this file will be coming in with the format of MMDDYYYY. This does not convert in SSIS directly to a date, it will fail converting or casting directly as a date.
Here is the 2 ways I found to convert this data to a date:
DECLARE @Dt AS CHAR(8);SET @Dt = '07232018'
--SELECT @Dt, CAST(@Dt AS Date) -- this fails
SELECT @Dt, CAST(STUFF(STUFF(@Dt, 3, 0, '/'), 6, 0, '/') AS DATE) -- Sean Lange solution from other post
SELECT @Dt, CAST(RIGHT(RTRIM(@Dt), 4) + LEFT(LTRIM(@Dt), 4) AS DATE) -- stole from another siteI don't want to do this conversion in the SSIS package, unless someone can prove it would be the best solution.
I will import this data from the text files into a 'work' table, each column defined as varchar(500), very little chance of failure on load. Then I was going to use SQL to 'Transform' the data from the 'Work' table before inserting into the final tables.
Of course my initial thought is to ask that they send the dates in a different format, but I don't know how open they will be to that. Right now we are talking a few hundred rows per import, I wouldn't expect this to be more than a couple thousand per import, but you never now. But I could be bringing in 60 to 70 files per import, I'm not sure, but I'm assuming I'll need to convert at least one date field on each.
I was going to jump on this 5 year old post, but thought I would create a new one. https://www.sqlservercentral.com/Forums/Topic1437922-149-1.aspx
Thanks in advance for your help.
I am not sure why you wouldn't do this in SSIS - it can be done using a derived column transformation quite easily and insures that the data is in a standard format that SQL will process as a date automatically.
In the connection manager for that field - define it as a string with 10 characters (then end result you will want). Add the derived column transformation and add this date column - the expression will be set to replace the existing column (for example - Replace 'EffectiveDate') - the expression would be: SUBSTRING(EffectiveDate,1,4) + "-" + SUBSTRING(EffectiveDate,5,2) + "-" + SUBSTRING(EffectiveDate,7,2)
The destination table could then be defined as a date or a string if you are getting bad data. If the date fields can be NULL then you would need to expand the expression to check for blank and return a NULL or the substrings with a conditional check.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
December 21, 2018 at 12:44 pm
You're already importing the source text files using SSIS anyhow, so I agree with others that you should do this transformation using a "derived column" or "import column transformation" task. That way it's a simple Extract->Transform->Load.
There is no need for intermediate staging table step just to work around the client's formatting choices, which would make it Extract->Load->Transform->Extract->Load.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
December 21, 2018 at 2:06 pm
Jeffrey Williams 3188 - Friday, December 21, 2018 10:36 AMbelow86 - Thursday, December 20, 2018 2:47 PMFirst I will say I've found a couple different SQL solutions for this, I'm asking if there is a better way or what would be the most efficient way of doing this. I'm also making an assumption that either of these ways would be better than using substring to parse each piece.
My head is a little foggy today, could be the holidays coming up or this cold I got from the grand kids.The issue is I will be receiving files from an outside source, the file will be a pipe delimited text file. The dates on this file will be coming in with the format of MMDDYYYY. This does not convert in SSIS directly to a date, it will fail converting or casting directly as a date.
Here is the 2 ways I found to convert this data to a date:
DECLARE @Dt AS CHAR(8);SET @Dt = '07232018'
--SELECT @Dt, CAST(@Dt AS Date) -- this fails
SELECT @Dt, CAST(STUFF(STUFF(@Dt, 3, 0, '/'), 6, 0, '/') AS DATE) -- Sean Lange solution from other post
SELECT @Dt, CAST(RIGHT(RTRIM(@Dt), 4) + LEFT(LTRIM(@Dt), 4) AS DATE) -- stole from another siteI don't want to do this conversion in the SSIS package, unless someone can prove it would be the best solution.
I will import this data from the text files into a 'work' table, each column defined as varchar(500), very little chance of failure on load. Then I was going to use SQL to 'Transform' the data from the 'Work' table before inserting into the final tables.
Of course my initial thought is to ask that they send the dates in a different format, but I don't know how open they will be to that. Right now we are talking a few hundred rows per import, I wouldn't expect this to be more than a couple thousand per import, but you never now. But I could be bringing in 60 to 70 files per import, I'm not sure, but I'm assuming I'll need to convert at least one date field on each.
I was going to jump on this 5 year old post, but thought I would create a new one. https://www.sqlservercentral.com/Forums/Topic1437922-149-1.aspx
Thanks in advance for your help.
I am not sure why you wouldn't do this in SSIS - it can be done using a derived column transformation quite easily and insures that the data is in a standard format that SQL will process as a date automatically.
In the connection manager for that field - define it as a string with 10 characters (then end result you will want). Add the derived column transformation and add this date column - the expression will be set to replace the existing column (for example - Replace 'EffectiveDate') - the expression would be: SUBSTRING(EffectiveDate,1,4) + "-" + SUBSTRING(EffectiveDate,5,2) + "-" + SUBSTRING(EffectiveDate,7,2)
The destination table could then be defined as a date or a string if you are getting bad data. If the date fields can be NULL then you would need to expand the expression to check for blank and return a NULL or the substrings with a conditional check.
Of course, there are those of us that don't use SSIS for any of this. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2018 at 8:28 am
Jeff Moden - Friday, December 21, 2018 2:06 PMOf course, there are those of us that don't use SSIS for any of this. 😀
Sounds like heaven Jeff. 🙂  Got any openings?  Maybe I could just stop by for a day, or two, maybe a week to try and soak up some of that knowledge. 🙂  I don't mind using SSIS for Extract and Load, but using the tools in SSIS to do the Transform just isn't something I want to do.  Now I'll setup an 'Execute SQL task' and put all the SQL in it to do any transformation.
Many years ago, at my prior job, we used a different ETL tool, we had a lot of Transformation built into the tool.. When we switched to SSIS it was a huge pain to get all of that put into SSIS.  Then several years later we had to upgrade to a newer version of SSIS and not all of the transformation converted over, another huge pain.  I guess that's why I'm hesitant to build any transformation in SSIS.  We are using 2008 SSIS now, there has been a project that's sitting out there for over 2 years for us to move to 2016 SSIS.  Who knows when that will happen and what pains will come with that.
Thanks to everyone for taking the time to read and answer my question.
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
December 27, 2018 at 8:45 am
Performing transformation in the SELECT is OK, but staging the data to a table and then performing UPDATE(s) is a waste of time and disk space.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
Viewing 15 posts - 1 through 15 (of 25 total)
You must be logged in to reply to this topic. Login to reply