September 29, 2013 at 6:21 pm
Hi,
I am struggling with changing the date format of dd MMM YYYY in the sample strings below into yyymmdd
Sample Data Input
Column Name:
SourceName
12 Jan 2013 Test.docx
01 February 2001 File.pptx
Meeting 04 Feb 2012.xls
09 September 2011.txt
30 Jan 13.doc
Sample Data Output
Column Name:
SourceName
20130112 Test.docx
20010201 File.pptx
Meeting 20120204.xls
20110911.txt
20130130.doc
September 29, 2013 at 7:10 pm
Post retracted. Hang on for a replacement.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 29, 2013 at 7:16 pm
Ok. This should work.
But someone I'm sure will probably come up with a better way.
WITH SampleData (sourcedata) AS
(
SELECT '12 Jan 2013 Test.docx'
UNION ALL SELECT '01 February 2001 File.pptx'
UNION ALL SELECT 'Meeting 04 Feb 2012.xls'
UNION ALL SELECT '09 September 2011.txt'
UNION ALL SELECT '30 Jan 13.doc'
)
SELECT sourcedata, sourcedata2=STUFF(sourcedata, pos1, (pos2+4)-pos1
,yy + COALESCE(m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12) + dd)
FROM SampleData a
CROSS APPLY
(
SELECT pos1=PATINDEX('%[0-9][0-9]%', sourcedata)
,x=CASE PATINDEX('%[0-9][0-9][0-9][0-9]%', sourcedata) WHEN 0 THEN 2 ELSE 4 END
) b
CROSS APPLY
(
SELECT pos2=CASE x
WHEN 4 THEN PATINDEX('%[0-9][0-9][0-9][0-9]%', sourcedata)
ELSE PATINDEX('%[0-9][0-9]%', RIGHT(sourcedata, LEN(sourcedata)-(pos1+1))) END
) c
CROSS APPLY
(
SELECT dd=SUBSTRING(sourcedata, pos1, 2)
,yy=CASE x WHEN 4 THEN SUBSTRING(sourcedata, pos2, x)
ELSE '20' + SUBSTRING(sourcedata, 2+pos2, x)
END
) d
CROSS APPLY
(
SELECT m1=RIGHT(100+NULLIF(1*SIGN(CHARINDEX('Jan', sourcedata)), 0), 2)
,m2=RIGHT(100+NULLIF(2*SIGN(CHARINDEX('Feb', sourcedata)), 0), 2)
,m3=RIGHT(100+NULLIF(3*SIGN(CHARINDEX('Mar', sourcedata)), 0), 2)
,m4=RIGHT(100+NULLIF(4*SIGN(CHARINDEX('Apr', sourcedata)), 0), 2)
,m5=RIGHT(100+NULLIF(5*SIGN(CHARINDEX('May', sourcedata)), 0), 2)
,m6=RIGHT(100+NULLIF(6*SIGN(CHARINDEX('Jun', sourcedata)), 0), 2)
,m7=RIGHT(100+NULLIF(7*SIGN(CHARINDEX('Jul', sourcedata)), 0), 2)
,m8=RIGHT(100+NULLIF(8*SIGN(CHARINDEX('Aug', sourcedata)), 0), 2)
,m9=RIGHT(100+NULLIF(9*SIGN(CHARINDEX('Sep', sourcedata)), 0), 2)
,m10=RIGHT(100+NULLIF(10*SIGN(CHARINDEX('Oct', sourcedata)), 0), 2)
,m11=RIGHT(100+NULLIF(11*SIGN(CHARINDEX('Nov', sourcedata)), 0), 2)
,m12=RIGHT(100+NULLIF(12*SIGN(CHARINDEX('Dec', sourcedata)), 0), 2)
) e;
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 29, 2013 at 8:14 pm
dwain.c (9/29/2013)
Ok. This should work.But someone I'm sure will probably come up with a better way.
WITH SampleData (sourcedata) AS
(
SELECT '12 Jan 2013 Test.docx'
UNION ALL SELECT '01 February 2001 File.pptx'
UNION ALL SELECT 'Meeting 04 Feb 2012.xls'
UNION ALL SELECT '09 September 2011.txt'
UNION ALL SELECT '30 Jan 13.doc'
)
SELECT sourcedata, sourcedata2=STUFF(sourcedata, pos1, (pos2+4)-pos1
,yy + COALESCE(m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12) + dd)
FROM SampleData a
CROSS APPLY
(
SELECT pos1=PATINDEX('%[0-9][0-9]%', sourcedata)
,x=CASE PATINDEX('%[0-9][0-9][0-9][0-9]%', sourcedata) WHEN 0 THEN 2 ELSE 4 END
) b
CROSS APPLY
(
SELECT pos2=CASE x
WHEN 4 THEN PATINDEX('%[0-9][0-9][0-9][0-9]%', sourcedata)
ELSE PATINDEX('%[0-9][0-9]%', RIGHT(sourcedata, LEN(sourcedata)-(pos1+1))) END
) c
CROSS APPLY
(
SELECT dd=SUBSTRING(sourcedata, pos1, 2)
,yy=CASE x WHEN 4 THEN SUBSTRING(sourcedata, pos2, x)
ELSE '20' + SUBSTRING(sourcedata, 2+pos2, x)
END
) d
CROSS APPLY
(
SELECT m1=RIGHT(100+NULLIF(1*SIGN(CHARINDEX('Jan', sourcedata)), 0), 2)
,m2=RIGHT(100+NULLIF(2*SIGN(CHARINDEX('Feb', sourcedata)), 0), 2)
,m3=RIGHT(100+NULLIF(3*SIGN(CHARINDEX('Mar', sourcedata)), 0), 2)
,m4=RIGHT(100+NULLIF(4*SIGN(CHARINDEX('Apr', sourcedata)), 0), 2)
,m5=RIGHT(100+NULLIF(5*SIGN(CHARINDEX('May', sourcedata)), 0), 2)
,m6=RIGHT(100+NULLIF(6*SIGN(CHARINDEX('Jun', sourcedata)), 0), 2)
,m7=RIGHT(100+NULLIF(7*SIGN(CHARINDEX('Jul', sourcedata)), 0), 2)
,m8=RIGHT(100+NULLIF(8*SIGN(CHARINDEX('Aug', sourcedata)), 0), 2)
,m9=RIGHT(100+NULLIF(9*SIGN(CHARINDEX('Sep', sourcedata)), 0), 2)
,m10=RIGHT(100+NULLIF(10*SIGN(CHARINDEX('Oct', sourcedata)), 0), 2)
,m11=RIGHT(100+NULLIF(11*SIGN(CHARINDEX('Nov', sourcedata)), 0), 2)
,m12=RIGHT(100+NULLIF(12*SIGN(CHARINDEX('Dec', sourcedata)), 0), 2)
) e;
It would appear that you have a problem with the MM part of that forumula...
sourcedata sourcedata2
-------------------------- --------------------
12 Jan 2013 Test.docx 20130412 Test.docx
01 February 2001 File.pptx 20010401 File.pptx
Meeting 04 Feb 2012.xls Meeting 20121204.xls
09 September 2011.txt 20110409.txt
30 Jan 13.doc 20130430.doc
The following will work but only with the understanding that no other numeric digits can exist in the string as the current test data would have it.
WITH SampleData (sourcedata) AS
(
SELECT '12 Jan 2013 Test.docx'
UNION ALL SELECT '01 February 2001 File.pptx'
UNION ALL SELECT 'Meeting 04 Feb 2012.xls'
UNION ALL SELECT '09 September 2011.txt'
UNION ALL SELECT '30 Jan 13.doc'
),
cteEnumerate AS
(
SELECT RowNum = ROW_NUMBER()OVER(ORDER BY (SELECT 1)), SourceData
FROM SampleData
),
ctePosition AS
(
SELECT RowNum,SourceData,LeftPos = MIN(t.N), DateLength = MAX(t.N)-MIN(t.N)+2
FROM dbo.Tally t
CROSS JOIN cteEnumerate
WHERE t.N <= LEN(SourceData)
AND SUBSTRING(SourceData,t.N,3) LIKE '[0-9][0-9][^0-9]'
GROUP BY RowNum,SourceData
)
SELECT SourceData,
NewValue = STUFF(SourceData,LeftPos,DateLength,CONVERT(CHAR(8),CAST(SUBSTRING(SourceData,LeftPos,DateLength)AS DATETIME),112))
FROM ctePosition
;
Results:
SourceData NewValue
-------------------------- --------------------
12 Jan 2013 Test.docx 20130112 Test.docx
01 February 2001 File.pptx 20010201 File.pptx
Meeting 04 Feb 2012.xls Meeting 20120204.xls
09 September 2011.txt 20110909.txt
30 Jan 13.doc 20130130.doc
Learn about what a Tally Table is and how it replaces certain loops here...
http://www.sqlservercentral.com/articles/T-SQL/62867/
--Jeff Moden
Change is inevitable... Change for the better is not.
September 29, 2013 at 8:28 pm
Running the code you quoted Jeff, I get these results.
sourcedata sourcedata2
12 Jan 2013 Test.docx 20130112 Test.docx
01 February 2001 File.pptx 20010201 File.pptx
Meeting 04 Feb 2012.xlsMeeting 20120204.xls
09 September 2011.txt 20110909.txt
30 Jan 13.doc 20130130.doc
There were some issues in my earlier attempt. Perhaps you were testing one of those.
Edit: Noting that I'm not particularly proud of my brute force attempt. Certainly could be improved.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 29, 2013 at 8:41 pm
dwain.c (9/29/2013)
Running the code you quoted Jeff, I get these results.
sourcedata sourcedata2
12 Jan 2013 Test.docx 20130112 Test.docx
01 February 2001 File.pptx 20010201 File.pptx
Meeting 04 Feb 2012.xlsMeeting 20120204.xls
09 September 2011.txt 20110909.txt
30 Jan 13.doc 20130130.doc
There were some issues in my earlier attempt. Perhaps you were testing one of those.
Edit: Noting that I'm not particularly proud of my brute force attempt. Certainly could be improved.
Agreed. I just reran that code and it came up with the correct answer. We must have been working simultaneously... the last thing I did was refresh the window and quote your post. As you said, I must have copied the code from the earlier post.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 29, 2013 at 9:12 pm
The following will work but only with the understanding that no other numeric digits can exist in the string as the current test data would have it.
Awesome, thanks for the in-depth assistance, works well. Though what will happen if there are other numeric values in the string such as the below?
Sorry, I should have used more meaningful example test data.
3.0 Meeting Minutes 16 January 2013.pdf
0.0 ICT Meeting (12 Jan 2013).doc
ICTAgenda 15January 2013 V1.pptx
ICTAgenda 15 January 2013 V2.pptx
September 29, 2013 at 9:19 pm
dc.pc100 (9/29/2013)
The following will work but only with the understanding that no other numeric digits can exist in the string as the current test data would have it.
Awesome, thanks for the in-depth assistance, works well. Though what will happen if there are other numeric values in the string such as the below?
3.0 Meeting Minutes 16 January 2013.pdf
0.0 ICT Meeting (12 Jan 2013).doc
Mine can actually handle that nuance. The difficulty will arise if you include something else that has two consecutive digits followed by a non-digit. Here's the output from my run...
SourceData NewValue
--------------------------------------- --------------------------------
12 Jan 2013 Test.docx 20130112 Test.docx
01 February 2001 File.pptx 20010201 File.pptx
Meeting 04 Feb 2012.xls Meeting 20120204.xls
09 September 2011.txt 20110909.txt
30 Jan 13.doc 20130130.doc
3.0 Meeting Minutes 16 January 2013.pdf 3.0 Meeting Minutes 20130116.pdf
0.0 ICT Meeting (12 Jan 2013).doc 0.0 ICT Meeting (20130112).doc
Dwain's code will handle the extra data you added, as well.
Before you can ask it, mine will also handle dates that are in the form of MM/DD/YYYY.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 29, 2013 at 9:44 pm
Awesome guys thank you so much. Beats manually going through 200,000 rows in multiple excel files to change the date 🙂
September 29, 2013 at 9:53 pm
dc.pc100 (9/29/2013)
Awesome guys thank you so much. Beats manually going through 200,000 rows in multiple excel files to change the date 🙂
Like everrything else, "It Depends". If I was allowed to drink beer while working the Excel spreadsheet, I'd being inclined to prolong such agony. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
September 29, 2013 at 11:17 pm
I find it interesting because I'm into self-flagellation and because it seems that all of the date formats you've included in your filenames CAST cleanly to a DATE data type, which can then be formatted to be stuffed back into your filename.
Here's an example using a pattern-based string splitter.
WITH SampleData (sourcedata) AS
(
SELECT '12 Jan 2013 Test.docx'
UNION ALL SELECT '01 February 2001 File.pptx'
UNION ALL SELECT 'Meeting 04 Feb 2012.xls'
UNION ALL SELECT '09 September 2011.txt'
UNION ALL SELECT '30 Jan 13.doc'
UNION ALL SELECT '3.0 Meeting Minutes 16 January 2013.pdf'
UNION ALL SELECT '0.0 ICT Meeting (12 Jan 2013).doc'
UNION ALL SELECT 'ICTAgenda 15January 2013 V1.pptx'
UNION ALL SELECT 'ICTAgenda 15 January 2013 V2.pptx'
),
SplitFNs AS
(
SELECT sourcedata, b.*
,rn=CASE WHEN LEFT(LTRIM(Item), 3) IN
('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')
THEN ItemNumber END
FROM SampleData a
CROSS APPLY dbo.PatternSplitCM(sourcedata, '[0-9]') b
),
DateRecs AS
(
SELECT a.sourcedata, b.ItemNumber, b.Item
FROM SplitFNs a
JOIN SplitFNs b
ON b.ItemNumber BETWEEN a.rn - 1 AND a.rn + 1 AND a.sourcedata = b.sourcedata
)
SELECT sourcedata
,sourcedata2=STUFF(sourcedata, CHARINDEX(FileDate, sourcedata), LEN(FileDate),CAST(FileDate AS DATE))
FROM SampleData a
CROSS APPLY
(
SELECT Item + ''
FROM DateRecs b
WHERE a.sourcedata = b.sourcedata
ORDER BY ItemNumber
FOR XML PATH('')
) b(FileDate);
PatternSplitCM can be found in my 4th signature link.
Note that I'm not necessarily recommending this approach. I just thought that odd ability to CAST to DATE might be easily exploitable, but alas I didn't find it so easy.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 29, 2013 at 11:59 pm
Delete
September 30, 2013 at 12:06 am
How or where would i add the below
Where [SourceData] like '%[0-9]%' + 'Jan' + '[0-9]%'
as lots of data gets messed up with dwains code, and want to limit it to date specific strings only
e.g
Destination File Name....sourcedata2
20130616.xlsx..............0616.xlsx
201307.xlsx.................07.xlsx
HSE.201306.xlsx...........HSE.06.xlsx
HSE.201307.xlsx...........HSE.07.xlsx
2013 ICT Calendar.xls....ICT Calendar.xls
Inv.10017355.txt..........Optus.Inv.7355.txt
September 30, 2013 at 12:08 am
dc.pc100 (9/30/2013)
How or where would i add the belowWhere [SourceData] like '%[0-9]%' + 'Jan' + '[0-9]%'
as lots of data gets messed up with dwains code
e.g
Destination File Name....sourcedata2
20130616.xlsx..............0616.xlsx
201307.xlsx.................07.xlsx
HSE.201306.xlsx...........HSE.06.xlsx
HSE.201307.xlsx...........HSE.07.xlsx
2013 ICT Calendar.xls....ICT Calendar.xls
Inv.10017355.txt..........Optus.Inv.7355.txt
Yup mine won't handle those.
Have you tried Jeff's code with this case?
I think using LIKE like you are trying to do is going to be a pain because you'd need 12 of them OR'd together.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 30, 2013 at 12:11 am
I havent had much experience with Tally tables so do no know how to utilize them.
The "OR" x12 in the where clause it fine for now. Ill probs just create another table and query the words(month names) in there
Not best practice but yea
Just having troubles inserting the where clause, as it doesn't like it anywhere near there Cross Apply
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply