SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Parse Info Out of FileName


Parse Info Out of FileName

Author
Message
monkeypoi
monkeypoi
Forum Newbie
Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)

Group: General Forum Members
Points: 2 Visits: 18
I am new to SSIS and have a package I have been working on. My issue is I don't know how to parse out a portion of the incoming file name so I can use that to help name the final flat file. I have files coming in as both csv and txt files that prevents me from just looking for the file extension. The structure of the incoming file name is C:/Location/Folder/Subfolder/filename20130101.(csv) or (txt). What I need is to get the numbers on the end of the file, store them than do a manipulation to them so I can add the previous day's date in the same format to my final flat file. I also have to include in the logic for end of month, year, etc.
pietlinden
pietlinden
SSChampion
SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)

Group: General Forum Members
Points: 13872 Visits: 14119
I'll give you that I'm not an SSIS guy, but if the format is always the same, you should be able to create a variable for the filename, then strip off the extension (last 4 characters?), and then if the date is always 8 characters, you should be able to use SUBSTRING() to split the pieces out, convert the strings to numbers, and then that whole thing to a real date that you can do math on.
monkeypoi
monkeypoi
Forum Newbie
Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)

Group: General Forum Members
Points: 2 Visits: 18
Unfortunately the files that come in aren't spaced the exact the same. One group of files comes in a csv and are named/spaced one way, however the other file comes in as a txt and is spaced differently. I already tried that route. What I need is to verify that the files have the same date on them then use that date to derive the previous date to name my final file with the previous day's date.
reitzen
reitzen
Forum Newbie
Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)

Group: General Forum Members
Points: 1 Visits: 14
ltrim(rtrim(SUBSTRING([filename], charindex('Invoice', [filename])+8, LEN([filename]) - (CHARINDEX('Invoice', [filename])+11))))

This snippet pulls out the invoice number from a filename. The filename is in a consistent format "MyCustomerName_Invoice_12345678.pdf". The snippet above uses the word "Invoice" as the anchor to get the invoice number and it doesn't care what the file extension is. You could use a CASE statement to test for a TXT or CSV extension. You could also use CASE to test for any consistent part of the filename. This is a T-SQL snippet so I don't know if this going to help you out of the box, but for what it's worth...




SQL & Cold Fusion
EricEyster
EricEyster
Right there with Babe
Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)

Group: General Forum Members
Points: 734 Visits: 520
At some point, this type of logic becomes easier to write and maintain as a script task embedded in a package.
Ed Wagner
Ed Wagner
SSC-Forever
SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)

Group: General Forum Members
Points: 47941 Visits: 10844
EricEyster (2/24/2014)
At some point, this type of logic becomes easier to write and maintain as a script task embedded in a package.

Agreed. I'm not at all an SSIS guy beyond spelling it, but if you're going to do this in T-SQL, an alternate approach would be to stop thinking of it in terms of parsing the string by character and start thinking of splitting it out by delimiter. If you're not familiar with Jeff Moden's DelimitedSplit8K function, the article is at http://www.sqlservercentral.com/articles/Tally+Table/72993/. It'll change the way you look at data and think about performance.

with cte(filename) as (
select 'MyCustomerName_Invoice_12345678.pdf' union all
select 'YourCustomerName_Invoice_23456.pdf' union all
select 'SomeLongerCustomerName_Invoice_123.pdf' union all
select 'ShortName_Invoice_7891234567890.pdf')
select filename, s.item
from cte
cross apply DelimitedSplit8K(cte.filename, '_') s
where s.ItemNumber = 3;




Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions
dwain.c
dwain.c
SSCoach
SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)

Group: General Forum Members
Points: 17551 Visits: 6431
I'm going to go out on a limb here since I am not an "SSIS guy" either and say that if you can use DelimitedSplit8K you can also use PatternSplitCM (see the 4th article in my signature links). In that article it gives a number of examples for parsing a file name, some of which may be applicable to your case.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (212K reputation)SSC Guru (212K reputation)SSC Guru (212K reputation)SSC Guru (212K reputation)SSC Guru (212K reputation)SSC Guru (212K reputation)SSC Guru (212K reputation)SSC Guru (212K reputation)

Group: General Forum Members
Points: 212527 Visits: 41977
This should handle pretty much anything you can throw at it.

-- DROP TABLE #FileName
--===== Create a test table with some normal and "odd" file/path names.
SELECT FileName = 'C:/Location/Folder/Subfolder/filename20130101.xxx'
INTO #FileName UNION ALL
SELECT FileName = 'C:/Location/Folder/Subfolder/filename20130102' UNION ALL
SELECT FileName = 'filename20130103' UNION ALL
SELECT FileName = 'filename20130104.xxx' UNION ALL
SELECT FileName = '20130105.xxx' UNION ALL
SELECT FileName = '20130106' UNION ALL
SELECT FileName = '20130107def' UNION ALL
SELECT FileName = 'abc20130108def' UNION ALL
SELECT FileName = 'abc20130109def.xxx' UNION ALL
SELECT FileName = '123abc20130110def.xxx' UNION ALL
SELECT FileName = 'abcdef.xxx' --will not show up because has no numbers
;
--===== Get just the last set of numbers.
SELECT fn.FileName, DateOnly = LEFT(FromTheNumbers,ISNULL(NULLIF(PATINDEX('%[0-9][^0-9]%',FromTheNumbers),0),8000))
FROM #FileName fn
CROSS APPLY (SELECT RIGHT(FileName,ISNULL(NULLIF(PATINDEX('%[0-9][^0-9]%',REVERSE(FileName)),0),8000))) ca (FromTheNumbers)
WHERE fn.FileName LIKE '%[0-9]%'
;




Results:


FileName DateOnly
------------------------------------------------- --------
C:/Location/Folder/Subfolder/filename20130101.xxx 20130101
C:/Location/Folder/Subfolder/filename20130102 20130102
filename20130103 20130103
filename20130104.xxx 20130104
20130105.xxx 20130105
20130106 20130106
20130107def 20130107
abc20130108def 20130108
abc20130109def.xxx 20130109
123abc20130109def.xxx 20130110
(10 row(s) affected)



--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Ed Wagner
Ed Wagner
SSC-Forever
SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)

Group: General Forum Members
Points: 47941 Visits: 10844
Jeff Moden (2/25/2014)
This should handle pretty much anything you can throw at it.

-- DROP TABLE #FileName
--===== Create a test table with some normal and "odd" file/path names.
SELECT FileName = 'C:/Location/Folder/Subfolder/filename20130101.xxx'
INTO #FileName UNION ALL
SELECT FileName = 'C:/Location/Folder/Subfolder/filename20130102' UNION ALL
SELECT FileName = 'filename20130103' UNION ALL
SELECT FileName = 'filename20130104.xxx' UNION ALL
SELECT FileName = '20130105.xxx' UNION ALL
SELECT FileName = '20130106' UNION ALL
SELECT FileName = '20130107def' UNION ALL
SELECT FileName = 'abc20130108def' UNION ALL
SELECT FileName = 'abc20130109def.xxx' UNION ALL
SELECT FileName = '123abc20130110def.xxx' UNION ALL
SELECT FileName = 'abcdef.xxx' --will not show up because has no numbers
;
--===== Get just the last set of numbers.
SELECT fn.FileName, DateOnly = LEFT(FromTheNumbers,ISNULL(NULLIF(PATINDEX('%[0-9][^0-9]%',FromTheNumbers),0),8000))
FROM #FileName fn
CROSS APPLY (SELECT RIGHT(FileName,ISNULL(NULLIF(PATINDEX('%[0-9][^0-9]%',REVERSE(FileName)),0),8000))) ca (FromTheNumbers)
WHERE fn.FileName LIKE '%[0-9]%'
;




Results:


FileName DateOnly
------------------------------------------------- --------
C:/Location/Folder/Subfolder/filename20130101.xxx 20130101
C:/Location/Folder/Subfolder/filename20130102 20130102
filename20130103 20130103
filename20130104.xxx 20130104
20130105.xxx 20130105
20130106 20130106
20130107def 20130107
abc20130108def 20130108
abc20130109def.xxx 20130109
123abc20130109def.xxx 20130110
(10 row(s) affected)


Now that's a cool approach, Jeff, and it does 1M rows in about 12 seconds.


Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (212K reputation)SSC Guru (212K reputation)SSC Guru (212K reputation)SSC Guru (212K reputation)SSC Guru (212K reputation)SSC Guru (212K reputation)SSC Guru (212K reputation)SSC Guru (212K reputation)

Group: General Forum Members
Points: 212527 Visits: 41977
Ed Wagner (2/26/2014)
... and it does 1M rows in about 12 seconds.


Heh... damn. I must be gettin' old and starting to slip. :-P

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search