Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Parse Info Out of FileName Expand / Collapse
Author
Message
Posted Friday, September 20, 2013 1:40 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, September 23, 2013 6:30 PM
Points: 2, Visits: 18
I am new to SSIS and have a package I have been working on. My issue is I don't know how to parse out a portion of the incoming file name so I can use that to help name the final flat file. I have files coming in as both csv and txt files that prevents me from just looking for the file extension. The structure of the incoming file name is C:/Location/Folder/Subfolder/filename20130101.(csv) or (txt). What I need is to get the numbers on the end of the file, store them than do a manipulation to them so I can add the previous day's date in the same format to my final flat file. I also have to include in the logic for end of month, year, etc.
Post #1497027
Posted Saturday, September 21, 2013 8:51 PM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 9:18 PM
Points: 839, Visits: 5,423
I'll give you that I'm not an SSIS guy, but if the format is always the same, you should be able to create a variable for the filename, then strip off the extension (last 4 characters?), and then if the date is always 8 characters, you should be able to use SUBSTRING() to split the pieces out, convert the strings to numbers, and then that whole thing to a real date that you can do math on.
Post #1497177
Posted Monday, September 23, 2013 10:15 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, September 23, 2013 6:30 PM
Points: 2, Visits: 18
Unfortunately the files that come in aren't spaced the exact the same. One group of files comes in a csv and are named/spaced one way, however the other file comes in as a txt and is spaced differently. I already tried that route. What I need is to verify that the files have the same date on them then use that date to derive the previous date to name my final file with the previous day's date.
Post #1497459
Posted Monday, February 24, 2014 2:12 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, June 10, 2014 12:29 PM
Points: 1, Visits: 13
ltrim(rtrim(SUBSTRING([filename], charindex('Invoice', [filename])+8, LEN([filename]) - (CHARINDEX('Invoice', [filename])+11))))

This snippet pulls out the invoice number from a filename. The filename is in a consistent format "MyCustomerName_Invoice_12345678.pdf". The snippet above uses the word "Invoice" as the anchor to get the invoice number and it doesn't care what the file extension is. You could use a CASE statement to test for a TXT or CSV extension. You could also use CASE to test for any consistent part of the filename. This is a T-SQL snippet so I don't know if this going to help you out of the box, but for what it's worth...




SQL & Cold Fusion
Post #1544710
Posted Monday, February 24, 2014 5:25 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Monday, May 5, 2014 6:31 AM
Points: 288, Visits: 519
At some point, this type of logic becomes easier to write and maintain as a script task embedded in a package.
Post #1544745
Posted Tuesday, February 25, 2014 11:13 AM
SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 11:54 AM
Points: 4,246, Visits: 3,686
EricEyster (2/24/2014)
At some point, this type of logic becomes easier to write and maintain as a script task embedded in a package.

Agreed. I'm not at all an SSIS guy beyond spelling it, but if you're going to do this in T-SQL, an alternate approach would be to stop thinking of it in terms of parsing the string by character and start thinking of splitting it out by delimiter. If you're not familiar with Jeff Moden's DelimitedSplit8K function, the article is at http://www.sqlservercentral.com/articles/Tally+Table/72993/. It'll change the way you look at data and think about performance.

with cte(filename) as (
select 'MyCustomerName_Invoice_12345678.pdf' union all
select 'YourCustomerName_Invoice_23456.pdf' union all
select 'SomeLongerCustomerName_Invoice_123.pdf' union all
select 'ShortName_Invoice_7891234567890.pdf')
select filename, s.item
from cte
cross apply DelimitedSplit8K(cte.filename, '_') s
where s.ItemNumber = 3;




Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions
Post #1545074
Posted Tuesday, February 25, 2014 5:14 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Tuesday, October 7, 2014 10:53 PM
Points: 3,421, Visits: 5,359
I'm going to go out on a limb here since I am not an "SSIS guy" either and say that if you can use DelimitedSplit8K you can also use PatternSplitCM (see the 4th article in my signature links). In that article it gives a number of examples for parsing a file name, some of which may be applicable to your case.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1545177
Posted Tuesday, February 25, 2014 9:18 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 8:08 AM
Points: 35,366, Visits: 31,906
This should handle pretty much anything you can throw at it.

-- DROP TABLE #FileName
--===== Create a test table with some normal and "odd" file/path names.
SELECT FileName = 'C:/Location/Folder/Subfolder/filename20130101.xxx'
INTO #FileName UNION ALL
SELECT FileName = 'C:/Location/Folder/Subfolder/filename20130102' UNION ALL
SELECT FileName = 'filename20130103' UNION ALL
SELECT FileName = 'filename20130104.xxx' UNION ALL
SELECT FileName = '20130105.xxx' UNION ALL
SELECT FileName = '20130106' UNION ALL
SELECT FileName = '20130107def' UNION ALL
SELECT FileName = 'abc20130108def' UNION ALL
SELECT FileName = 'abc20130109def.xxx' UNION ALL
SELECT FileName = '123abc20130110def.xxx' UNION ALL
SELECT FileName = 'abcdef.xxx' --will not show up because has no numbers
;
--===== Get just the last set of numbers.
SELECT fn.FileName, DateOnly = LEFT(FromTheNumbers,ISNULL(NULLIF(PATINDEX('%[0-9][^0-9]%',FromTheNumbers),0),8000))
FROM #FileName fn
CROSS APPLY (SELECT RIGHT(FileName,ISNULL(NULLIF(PATINDEX('%[0-9][^0-9]%',REVERSE(FileName)),0),8000))) ca (FromTheNumbers)
WHERE fn.FileName LIKE '%[0-9]%'
;


Results:

FileName                                          DateOnly
------------------------------------------------- --------
C:/Location/Folder/Subfolder/filename20130101.xxx 20130101
C:/Location/Folder/Subfolder/filename20130102 20130102
filename20130103 20130103
filename20130104.xxx 20130104
20130105.xxx 20130105
20130106 20130106
20130107def 20130107
abc20130108def 20130108
abc20130109def.xxx 20130109
123abc20130109def.xxx 20130110
(10 row(s) affected)



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1545205
Posted Wednesday, February 26, 2014 5:46 AM
SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 11:54 AM
Points: 4,246, Visits: 3,686
Jeff Moden (2/25/2014)
This should handle pretty much anything you can throw at it.

-- DROP TABLE #FileName
--===== Create a test table with some normal and "odd" file/path names.
SELECT FileName = 'C:/Location/Folder/Subfolder/filename20130101.xxx'
INTO #FileName UNION ALL
SELECT FileName = 'C:/Location/Folder/Subfolder/filename20130102' UNION ALL
SELECT FileName = 'filename20130103' UNION ALL
SELECT FileName = 'filename20130104.xxx' UNION ALL
SELECT FileName = '20130105.xxx' UNION ALL
SELECT FileName = '20130106' UNION ALL
SELECT FileName = '20130107def' UNION ALL
SELECT FileName = 'abc20130108def' UNION ALL
SELECT FileName = 'abc20130109def.xxx' UNION ALL
SELECT FileName = '123abc20130110def.xxx' UNION ALL
SELECT FileName = 'abcdef.xxx' --will not show up because has no numbers
;
--===== Get just the last set of numbers.
SELECT fn.FileName, DateOnly = LEFT(FromTheNumbers,ISNULL(NULLIF(PATINDEX('%[0-9][^0-9]%',FromTheNumbers),0),8000))
FROM #FileName fn
CROSS APPLY (SELECT RIGHT(FileName,ISNULL(NULLIF(PATINDEX('%[0-9][^0-9]%',REVERSE(FileName)),0),8000))) ca (FromTheNumbers)
WHERE fn.FileName LIKE '%[0-9]%'
;


Results:

FileName                                          DateOnly
------------------------------------------------- --------
C:/Location/Folder/Subfolder/filename20130101.xxx 20130101
C:/Location/Folder/Subfolder/filename20130102 20130102
filename20130103 20130103
filename20130104.xxx 20130104
20130105.xxx 20130105
20130106 20130106
20130107def 20130107
abc20130108def 20130108
abc20130109def.xxx 20130109
123abc20130109def.xxx 20130110
(10 row(s) affected)


Now that's a cool approach, Jeff, and it does 1M rows in about 12 seconds.



Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions
Post #1545345
Posted Wednesday, February 26, 2014 10:32 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 8:08 AM
Points: 35,366, Visits: 31,906
Ed Wagner (2/26/2014)
... and it does 1M rows in about 12 seconds.


Heh... damn. I must be gettin' old and starting to slip.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1545697
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse