Parse Info Out of FileName

  • I am new to SSIS and have a package I have been working on. My issue is I don't know how to parse out a portion of the incoming file name so I can use that to help name the final flat file. I have files coming in as both csv and txt files that prevents me from just looking for the file extension. The structure of the incoming file name is C:/Location/Folder/Subfolder/filename20130101.(csv) or (txt). What I need is to get the numbers on the end of the file, store them than do a manipulation to them so I can add the previous day's date in the same format to my final flat file. I also have to include in the logic for end of month, year, etc.

  • I'll give you that I'm not an SSIS guy, but if the format is always the same, you should be able to create a variable for the filename, then strip off the extension (last 4 characters?), and then if the date is always 8 characters, you should be able to use SUBSTRING() to split the pieces out, convert the strings to numbers, and then that whole thing to a real date that you can do math on.

  • Unfortunately the files that come in aren't spaced the exact the same. One group of files comes in a csv and are named/spaced one way, however the other file comes in as a txt and is spaced differently. I already tried that route. What I need is to verify that the files have the same date on them then use that date to derive the previous date to name my final file with the previous day's date.

  • ltrim(rtrim(SUBSTRING([filename], charindex('Invoice', [filename])+8, LEN([filename]) - (CHARINDEX('Invoice', [filename])+11))))

    [p]This snippet pulls out the invoice number from a filename. The filename is in a consistent format "MyCustomerName_Invoice_12345678.pdf". The snippet above uses the word "Invoice" as the anchor to get the invoice number and it doesn't care what the file extension is. You could use a CASE statement to test for a TXT or CSV extension. You could also use CASE to test for any consistent part of the filename. This is a T-SQL snippet so I don't know if this going to help you out of the box, but for what it's worth...[/p]

    SQL & Cold Fusion

  • At some point, this type of logic becomes easier to write and maintain as a script task embedded in a package.

  • EricEyster (2/24/2014)


    At some point, this type of logic becomes easier to write and maintain as a script task embedded in a package.

    Agreed. I'm not at all an SSIS guy beyond spelling it, but if you're going to do this in T-SQL, an alternate approach would be to stop thinking of it in terms of parsing the string by character and start thinking of splitting it out by delimiter. If you're not familiar with Jeff Moden's DelimitedSplit8K function, the article is at http://www.sqlservercentral.com/articles/Tally+Table/72993/. It'll change the way you look at data and think about performance.

    with cte(filename) as (

    select 'MyCustomerName_Invoice_12345678.pdf' union all

    select 'YourCustomerName_Invoice_23456.pdf' union all

    select 'SomeLongerCustomerName_Invoice_123.pdf' union all

    select 'ShortName_Invoice_7891234567890.pdf')

    select filename, s.item

    from cte

    cross apply DelimitedSplit8K(cte.filename, '_') s

    where s.ItemNumber = 3;

  • I'm going to go out on a limb here since I am not an "SSIS guy" either and say that if you can use DelimitedSplit8K you can also use PatternSplitCM (see the 4th article in my signature links). In that article it gives a number of examples for parsing a file name, some of which may be applicable to your case.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • This should handle pretty much anything you can throw at it.

    -- DROP TABLE #FileName

    --===== Create a test table with some normal and "odd" file/path names.

    SELECT FileName = 'C:/Location/Folder/Subfolder/filename20130101.xxx'

    INTO #FileName UNION ALL

    SELECT FileName = 'C:/Location/Folder/Subfolder/filename20130102' UNION ALL

    SELECT FileName = 'filename20130103' UNION ALL

    SELECT FileName = 'filename20130104.xxx' UNION ALL

    SELECT FileName = '20130105.xxx' UNION ALL

    SELECT FileName = '20130106' UNION ALL

    SELECT FileName = '20130107def' UNION ALL

    SELECT FileName = 'abc20130108def' UNION ALL

    SELECT FileName = 'abc20130109def.xxx' UNION ALL

    SELECT FileName = '123abc20130110def.xxx' UNION ALL

    SELECT FileName = 'abcdef.xxx' --will not show up because has no numbers

    ;

    --===== Get just the last set of numbers.

    SELECT fn.FileName, DateOnly = LEFT(FromTheNumbers,ISNULL(NULLIF(PATINDEX('%[0-9][^0-9]%',FromTheNumbers),0),8000))

    FROM #FileName fn

    CROSS APPLY (SELECT RIGHT(FileName,ISNULL(NULLIF(PATINDEX('%[0-9][^0-9]%',REVERSE(FileName)),0),8000))) ca (FromTheNumbers)

    WHERE fn.FileName LIKE '%[0-9]%'

    ;

    Results:

    FileName DateOnly

    ------------------------------------------------- --------

    C:/Location/Folder/Subfolder/filename20130101.xxx 20130101

    C:/Location/Folder/Subfolder/filename20130102 20130102

    filename20130103 20130103

    filename20130104.xxx 20130104

    20130105.xxx 20130105

    20130106 20130106

    20130107def 20130107

    abc20130108def 20130108

    abc20130109def.xxx 20130109

    123abc20130109def.xxx 20130110

    (10 row(s) affected)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/25/2014)


    This should handle pretty much anything you can throw at it.

    -- DROP TABLE #FileName

    --===== Create a test table with some normal and "odd" file/path names.

    SELECT FileName = 'C:/Location/Folder/Subfolder/filename20130101.xxx'

    INTO #FileName UNION ALL

    SELECT FileName = 'C:/Location/Folder/Subfolder/filename20130102' UNION ALL

    SELECT FileName = 'filename20130103' UNION ALL

    SELECT FileName = 'filename20130104.xxx' UNION ALL

    SELECT FileName = '20130105.xxx' UNION ALL

    SELECT FileName = '20130106' UNION ALL

    SELECT FileName = '20130107def' UNION ALL

    SELECT FileName = 'abc20130108def' UNION ALL

    SELECT FileName = 'abc20130109def.xxx' UNION ALL

    SELECT FileName = '123abc20130110def.xxx' UNION ALL

    SELECT FileName = 'abcdef.xxx' --will not show up because has no numbers

    ;

    --===== Get just the last set of numbers.

    SELECT fn.FileName, DateOnly = LEFT(FromTheNumbers,ISNULL(NULLIF(PATINDEX('%[0-9][^0-9]%',FromTheNumbers),0),8000))

    FROM #FileName fn

    CROSS APPLY (SELECT RIGHT(FileName,ISNULL(NULLIF(PATINDEX('%[0-9][^0-9]%',REVERSE(FileName)),0),8000))) ca (FromTheNumbers)

    WHERE fn.FileName LIKE '%[0-9]%'

    ;

    Results:

    FileName DateOnly

    ------------------------------------------------- --------

    C:/Location/Folder/Subfolder/filename20130101.xxx 20130101

    C:/Location/Folder/Subfolder/filename20130102 20130102

    filename20130103 20130103

    filename20130104.xxx 20130104

    20130105.xxx 20130105

    20130106 20130106

    20130107def 20130107

    abc20130108def 20130108

    abc20130109def.xxx 20130109

    123abc20130109def.xxx 20130110

    (10 row(s) affected)

    Now that's a cool approach, Jeff, and it does 1M rows in about 12 seconds.

  • Ed Wagner (2/26/2014)


    ... and it does 1M rows in about 12 seconds.

    Heh... damn. I must be gettin' old and starting to slip. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/26/2014)


    Ed Wagner (2/26/2014)


    ... and it does 1M rows in about 12 seconds.

    Heh... damn. I must be gettin' old and starting to slip. 😛

    If you keep the underscores and use DelimitedSplit8K, it does 1M in about 8 seconds, but the regular expression has never been known to be a speed demon in any language. Nonetheless, for what it does, I'd say that he OP is not going to have 1M incoming files in the file system without running into a host of other problems.

  • I guess "keeping the underscores" would be a prob lem for the OP, though. His original format had no underscores.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/27/2014)


    I guess "keeping the underscores" would be a prob lem for the OP, though. His original format had no underscores.

    DOH!!! You're absolutely right. Now my question is where I got the underscores in the first place. :crazy: Now I must be getting old...starting to see things.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply