SSIS import multiple csv files by name

  • Hi,

    I have an ever increasing number of files in a single directory (will probably end up with hundreds). There's 5 different types of files, denoted by their file name. For example, I may have (for each file typ):

    ABCZZZZ123.CSV

    DZZZZ872.CSV

    123999_ZZZZ.CSV

    87182XXXXASDFA.CSV

    DX11XXXXA1.CSV

    D21ZZXXXX.CSV

    All files-of-type have identical structures/column headers, etc. That is, all ZZZZ files are identical, all XXX files are identical, etc.

    I need to load all of these files, by type, into SQL Server. I have created a table for each of the different file types. I can successfully process a single file (explicitly specifying its name) into its table.

    So, to load all csv files, I guess I need a ForEachLoop container in SSIS, and some script to split the file name to pick out the type of file it is? My SSIS skills are pretty dire, and I don't really know how best to do this.

    Any guidance/solution would be gratefully received.

    Thanks,

    Andrew

  • Not sure if it'll work but it's something to try...

    Try the Conditional Split Task.

    Capture the file name and parse out the file type in the loop and store in variables.

    Then set the conditions of the splits to the type of file.

    set the destination for each condition path.

    Probably need to set the DelayValidation to True and Mapping might be an issue.

    Probably have to map each file type one at a time first and live with the red dot for a bit.

    I think I've done something like this before with the Condition split but it's been 6 years....

    ---------------------------------------------------------------
    Mike Hahn - MCSomething someday:-)
    Right way to ask for help!!
    http://www.sqlservercentral.com/articles/Best+Practices/61537/
    I post so I can see my avatar :hehe:
    I want a personal webpage 😎
    I want to win the lotto 😀
    I want a gf like Tiffa :w00t: Oh wait I'm married!:-D

  • - removed -

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I re-read the original post this morning and realized I had misread an important requirement so I removed my last post which was way off the mark.

    I do not think a Conditional Split is necessary here. Since all files have the same structure you can have a ForEach Loop Container (FELC) with a single Data Flow Task. The Data Flow Task will contain a Flat File Source and an OLE DB Destination. The Flat File Source will refer to a Flat File Connection Manager that will build it's ConnectionString using the Variable the FELC uses to store the file name. The Data access mode of the OLE DB Destination will be Table name or view name variable. The Variable it points to will have EvaluateAsExpression set to True and will use the file name Variable in its Expression to determine the table name.

    e.g. this expression says, if the filename contains ZZZ then return TableZZZ, else if it contains XXX return TableXXX, else return TableDefault. You can expand this as necessary just use parentheses liberally to ensure it remains organized

    FINDSTRING( @[User::FileNameVariable], "ZZZ", 1 ) > 0 ? "TableZZZ" : (FINDSTRING( @[User::FileNameVariable], "XXX", 1 ) > 0 ? "TableXXX" : "TableDefault")

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • adb2303 (2/8/2013)


    Hi,

    I have an ever increasing number of files in a single directory (will probably end up with hundreds). There's 5 different types of files, denoted by their file name. For example, I may have (for each file typ):

    ABCZZZZ123.CSV

    DZZZZ872.CSV

    123999_ZZZZ.CSV

    87182XXXXASDFA.CSV

    DX11XXXXA1.CSV

    D21ZZXXXX.CSV

    How do you intend to keep track of which files were loaded and which were not?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The users will be responsible for clearing out the directory these files are stored in. My instructions are to load whatever's in that directory every time the solution runs, even if the same file gets loaded over and over again. I have a process in place within the SSIS package to archive/de-duplicate data.

  • adb2303 (2/10/2013)


    My instructions are to load whatever's in that directory every time the solution runs, even if the same file gets loaded over and over again.

    Wow. Whoever wrote those instructions has a serious hooie problem. If the database is designed correctly, it won't let you make such a mistake as what they're askig for.

    Since you don't have to actually move files, rename them, or anything else that requires any real power, why not simplify all of this just by using BULK INSERT right straight from some scheduled T-SQL stored procedures? After all, the data is just some consitently formatted CSV files, isn't it?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SSIS OLE DB Destinations are using the same API that BULK INSERT would. The only difference here is that SSIS can access the file system natively whereas xp_cmdshell would have to be enabled if wanting to do the same using T-SQL, a clear disadvantage imho.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (2/10/2013)


    SSIS OLE DB Destinations are using the same API that BULK INSERT would. The only difference here is that SSIS can access the file system natively whereas xp_cmdshell would have to be enabled if wanting to do the same using T-SQL, a clear disadvantage imho.

    Absolutely not true, Orlando. There is absolutely no need of xp_CmdShell for this given "if it's there, load it" circumstance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My understanding is the file names, while showing similar patterns, change from day to day.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (2/10/2013)


    My understanding is the file names, while showing similar patterns, change from day to day.

    Understood and agreed. But, you don't need xp_CmdShell to read those file names. Just use xp_DirTree and Bob's your uncle. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I could be wrong because I do not do much of this type of programming but I think xp_dirtree will only get you the list of directories however not the files within them. I think to get the list of files within a directory you'd need to use xp_cmdshell.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Heh... you just have to know that I wouldn't say such a thing unless I knew. Please try the following command. The thing that most people don't know about xp_DirTree is that it actually takes 3 parameters. If the third parameter is something other than "0", it will list the file names along with an extra column to identify what is a file and what is not.

    EXEC xp_DirTree 'C:\',1,1

    And, yes... I'm in the process of writing an article on it's use in this area.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nice. I figured there was something else up your sleeve otherwise you would not have said anything. That could be handy for some as it relates to their ability to enable xp_cmdshell in their environment although you know how I feel about accessing the file system from T-SQL regardless of the means 😉 It's a shame it is officially an undocumented procedure. I maybe could see myself using it for some discovery work in a pinch (if PowerShell remote management, Remote Desktop and xp_cmdshell, yes I use it in a pinch from time to time although I shower as soon as possible afterward, were all unavailable) but could not see myself using it in deployed production code.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply