How to import Excel log files into a SQL table

  • I've been tasked with creating a nightly process that loads the prior day's CSV file from a source folder into a SQL table. There are potentially multiple files per day, so the SSIS package would need to keep track of them. I've included a snapshot of the files for reference. My question is: what's the best way to build this ETL in BIDS?

    Regards,

    John

  • You can use a For Each Loop to loop over the folder, pick up each .csv file, and process it separately.

    Use the Foreach File enumerator with the following expression for the files: *.csv
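
    A sketch of the container setup (the folder path and variable name here are illustrative):

        Foreach Loop Container
            Enumerator:          Foreach File Enumerator
            Folder:              \\server\share\maillogs\
            Files:               *.csv
            Retrieve file name:  Fully qualified
            Variable Mappings:   Index 0 -> User::CurrentFileName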

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I created the For Each Loop with the enumerator, but for it to look in the directory, grab every CSV file, and dump it into a SQL table, I also had to add a Data Flow Task inside the loop: each CSV file feeds a Data Flow source, the SQL table is the Data Flow destination, and there's a minor Derived Column task in between. I also had to create a package-level variable for the file name(s) in the loop, then assign that variable as an expression on the Connection Manager for the source files.
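
    For reference, that wiring amounts to something like this (the variable name is illustrative):

        Flat File Connection Manager -> Properties -> Expressions:
            ConnectionString = @[User::CurrentFileName]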

    The loop now looks in the folder share, pulls every qualifying CSV file, and dumps the data into a SQL table. My next challenge is this: every day one or more new CSV files are added to the share folder, so I need my process to check the share every day and pull ONLY those new CSV files into the SQL table, ignoring the ones that were already pulled in. How do I accomplish this?

    Thanks,

    John

    :unsure:

  • latingntlman (4/11/2011)

    [...] every day one or more new CSV files are added to the share folder, so I need my process to check the share every day and pull ONLY those new CSV files into the SQL table, ignoring the ones that were already pulled in. How do I accomplish this?

    Move the files to another shared folder after processing. (This is one way).
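
    Inside the loop, after the Data Flow Task, that could be a File System Task along these lines (a sketch; the archive path is illustrative):

        File System Task
            Operation:              Move file
            IsSourcePathVariable:   True
            SourceVariable:         User::CurrentFileName
            DestinationConnection:  file connection pointing at \\server\share\processed\
            OverwriteDestination:   True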

  • I'd already thought of that, but I don't own that network share; it's owned by the Exchange server admin, because these are logs generated by email blasts.

  • After processing the CSVs, could you rename them to .CSV.[date-time processed]? This, or physically moving the processed files as already suggested, is the easiest way of keeping your For Each Loop working simply.
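
    For the rename, a File System Task (Rename file operation) driven by an expression would do it. A sketch of the destination-name expression, assuming the current file path sits in a package variable (the name is illustrative):

        @[User::CurrentFileName] + "."
            + (DT_WSTR, 4) YEAR(GETDATE())
            + RIGHT("0" + (DT_WSTR, 2) MONTH(GETDATE()), 2)
            + RIGHT("0" + (DT_WSTR, 2) DAY(GETDATE()), 2)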

    Otherwise, the whole thing is going to get more and more laboured as

    a) the number of files in the folder increases

    b) the number of files already processed (but still in the folder) increases.

    You could do something a bit fancier, e.g. record the details of the last file processed and the last run date in a SQL Server table and use those bits of info on the next run. But it's not an elegant solution.
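
    A minimal sketch of that tracking idea in T-SQL, driven from Execute SQL Tasks in the package (the table and column names are illustrative):

        -- one row per file that has been loaded
        CREATE TABLE dbo.ProcessedFiles
        (
            FileName    NVARCHAR(260) NOT NULL PRIMARY KEY,
            ProcessedOn DATETIME      NOT NULL DEFAULT (GETDATE())
        );

        -- inside the loop, before the Data Flow: has this file been loaded?
        -- (map the file-name variable to the ? parameter)
        SELECT COUNT(*) AS AlreadyLoaded
        FROM dbo.ProcessedFiles
        WHERE FileName = ?;

        -- after a successful load, record the file
        INSERT INTO dbo.ProcessedFiles (FileName)
        VALUES (?);

    Store the count in a package variable and put an expression on the precedence constraint into the Data Flow so the load only runs when the count is zero.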

