Read multiple files of the same name and process them in the DB

  • Hi All,

    I have a requirement where we will receive multiple CSV files in a shared path, and those files need to be loaded into their corresponding tables.

    For example: if there are three Employee files, such as employee_2022_08_08_11_11 / employee_2022_08_07_12_00 / employee_2022_08_08_16_07, I need to load all three into my Employee table; likewise, a Department file such as Department_2022_08_08_12_08 goes into the Department table in SQL Server.

    Previously I had a reference table where the file names were stored as:

    1, employee

    2, Department

    3, fact_salary

    4, fact_revision

    So we were processing files by matching their names against the reference table. But the requirement has changed: we now need to process every file that is in the path, irrespective of dates. After processing, each file is moved to another folder (this part is working fine). How can we pick up and process multiple files without altering the existing logic around the reference table?

    The SSIS package uses the Bulk Insert option to load the data it reads from the shared path.

    Is there any approach or any suggestions?
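
    For reference, this is roughly the per-file load the package performs today; a minimal C# sketch (the server name, share path, table, and CSV options are placeholders, not our real values):

    ```csharp
    // Sketch only: issue a BULK INSERT per employee file found in the share.
    // Connection string, paths, and CSV options below are assumptions.
    using System;
    using System.Data.SqlClient;
    using System.IO;

    class BulkLoadSketch
    {
        static void Main()
        {
            const string connStr = @"Server=MYSERVER;Database=Staging;Integrated Security=SSPI;";
            const string share   = @"\\fileserver\inbound";

            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();
                foreach (var file in Directory.GetFiles(share, "employee_*.csv"))
                {
                    var sql = $@"BULK INSERT dbo.Employee
                                 FROM '{file}'
                                 WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);";
                    using (var cmd = new SqlCommand(sql, conn)) cmd.ExecuteNonQuery();
                }
            }
        }
    }
    ```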


  • I don't really understand what this reference table is doing.

    To load the files, can't you have 2 x ForEach Loops, with their collections set to each set of files (wildcard matching), so the Employee loop picks up all the employee-prefixed files and the Department loop picks up all the Department-prefixed files?

    Then within each loop, add a dataflow task.  Make the source component dynamic by adding an expression to the connection string property and pointing it at the variable that the Foreach loop populates with the current filename on each iteration of the loop.

    The Destination of the dataflow will be your static employee or department tables.
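
    In code terms, each loop boils down to something like this sketch (the share path and masks are assumptions; in the package itself you'd set each ForEach enumerator's file mask, e.g. employee*.csv, and put an expression referencing the loop variable on the source connection's ConnectionString property):

    ```csharp
    // Sketch of what the two wildcard ForEach loops effectively do.
    // The share path and file masks are assumptions.
    using System;
    using System.IO;

    class ForEachSketch
    {
        static void Main()
        {
            const string share = @"\\fileserver\inbound";

            // "Employee" loop: collection = employee*.csv
            foreach (var file in Directory.GetFiles(share, "employee*.csv"))
                Load(file, "dbo.Employee");

            // "Department" loop: collection = Department*.csv
            foreach (var file in Directory.GetFiles(share, "Department*.csv"))
                Load(file, "dbo.Department");
        }

        // Stand-in for the dataflow: in SSIS the flat-file source's connection
        // string is driven by the variable holding the current file name.
        static void Load(string path, string table) =>
            Console.WriteLine($"Loading {path} into {table}");
    }
    ```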


    The reference table is where the actual file names were stored. Initially there was only one file per table, so we kept a reference table and ensured we would receive the input file with that exact name, but that has since changed.

    I guess I don't understand the issue here.  Just load up all the file names for, say, the employee table and process them in date order.  Then, move the processed files out to the "processed" folder, either en masse when the whole run is done or as each file is processed.
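
    Something like this sketch, for example (paths and mask are assumptions; since the yyyy_MM_dd_HH_mm suffix is zero-padded, a plain ordinal sort on the name already gives date order):

    ```csharp
    // Sketch: process employee files oldest-first, then move each one
    // to the "processed" folder. Paths and mask are assumptions.
    using System;
    using System.IO;
    using System.Linq;

    class ProcessAndMove
    {
        static void Main()
        {
            const string inbound   = @"\\fileserver\inbound";
            const string processed = @"\\fileserver\processed";

            // Zero-padded timestamp suffix => ordinal name order == date order.
            var files = Directory.GetFiles(inbound, "employee_*.csv")
                                 .OrderBy(Path.GetFileName, StringComparer.Ordinal);

            foreach (var file in files)
            {
                Load(file);   // your existing load logic goes here
                File.Move(file, Path.Combine(processed, Path.GetFileName(file)));
            }
        }

        static void Load(string path) => Console.WriteLine($"Loading {path}");
    }
    ```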

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    I see a further problem - the files obviously have timestamps embedded in their names.  Processing them in 'any order' could be problematic if there are actions/updates for the same employee(s) across files.

    For example - let's say employee A was updated in all three files and the correct value for that employee is in the latest file.  If the loop processes the latest file first and the earliest file last, the data in the table ends up incorrect.

    Processing that correctly is going to need more than just an SSIS foreach loop.
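
    One way to make that ordering deterministic is to parse the timestamp out of each file name before loading; a sketch (the yyyy_MM_dd_HH_mm format is taken from the sample names earlier in the thread, the path is an assumption):

    ```csharp
    // Sketch: sort files by the timestamp embedded in the name so the
    // latest update for any given employee is applied last.
    using System;
    using System.Globalization;
    using System.IO;
    using System.Linq;

    class ChronologicalSort
    {
        // employee_2022_08_08_11_11 -> 2022-08-08 11:11
        static DateTime StampOf(string path)
        {
            var name  = Path.GetFileNameWithoutExtension(path);
            // The trailing "yyyy_MM_dd_HH_mm" stamp is always 16 characters,
            // which also works for prefixes containing underscores (fact_salary).
            var stamp = name.Substring(name.Length - 16);
            return DateTime.ParseExact(stamp, "yyyy_MM_dd_HH_mm",
                                       CultureInfo.InvariantCulture);
        }

        static void Main()
        {
            var files = Directory.GetFiles(@"\\fileserver\inbound", "employee_*.csv")
                                 .OrderBy(StampOf);   // earliest first

            foreach (var file in files)
                Console.WriteLine($"{StampOf(file):yyyy-MM-dd HH:mm}  {file}");
        }
    }
    ```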

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • The reference table makes little sense because as I understand it, each row represents a target table.  You can't really feed that into a single Dataflow because each table will have different schema/metadata.

    So then I'd expect to see hardcoded dataflows per target table.  Each of those can sit inside its own ForEach loop that picks up only the files for its table, or you can use a single ForEach with a Script Task that checks each filename and sends the file to the relevant dataflow (see the sketch below).  That way you can add new files without changing the package, but if you add a new table, I'd expect the package to change to add a new dataflow.

    Or are you using a Script task as the data source and dynamically creating output paths based on the file name?
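
    For the single-loop route, the Script Task only has to work out which table a file belongs to and stash that in a variable for precedence constraints to test; a sketch of the task body, inside the standard SSIS script template (the variable names User::CurrentFile and User::TargetTable are assumptions):

    ```csharp
    // Sketch of an SSIS Script Task body (standard script template assumed).
    // Classifies the current file so precedence constraints can route it
    // to the matching dataflow. Variable names are assumptions.
    public void Main()
    {
        string name = System.IO.Path.GetFileNameWithoutExtension(
            Dts.Variables["User::CurrentFile"].Value.ToString());

        // Strip the trailing "_yyyy_MM_dd_HH_mm" (17 characters) to get the
        // table prefix, e.g. fact_salary_2022_08_08_11_11 -> fact_salary.
        string prefix = name.Substring(0, name.Length - 17).ToLowerInvariant();

        // One precedence constraint per dataflow then tests this value,
        // e.g. @[User::TargetTable] == "employee".
        Dts.Variables["User::TargetTable"].Value = prefix;

        Dts.TaskResult = (int)ScriptResults.Success;
    }
    ```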

