Advice sought on structure of SSIS project

  • Hello,

    since I am a newbie to SSIS I would like to get input from some more experienced folks on how to approach this task.

    I get a monthly delivery of 12 csv files which each need to be imported into a staging table and then processed into scd-type2 tables. In the files, columns can be added over time, but I will be informed about this ahead of time. Over the last few years the format of the files changed 5 times by adding on columns (5 different versions so to speak). Unfortunately the data has no headers (hoping to change that for future deliveries) but the data quality is reliable.

    My goal is to import the data from the last few years into a slowly changing dimension table and setup a reasonably easy process for future regular deliveries. I have gone through the process for one of the files by setting up different packages for each version and using MERGE to process the changes... so far so good.

    I am struggling with the "organization" and design aspect of the packages. Would I make a new package for each file, for each version or would it be better to make a bigger package containing the import routines for all files from one delivery (one package to execute)? Maybe someone has some suggested reading on best practices for this? Or some words of wisdom?

    One more question came up during my first attempt. I built two packages:
    a) import to staging table -> sql task to merge to scd-table -> file clean up
    b) import to temp table -> sql task to insert from temp table to staging table -> sql task to merge to scd-table -> file clean up

    On a table of 1.4 Million rows package A took 16 minutes while package B took 1 minute, even though there is one more step in package B. The import is obviously the bottleneck, but it seems such a big difference. How can it be explained?

    Thanks in advance for any input!

Viewing 0 posts

You must be logged in to reply to this topic. Login to reply