When to separate SSIS packages?

  • Hello!

    I'm a big fan of doing a lot of ETL and other processes with stored procedures. These days, I've been diving more into SSIS and trying to leverage a mixture of both to handle certain tasks.

    SSIS is pretty straightforward, with a lot of configuration options and approaches to handling different tasks that make sense.

    However, for performance optimization, I wonder when it would be a good time to split complex data flows into separate packages as opposed to jamming them all into one big package.

    For example, if I'm simply loading a few fact tables and dimension tables from some flat files into a SQL Server destination, would it be better to create a package per file or one big package for all files?

    I went the second route and put everything into one package. It runs decently enough, with no major concerns on my end. But the data does fluctuate in size, and logic tells me that at some point bottlenecks may occur, since it's a lot of bulk loading in one package versus partitioning it out into many separate packages.

    From a management standpoint, it's easier for me to manage everything in one package. It also allows me to run multiple data loads in parallel. But again, from a performance standpoint it doesn't feel like a best practice, and maybe I should split them out into multiple packages and have a master package controlling each child package in a nice neat flow.

  • I'd have them all as separate packages. You can have a new "Master" package which calls the others individually; one advantage of this is that the called packages can be run in parallel. So if you have five SSIS packages, you'd then have one Master package which has five Execute Package Task entries.
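
    If the project is deployed to the SSIS catalog (SQL Server 2012 and later), you can also drive a child package from T-SQL instead of an Execute Package Task. Here's a minimal sketch of launching one child synchronously; the folder, project, and package names are placeholders, not anything from this thread:

        DECLARE @execution_id BIGINT;

        -- Register an execution for one child package (names are hypothetical).
        EXEC SSISDB.catalog.create_execution
             @folder_name  = N'ETL',
             @project_name = N'Warehouse',
             @package_name = N'LoadDimProduct.dtsx',
             @execution_id = @execution_id OUTPUT;

        -- SYNCHRONIZED = 1 makes start_execution wait until the package finishes;
        -- leave it at its default of 0 to fire packages off in parallel.
        EXEC SSISDB.catalog.set_execution_parameter_value
             @execution_id,
             @object_type     = 50,  -- 50 = system parameter
             @parameter_name  = N'SYNCHRONIZED',
             @parameter_value = 1;

        EXEC SSISDB.catalog.start_execution @execution_id;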

  • daniel.freedman 80164 (9/4/2015)

    I'd have them all as separate packages. You can have a new "Master" package which calls the others individually; one advantage of this is that the called packages can be run in parallel. So if you have five SSIS packages, you'd then have one Master package which has five Execute Package Task entries.

    Yeah, that's what I was thinking, but I have a few facts and over 30 dimensions to import. That would be around 30+ packages, plus the one master package.

    Now, I see no issue combining a large number of the dimensions together, simply because the file sizes are so small. I'm just more worried about the larger facts.

  • In my opinion, this all depends on maintenance, complexity, and dependencies.

    With small tasks each in separate packages, it's easy to execute a single one (if required) and easy to run them in parallel.

    But with many separate packages, it's hard to keep sequential integrity when some packages can only run after other packages have succeeded. That's the point where you should integrate them into a single package.

    Also, when several packages use the same source data, it's easier to combine them into a single package. Within this package you only have to define (and thus read) the source once. Next, you can split the flow into several forks of the source data. Within this single package it is still possible to define parallel processing of flows.
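
    As a rough stored-procedure analogue of that read-once-then-fork idea (since stored procedures came up earlier in the thread): stage the file a single time, then feed each branch from the staging table, much like a single Flat File Source feeding a Multicast would in SSIS. All file, table, and column names below are invented for illustration:

        -- Stage the flat file once; every fork below reads from the staging table.
        BULK INSERT dbo.StageSales
        FROM 'C:\feeds\sales.csv'
        WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

        -- Fork 1: new products into the product dimension
        INSERT INTO dbo.DimProduct (ProductCode, ProductName)
        SELECT DISTINCT s.ProductCode, s.ProductName
        FROM dbo.StageSales AS s
        WHERE NOT EXISTS (SELECT 1 FROM dbo.DimProduct AS d
                          WHERE d.ProductCode = s.ProductCode);

        -- Fork 2: new customers into the customer dimension
        INSERT INTO dbo.DimCustomer (CustomerCode, CustomerName)
        SELECT DISTINCT s.CustomerCode, s.CustomerName
        FROM dbo.StageSales AS s
        WHERE NOT EXISTS (SELECT 1 FROM dbo.DimCustomer AS d
                          WHERE d.CustomerCode = s.CustomerCode);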

    ** Don't mistake the ‘stupidity of the crowd’ for the ‘wisdom of the group’! **

  • HanShi (9/4/2015)

    In my opinion, this all depends on maintenance, complexity, and dependencies.

    With small tasks each in separate packages, it's easy to execute a single one (if required) and easy to run them in parallel.

    But with many separate packages, it's hard to keep sequential integrity when some packages can only run after other packages have succeeded. That's the point where you should integrate them into a single package.

    Also, when several packages use the same source data, it's easier to combine them into a single package. Within this package you only have to define (and thus read) the source once. Next, you can split the flow into several forks of the source data. Within this single package it is still possible to define parallel processing of flows.

    Aye. For this particular flow, none of them depend on each other, so technically they could all run at once, if running 33+ packages in parallel were ideal. :w00t:

    But I think I can ideally get away with all the dimensions in one package and maybe a separate package per fact table, all controlled by a master package.

    1 package for dimensions, 5 packages for facts, 1 master package = 7 packages.
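
    Sketching what that master could look like in T-SQL against the SSIS catalog: run the dimension package synchronously first (the usual safe order, even though nothing here strictly depends on anything else), then start the five fact packages without waiting, so they load in parallel. Folder, project, and package names are placeholders:

        DECLARE @exec_id BIGINT;

        -- 1) Dimensions first; SYNCHRONIZED = 1 blocks until the package completes.
        EXEC SSISDB.catalog.create_execution
             @folder_name = N'ETL', @project_name = N'Warehouse',
             @package_name = N'LoadDimensions.dtsx',
             @execution_id = @exec_id OUTPUT;
        EXEC SSISDB.catalog.set_execution_parameter_value
             @exec_id, @object_type = 50,
             @parameter_name = N'SYNCHRONIZED', @parameter_value = 1;
        EXEC SSISDB.catalog.start_execution @exec_id;

        -- 2) Then the fact packages; start_execution is asynchronous by default,
        --    so these five run in parallel.
        DECLARE @pkg NVARCHAR(260);
        DECLARE fact_cur CURSOR LOCAL FAST_FORWARD FOR
            SELECT v.pkg
            FROM (VALUES (N'LoadFactA.dtsx'), (N'LoadFactB.dtsx'),
                         (N'LoadFactC.dtsx'), (N'LoadFactD.dtsx'),
                         (N'LoadFactE.dtsx')) AS v(pkg);
        OPEN fact_cur;
        FETCH NEXT FROM fact_cur INTO @pkg;
        WHILE @@FETCH_STATUS = 0
        BEGIN
            EXEC SSISDB.catalog.create_execution
                 @folder_name = N'ETL', @project_name = N'Warehouse',
                 @package_name = @pkg, @execution_id = @exec_id OUTPUT;
            EXEC SSISDB.catalog.start_execution @exec_id;
            FETCH NEXT FROM fact_cur INTO @pkg;
        END;
        CLOSE fact_cur;
        DEALLOCATE fact_cur;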

  • You can run multiple data flows in parallel in the same package, but as has been said, it depends on complexity, dependencies, and performance.
