Modular ETL Architecture Design in SSIS

  • aveek22

    SSC-Addicted

    Points: 484

    Comments posted to this topic are about the item Modular ETL Architecture Design in SSIS

    Aveek has been working as an Analytics Engineer for the past couple of years now. His main areas of interests is SQL Server, SSIS, and SSAS. On his leisure, he likes to travel and is also an amateur photographer.
    https://twitter.com/avikoleum
    https://www.linkedin.com/in/aveekd22/

  • davidadumas

    Newbie

    Points: 2

    Hi.  Good article.  I have always used single discreet packages instead of a master package calling child pkgs.  That way, when there is an error, once fixed, I can restart the SQ Agent Job from the job step (pkg) that failed.

  • aveek22

    SSC-Addicted

    Points: 484

    Thanks. Yeah, I agree on your point as well. But I personally feel that its easier to manage multiple packages in the master packages instead of creating SQL Agent jobs for each package.

    Aveek has been working as an Analytics Engineer for the past couple of years now. His main areas of interests is SQL Server, SSIS, and SSAS. On his leisure, he likes to travel and is also an amateur photographer.
    https://twitter.com/avikoleum
    https://www.linkedin.com/in/aveekd22/

  • davidadumas

    Newbie

    Points: 2

    I do create a "Master" SQL Agent job, which might call 30 related packages, one after another as steps.  So, for example, when step 18 fails, I fix the issue, and then run the job again, starting from step 18.  I code each pkg so that it can be rerun if it fails anywhere in the pkg (job step).

  • jyoung

    Newbie

    Points: 3

    I also prefer to use separate packages rather than child packages. With separate packages there is no need to have a job per package, just have one job that executes the packages in the correct order. If there is a problem with one package executing then it is easier to correct that package then restart the job at the step where it failed.

    If the packages are in the same project (which they should be) then you can use project parameters to define connection strings and project wide parameters.

    This is how I organize my ETL architecture - There are two main phases, staging and posting. I create tables for staging the initial data that is extracted. The staging phase includes data cleaning and performing any lookups into other systems for dependent data.

    Once data staging is complete and all rules for data quality have been performed on the staged data, the data is then sent to the posting tables. The design of the posting tables is a mirror of the final destination tables, the final destination for the data before being committed to the DW.

  • aveek22

    SSC-Addicted

    Points: 484

    Completely agree with your view @jyoung. The only difference is I don't have a separate environment for posting within my infrastructure. So, after the ETL cleansing in stage, I directly load the data into the warehouse tables.

    Aveek has been working as an Analytics Engineer for the past couple of years now. His main areas of interests is SQL Server, SSIS, and SSAS. On his leisure, he likes to travel and is also an amateur photographer.
    https://twitter.com/avikoleum
    https://www.linkedin.com/in/aveekd22/

  • heb1014

    Hall of Fame

    Points: 3790

    It has also been my experience that it is more challenging to track errors down to their sources with the Integration Services Catalog when using child packages.  I have done it both ways and prefer to have a multi-step Agent job where each step executes a package.

  • Jeffrey Williams

    SSC Guru

    Points: 88659

    I will use both approaches - depending on what the project requires.  For example - when I want to make sure several processes run parallel - the master/child package approach will allow that - where the agent job approach does not...

    Using the master/child approach this way still allows for individual small packages to be grouped in separate 'threads' in the master package and all sharing the same project/package parameters, with inter-dependencies between groupings where needed.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply