Modular ETL Architecture Design in SSIS

Question

Modular ETL Architecture Design in SSIS

aveek22

Mr or Mrs. 500

Points: 509
More actions
October 21, 2019 at 12:00 am

#3687883

Comments posted to this topic are about the item Modular ETL Architecture Design in SSIS
Aveek has been working as an Analytics Engineer for the past couple of years now. His main areas of interests is SQL Server, SSIS, and SSAS. On his leisure, he likes to travel and is also an amateur photographer.
https://twitter.com/avikoleum
https://www.linkedin.com/in/aveekd22/

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

davidadumas Newbie Points: 3 More actions · Answer 1

Hi. Good article. I have always used single discreet packages instead of a master package calling child pkgs. That way, when there is an error, once fixed, I can restart the SQ Agent Job from the job step (pkg) that failed.

aveek22 Mr or Mrs. 500 Points: 509 More actions · Answer 2

Thanks. Yeah, I agree on your point as well. But I personally feel that its easier to manage multiple packages in the master packages instead of creating SQL Agent jobs for each package.

Aveek has been working as an Analytics Engineer for the past couple of years now. His main areas of interests is SQL Server, SSIS, and SSAS. On his leisure, he likes to travel and is also an amateur photographer.
https://twitter.com/avikoleum
https://www.linkedin.com/in/aveekd22/

davidadumas Newbie Points: 3 More actions · Answer 3

I do create a "Master" SQL Agent job, which might call 30 related packages, one after another as steps. So, for example, when step 18 fails, I fix the issue, and then run the job again, starting from step 18. I code each pkg so that it can be rerun if it fails anywhere in the pkg (job step).

jyoung Newbie Points: 3 More actions · Answer 4

I also prefer to use separate packages rather than child packages. With separate packages there is no need to have a job per package, just have one job that executes the packages in the correct order. If there is a problem with one package executing then it is easier to correct that package then restart the job at the step where it failed.

If the packages are in the same project (which they should be) then you can use project parameters to define connection strings and project wide parameters.

This is how I organize my ETL architecture - There are two main phases, staging and posting. I create tables for staging the initial data that is extracted. The staging phase includes data cleaning and performing any lookups into other systems for dependent data.

Once data staging is complete and all rules for data quality have been performed on the staged data, the data is then sent to the posting tables. The design of the posting tables is a mirror of the final destination tables, the final destination for the data before being committed to the DW.

aveek22 Mr or Mrs. 500 Points: 509 More actions · Answer 5

Completely agree with your view @jyoung. The only difference is I don't have a separate environment for posting within my infrastructure. So, after the ETL cleansing in stage, I directly load the data into the warehouse tables.

Aveek has been working as an Analytics Engineer for the past couple of years now. His main areas of interests is SQL Server, SSIS, and SSAS. On his leisure, he likes to travel and is also an amateur photographer.
https://twitter.com/avikoleum
https://www.linkedin.com/in/aveekd22/

heb1014 Hall of Fame Points: 3790 More actions · Answer 6

It has also been my experience that it is more challenging to track errors down to their sources with the Integration Services Catalog when using child packages. I have done it both ways and prefer to have a multi-step Agent job where each step executes a package.

Jeffrey Williams SSC Guru Points: 90349 More actions · Answer 7

I will use both approaches - depending on what the project requires. For example - when I want to make sure several processes run parallel - the master/child package approach will allow that - where the agent job approach does not...

Using the master/child approach this way still allows for individual small packages to be grouped in separate 'threads' in the master package and all sharing the same project/package parameters, with inter-dependencies between groupings where needed.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

Modular ETL Architecture Design in SSIS

Cookies on SQLServerCentral