SSIS 2008- Design optimization / Best Practices

  • Hi,

    I am in trying to design ETL process using SSIS 2008. The source (DB2 and Siebel) has tables Tbl_s1, Tbl_s2, Tbl_s3 …. Tbl_sN and the destination (SQL Server 2008) has Tbl_d1, Tbl_d2, Tbl_d3 …. Tbl_dN tables, where Tbl_s1 has the same structure as Tbl_d1 i.e. Tbl_s1 -> Tbl_d1 , similarly Tbl_s2 -> Tbl_d2 ….. Tbl_sN -> Tbl_dN.

    The data in destination tables is to be truncated and should be reloaded with source data for all tables. Every time the package is executed and the packages requires to process different regional data based on the parameter value that is being passed during runtime.

    Please suggest the best design optimization / Best Practices for achieving this task.

    Thanks in advance..

  • As metadata in a package can not be changed while the package is running, you should create a package "on the fly" with the source and destination you need, validate the metadat therein , and then run this newly created package.

    A solution that I have seen working in SSIS2005 was:

    • create a table that holds the info about the dataflows: source instance, db, table (or query), and destination instance, db, table
    • create all destination tables

    Then create a package with the following components:

    • An execute sql task loads the info about the dataflows from the table you created in step 1 to a variable
    • a For Each loop that executes the next steps for each record in the variable you just filled, and puts the values from the current record in their own parameter
    • a script component that dynamically creates a new package, and creates the 2 connections and a dataflow in the new package (based on the settings from table in the first step), validates then saves the package
    • an execute package that executes the just created package
    • a file component that deletes the just created package

    As said I have seen this working in the environment of a client of mine, where it imported a lot of tables from Oracle OLTP to a SQL staging area before further ETL took it into the SQL DWH.

    If I recall correctly they used a package that has 2 connections available instead of creating a new package from scratch.

    Unfortunately I don't have code or examples available, since creating the script-code may be challenging.

    Hopefully this approach is helpfull to you.

    Peter Rijs
    BI Consultant, The Netherlands

  • What is the value of N? Peter's approach is a good one, but from my experience it can be very time consuming to configure everything to work programmatically - even more so if you haven't had a lot of experience doing so. Even if N was 200 I'd suggest creating the packages manually. If N is 1000 then I would definitely consider a programmatic approach.

    My only suggestion would be to definitely create a package for each table. There might be times when you want to reload a single table and packages also tend to become difficult to work with due to slowness issues if you try to cram too many data flows within the same package.

    Also, how much data are in the tables? If you have a lot of records and/or column wide tables that need to be loaded, you might want to consider looking into some form of CDC so that you don't have to reload large tables every time - unless the data being reloaded is always new.

  • Well, If you can find a good example of the script you can easily go from there.

    http://consultingblogs.emc.com/jamiethomson/archive/2007/03/28/SSIS_3A00_-Building-Packages-Programatically.aspx might be a good place to start your search (maybe a little old).

    The chore of building 200 packages that are essentially the same would be too much of a burden to me, so I would probably put the limit on several tens of packages.

    If I have time to check my archive I will see if I can find some examples of the code. It's an approach I wanted to reexamine myself as well, but never had the occasion to do so.

    Peter Rijs
    BI Consultant, The Netherlands

  • Peter,

    Thank you for the link and I would definitely be interested in seeing an example! I'm sure I would create packages programmatically if it came a little easier. Great post!

  • Programatically building packages is not for the faint of heart, I wouldn't start with that solution..

    How many tables are we talking about here and does their structure change often?

    In a case where we had about 30 tables that rarely changed we just built a package that had a couple data-flows one contained all the smaller tables and the other couple spread out much bigger tables. The query was little more than SELECT * FROM tblA WHERE Field1 BETWEEN ? AND ? and pumped into directly into the destination table.

    It was very simple to build and maintain.

    Don't make it too complex..

    CEWII

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply