Design Approach - Thoughts/Suggestions?

  • Hey Guys,

    I have below scenario -

    I am pulling data from table A from database D1 on Server S1 and want to load data into a table B from database D2 on Server S2. I also want to do some transformations like Lookup with same table B from database D2 on Server S2, some derived columns, some datatype conversions etc before loading it into table B from database D2 on Server S2. My SSIS package will be deployed onto Server S3. So to summarize the scenario, I will be running a package off S3 which pulls data from S1, does some transformations and loads data onto S2.

    I have two ways to achieve this -

    1. Using Single Data Flow Task which will pull the data from S1, use all lookups and other transformation in the same Data Flow Task and load it into S2 Server.

    2. I will create one staging table on S2 server where I will be pulling all data from S1 using one Data Flow Task and then use other Data Flow Task to read data from S2 staging table, do all lookup and other data transformations and load data into table B on S2.

    Look ups in both the approch will have full caching and lookup data will grow as it will be incremental load.

    I am somehow considering using approach 2 to gain on performance side considering full caching and lookup on the fly will slowdown the process and also will increase the network overload between the 3 servers.

    Any thoughts/suggestions?

    :rolleyes:

  • I would probably go with choice 1, it gives you the most compact and portable package with the only dependencies being 2 connections. If later on you have performance issues you could pretty easily switch it. If you believe that is likely you might add a 3rd connection that is just used for the lookups, making it easier to switch their location.

    This coming from a design philosophy that portability and movement from one server to another is a goal and having minimum external dependencies is a plus. In other words you could easily move the package from server 3 to server 2, 1, or even 4 without having to bring along anything else.

    Full caching can sometimes spill onto disk but that probably won't be an issue since you'd have that either way. Having the lookups locally can be faster BUT depending on the kind of data you are moving what happens is that you usually you only need a subset of data and not the whole table. On your lookups don't let it handle the query, always write your own with the MINIMUM number of fields required to satisfy the lookup..

    CEWII

  • I guess there are some factors you should take into consideration like the size of your lookup tables and the complexity of your transformations and where your lookup are.

    Are your lookup tables on Server S2? I guess they aren't on S1 else you could just do the lookups in a join when creating the data source. If your lookups have large numbers of records or there are many of them I would look to stage the table on the destination server on s2. Then the lookups and data conversions can be done in a single sql statement (if our lookups are on S2) using joins when moving the data from the staging table which will have a big performance advantage over lookups.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply