SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Design Approach - Thoughts/Suggestions?


Design Approach - Thoughts/Suggestions?

Author
Message
HakunaMatata
HakunaMatata
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1216 Visits: 491
Hey Guys,

I have below scenario -

I am pulling data from table A from database D1 on Server S1 and want to load data into a table B from database D2 on Server S2. I also want to do some transformations like Lookup with same table B from database D2 on Server S2, some derived columns, some datatype conversions etc before loading it into table B from database D2 on Server S2. My SSIS package will be deployed onto Server S3. So to summarize the scenario, I will be running a package off S3 which pulls data from S1, does some transformations and loads data onto S2.

I have two ways to achieve this -

1. Using Single Data Flow Task which will pull the data from S1, use all lookups and other transformation in the same Data Flow Task and load it into S2 Server.

2. I will create one staging table on S2 server where I will be pulling all data from S1 using one Data Flow Task and then use other Data Flow Task to read data from S2 staging table, do all lookup and other data transformations and load data into table B on S2.

Look ups in both the approch will have full caching and lookup data will grow as it will be incremental load.

I am somehow considering using approach 2 to gain on performance side considering full caching and lookup on the fly will slowdown the process and also will increase the network overload between the 3 servers.

Any thoughts/suggestions?

Rolleyes
Elliott Whitlow
Elliott Whitlow
SSC-Insane
SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)

Group: General Forum Members
Points: 24334 Visits: 5314
I would probably go with choice 1, it gives you the most compact and portable package with the only dependencies being 2 connections. If later on you have performance issues you could pretty easily switch it. If you believe that is likely you might add a 3rd connection that is just used for the lookups, making it easier to switch their location.

This coming from a design philosophy that portability and movement from one server to another is a goal and having minimum external dependencies is a plus. In other words you could easily move the package from server 3 to server 2, 1, or even 4 without having to bring along anything else.

Full caching can sometimes spill onto disk but that probably won't be an issue since you'd have that either way. Having the lookups locally can be faster BUT depending on the kind of data you are moving what happens is that you usually you only need a subset of data and not the whole table. On your lookups don't let it handle the query, always write your own with the MINIMUM number of fields required to satisfy the lookup..

CEWII
Daniel Forrester 123
Daniel Forrester 123
SSChasing Mays
SSChasing Mays (600 reputation)SSChasing Mays (600 reputation)SSChasing Mays (600 reputation)SSChasing Mays (600 reputation)SSChasing Mays (600 reputation)SSChasing Mays (600 reputation)SSChasing Mays (600 reputation)SSChasing Mays (600 reputation)

Group: General Forum Members
Points: 600 Visits: 195
I guess there are some factors you should take into consideration like the size of your lookup tables and the complexity of your transformations and where your lookup are.

Are your lookup tables on Server S2? I guess they aren't on S1 else you could just do the lookups in a join when creating the data source. If your lookups have large numbers of records or there are many of them I would look to stage the table on the destination server on s2. Then the lookups and data conversions can be done in a single sql statement (if our lookups are on S2) using joins when moving the data from the staging table which will have a big performance advantage over lookups.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search