• Evil Kraig F (11/12/2012)


    Randy Knight (11/12/2012)


    rhythm.varshney (11/12/2012)


    yes.I am importing the data to staging table as it is and there I am applying split on a column say name into first name and last name on the basis of space between two.

    so you meant too say derived column would be better approach ??

    Yes. The SSIS expression language is going to be much easier to do this in. And while it is a bit of a paradigm shift, get away from the staging table idea. We want to be doing ETL (Extract, Tranform, Load) rather than ELT (Extract, Load, Transform). SSIS excels at doing these things in memory, in the pipeline. Writing to staging table(s) in the interim slows things down considerably. When things get overly complex for a single dataflow, I will often write to raw files for the interim step rather than a db table ... still much faster.

    Right up until you need to sort data for aggregations or the like, I agree with you. When you need to do that, T-SQL tends to blow the dataflow stream away. I agree with the general principal though. In a staging table you need to write at least twice (once in, once transformed), where if you transform before you write, it's a single dropoff.

    However, don't completely throw away the idea of a staging table. You cannot UPDATE or DELETE from an existing table from SSIS in bulk. If you're doing a MERGE equivalent and not merely delivering new rows to the existing table, deliver to staging and then MERGE it in. That's still part of the LOAD components of ETL though, so I may just be splitting hairs with what Randy was recommending.

    Yes, actually if you see my comments I mentioned that I am performing UPINSERT logic on the basis of staging table which hold around 50 K records daily basis.So was thinking to handle the split on the same.As was having the staging table and did not want to insert derived columns additionally.But not sure about the performance.