Whats best for Incremental loading of dimensions

  • I built my first ETL today to load my product dimension using mainly lookups, conditional splits etc. The etl handles 1 & 2 SCD, new in records and retires old record.

    the source tabel has about 1.9m records, my first run took hours so not impressed with performance. Will this get better after initial load.

    Im using sql 2012 and i know i you can use Merge statement, change date capture or the scd or kimbal scd object in ssis. i plan to experiment with all but has anyone any info on what i should expect regarding performance.

    thanks

  • We use Kimball, it is all I have used. Downside is that data must input SORTED.

    Sorting and aggregating are blocking functions in SSIS. this means, every row has to get to that component before the component can do its work.

    If you can eliminate/decrease the sorts and aggregations it could help.

    I often have to use MergeJoin, which also requires sorted input. If you can avoid Merge Join to avoid the required sort , that can help.

    Otherwise, if you anticipate having to throw out rows, do it as early in the data flow as possible to bring along the least amount of rows possible.

    I don't know what is best, I only know how we've done it and what I've learned from it.

    You could play with buffer sizes as it could assist you, but do so carefully. I also use fast load in my destinations and apply a table lock since I only ever write to staging and nothing but the executing pkg knows the table exists.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply