Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
Log in  ::  Register  ::  Not logged in

The Smiling DBA

Thomas LeBlanc is a Business Intelligence Consultant/Data Warehouse Architect in Baton Rouge, Louisiana. He uses his 25+ years in IT to help develop OLTP systems with normalized databases for high-performing T-SQL and end-to-end dimensional data marts using SSIS, SSAS, PPS, and Excel. His SQL Server certifications include MCSA 12, MCITP 08 BI and DBA, MCITP 2005 DBA, and MCDBA 2000. As a PASS volunteer, he is current chair of the Excel BI virtual chapter, past chair of the Data Architecture virtual chapter, and past virtual chapter mentor. He has helped the Baton Rouge SQL Server User Group with SQLSaturdays and speaks at local IT meetings.

Lookup transformation in SSIS: Performance Improvements

There are 2 suggestions I hear about when performance tuning the Lookup transformation on a large dimension table when getting the surrogate key for a fact table. I hope to introduce here a third.

The first is the caching option.



The Full Cache option will cache the whole table (or query columns). If you have a large dimension table, this might use a lot of memory.

The Partial Cache will load the matching rows but will clear least used lookup values once the memory size of the cache is exceed.

No Cache will query the data each and every time a lookup value is needed. No good is running lots of lookups on a large table


I believe using Full Cache is good when you have no memory limitations and a small dimension table. The Partial Cache is good for large tables. Not sure where No Cache would be used unless the lookup is not used much in the Data Flow Task.

The next suggestion is instead of selecting the whole table, use a T-SQL query to select only the columns you need,


By using only the needed columns in the SELECT T-SQL statement, you limit the amount of information retrieve into SQL Server’s Buffer Pool and the SSIS cache.

One more option can help improve this query even more – a proper index.

SELECT ProductSKey, ProductID
  FROM DimProduct
WHERE Status = 1

This query used above will still do a Clustered Index scan or Non-clustered Index with a Key Lookup in the execution plan based on the indexes available like an index on the just the status column.

But, if you can create a covering index with the Status as a Non-Clustered Index including ProductSKey (surrogate key) and ProductID (natural key) in the index, you can get better performance on the SQL instance side.

The cost of the query went from 0.0178649 to 0.0065309. The statistics IO went from 21 logical reads to 7 and the execution went from a Clustered Index scan to a Non-clustered Index Seek.

image  image

Using the partial cache and proper index on dimension table helped reduce a look up on one of our dimension tables from 3 seconds to less than one second on a 200,000+ row dimension table at my current employer. The execution performance was viewed in the new SSISDB statistics report in the 2012 version of SSIS. More on that in another blog.


Leave a comment on the original post [, opens in a new window]

Loading comments...