Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

The Smiling DBA

Thomas LeBlanc ( MCITP 2005/2008 & MCDBA 2000) is a Senior SQL Server DBA at Turner Industries, LLC in Baton Rouge, LA. He has been in the IT field for 21 years starting as a COBOL programmer, graduating to dBase, FoxPro, Visual FoxPro, upgrading to Visual Basic versions 3-6 and even some .Net(C#). Designing and developing normalized database has become his passion. Full-time DBA work started about 9 years ago for Thomas while working at a Paper Mill in St. Francisville, LA continuing with IEM, then Amedisys. Performance tuning and reviewing database design and code was an everyday occurrence for DBAs at Amedisys. Thomas’ free time is spent helping those less fortunate and improving his relationship with his family and God.

Lookup transformation in SSIS: Performance Improvements

There are 2 suggestions I hear about when performance tuning the Lookup transformation on a large dimension table when getting the surrogate key for a fact table. I hope to introduce here a third.

The first is the caching option.

image

 

The Full Cache option will cache the whole table (or query columns). If you have a large dimension table, this might use a lot of memory.

The Partial Cache will load the matching rows but will clear least used lookup values once the memory size of the cache is exceed.

No Cache will query the data each and every time a lookup value is needed. No good is running lots of lookups on a large table

 

I believe using Full Cache is good when you have no memory limitations and a small dimension table. The Partial Cache is good for large tables. Not sure where No Cache would be used unless the lookup is not used much in the Data Flow Task.

The next suggestion is instead of selecting the whole table, use a T-SQL query to select only the columns you need,

image

By using only the needed columns in the SELECT T-SQL statement, you limit the amount of information retrieve into SQL Server’s Buffer Pool and the SSIS cache.

One more option can help improve this query even more – a proper index.

SELECT ProductSKey, ProductID
  FROM DimProduct
WHERE Status = 1

This query used above will still do a Clustered Index scan or Non-clustered Index with a Key Lookup in the execution plan based on the indexes available like an index on the just the status column.


But, if you can create a covering index with the Status as a Non-Clustered Index including ProductSKey (surrogate key) and ProductID (natural key) in the index, you can get better performance on the SQL instance side.


The cost of the query went from 0.0178649 to 0.0065309. The statistics IO went from 21 logical reads to 7 and the execution went from a Clustered Index scan to a Non-clustered Index Seek.


image  image


Using the partial cache and proper index on dimension table helped reduce a look up on one of our dimension tables from 3 seconds to less than one second on a 200,000+ row dimension table at my current employer. The execution performance was viewed in the new SSISDB statistics report in the 2012 version of SSIS. More on that in another blog.

Comments

Leave a comment on the original post [thesmilingdba.blogspot.com, opens in a new window]

Loading comments...