SSIS Lookup Cache

  • We are developing packages in a SSIS environment using Version 9.00.3042.00. I’m presently having problems or a misunderstanding on when to use Partial or Full cache on lookups. Seems like the lookup results differ when using these two modes. For example, full lookups often “succeed” (turn green) but result in no matches while changing the same lookup to use partial or no cache will result in data matching but much slower performance. This has been observed on tables of varying sizes, but is most common on large (hundreds of thousands of rows) tables. I would like to know if anyone is experiencing similar issues and if possible to offer some explanation and possible workaround or corrective measures. I would also be interested in knowing if there any other methods to speed up lookup performance. Would like to find out what the best practice for writing SSIS lookups. Any help is much appreciated.

  • Kurt,

    Generally, the only way you would get different results in your lookups is due to changing data, either in the reference tables or in the data in the pipeline. The cache mode should not affect the result of the lookup, only the way the reference data is handled in memory.

    How are you determining whether a lookup succeeded? I assume that you have your lookup transforms set to ignore lookup failures, since you are not having component failures at runtime.

    By the way, here's a good explanation of the different types of cache modes:

    http://blogs.msdn.com/mattm/archive/2008/10/18/lookup-cache-modes.aspx

    hth,

    Tim

    Tim Mitchell, Microsoft Data Platform MVP
    Data Warehouse and ETL Consultant
    TimMitchell.net | @Tim_Mitchell | Tyleris.com
    ETL Best Practices

  • When using 'Full' cache setting you need to make sure that what you are comparing is going to be identical. This is going to be a case-sensitive lookup match, so you will need to make sure that you are trimming spaces and setting the values to UPPER to ensure that they will find a match.

    SSIS: Case-sensitivity in Lookup component

    You will need to do some comparisons on speed and performance since this will be based on the size of the dataset and the amount of memory available to use on the server.

    If you are using SSIS 2008 then you can look at utilizing the Cache Transform to write the data to a disk (like using a hash file in DataStage). This is useful for dimension information that is going to be used in multiple fact table loads like date, customer, store, etc. information. These can then be reused and you don't have to regenerate the dataset for each load.

    Did you know? SQL Server 2008 includes Cache Transform screencast

    ----------------------------------------------------------------------------------------
    Dan English - http://denglishbi.wordpress.com

  • Dan, I didn't realize that case sensitivity only applies in full cache mode. That may be the source of the problem described by the OP.

    Thanks for pointing out that fact... I'll put that feather in my cap.

    Tim Mitchell, Microsoft Data Platform MVP
    Data Warehouse and ETL Consultant
    TimMitchell.net | @Tim_Mitchell | Tyleris.com
    ETL Best Practices

  • SSIS 2008 Performance

    I am trying to import data (a few million rows) from flat files and populate into SQL server tables. Not sure which approach is better - Cache transformation or Staging tables/Merge (SSIS). Would appreciate your reply.

    Thanks,

    Jay

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply