SSIS Lookup or Merge Join To Get Specific Dimension Key

  • I need help with understanding an SSIS strategy that can support more complexity than a single natural key lookup/merge join to the data warehouse dimension in order to return a surrogate key.

    I need to also constrain this lookup/merge join based on the dimension row's effective and expired dates. In other words three criteria are needed as follows:

    1. DataStagingSource.ModifyDate < DataWarehouseDimension.RowExpiredDate AND

    2. DataStagingSource.ModifyDate >= DataWarehouseDimension.RowEffectiveDate AND

    3. DataStagingSource.NaturalKey = DataWarehouseDimension.NaturalKey

    I have really struggled with using the SSIS Lookup Transformation's Advanced Tab and Parameters.

    ANY assistance with this problem is appreciated!!

     

  • Joe,

    The LOOKUP transformation only supports equality comparisons. I hope this will be addressed in the next version (along with a few other issues I have with the LOOKUP transform).

    In the meantime, the MERGE JOIN component will be the way to go.

    -Jamie

     

  • This may be too simple a solution, and you may already have tried it, but I see the following workaround:

    1) lookup on

    the DataStagingSource.NaturalKey = DataWarehouseDimension.NaturalKey

    and include in the lookup the following columns :

    • DataWarehouseDimension.RowExpiredDate 
    • DataWarehouseDimension.RowEffectiveDate

    2) feed that output into a Conditional Split, which contains the logic of

    DataStagingSource.ModifyDate < DataWarehouseDimension.RowExpiredDate AND  DataStagingSource.ModifyDate >= DataWarehouseDimension.RowEffectiveDate

    On the output you should have only the records which apply to your conditions.
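The two steps above can be sketched outside SSIS in plain Python. This is only a model of the logic, not of the components: it assumes the key match yields every dimension row for a natural key, whereas the SSIS Lookup returns a single match per input row. The sample rows and the `resolve` helper are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical dimension rows: (surrogate_key, natural_key, row_effective, row_expired)
dimension = [
    (1, "CUST-1", date(2005, 1, 1), date(2006, 1, 1)),
    (2, "CUST-1", date(2006, 1, 1), date(9999, 12, 31)),
    (3, "CUST-2", date(2005, 6, 1), date(9999, 12, 31)),
]

# Hypothetical staging rows: (natural_key, modify_date)
staging = [
    ("CUST-1", date(2005, 7, 15)),
    ("CUST-1", date(2006, 1, 1)),
    ("CUST-2", date(2005, 1, 1)),  # earlier than any dimension row for this key
]

# Step 1: equality match on the natural key only, carrying the date columns along.
by_key = defaultdict(list)
for surrogate_key, natural_key, effective, expired in dimension:
    by_key[natural_key].append((surrogate_key, effective, expired))


def resolve(natural_key, modify_date):
    """Return the surrogate key whose date range covers modify_date, or None."""
    for surrogate_key, effective, expired in by_key.get(natural_key, []):
        # Step 2: the date condition (effective date inclusive, expired date exclusive).
        if effective <= modify_date < expired:
            return surrogate_key
    return None


resolved = [resolve(key, modified) for key, modified in staging]
print(resolved)  # [1, 2, None]
```

The third staging row resolves to `None`, which corresponds to a row that fails the condition in step 2 and would need its own handling.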

    Hope this is useful.

    - Servaas

  • Sounds similar to something I wanted to do a long time ago. I quickly knocked up an example today, so here's an article I wrote for you:

    SSIS Lookup with value range

    http://www.juliankuiters.id.au/article.php/ssis-lookup-with-range

    Let me know if it doesn't work for you.


    Julian Kuiters
    juliankuiters.id.au

  • I resolved a similar issue by using a parameterised Lookup - but there are a few things to look out for if you do this...

    1. SSIS really doesn't like parameters in subqueries, so make sure your parameters are in the outer query.

    2. On the Columns tab, you need to map each of the columns used in the Advanced tab's parameter mapping, even though this mapping is ignored at run time.

    See here for more details: http://blog.cybner.com.au/2008/03/working-with-complex-lookups-in-ssis.html


    Kindest Regards,

    Catherine Eibner
    cybner.com.au

  • Julian, thanks. This was very useful.

  • There is a solution based on the third-party commercial CozyRoc SSIS+ library. CozyRoc has implemented a data flow destination script that creates a memory-efficient range dictionary object. The dictionary object can then be used in the CozyRoc Lookup Plus component. For more information and a demonstration of how to use the script, see:

    http://www.cozyroc.com/script/range-dictionary-destination

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • Julian Kuiters (2/9/2006)


    Sounds similar to something I wanted to do a long time ago. I quickly knocked up an example today, so here's an article I wrote for you:

    SSIS Lookup with value range

    http://www.juliankuiters.id.au/article.php/ssis-lookup-with-range

    Let me know if it doesn't work for you.

    Hi Julian,

    I'm trying to apply something very similar. What I am trying to achieve is an SSIS package that I can use for a daily run as well as for date ranges, picking up the relevant location SCD record for a given 'pick date'.

    My source contains the location and a 'pick_date' that I need to use in a location_dim lookup.

    I've configured my Lookup SSIS task SQL Statement as:

    select * from
    (select * from [dbo].[location_dim]) as refTable
    where [refTable].[location] = ?
    and ( ( [refTable].[effective_start_dt] >= ? and [refTable].[effective_end_dt] <= ? )
    or ( [refTable].[effective_start_dt] >= ? and [refTable].[effective_end_dt] IS NULL )
    )

    A couple of things have happened. The package now takes 12 minutes to run (previously it took about 20 seconds!).

    Also, it doesn't match any values; everything is redirected to the error output.

    So:

    a) Is the best way to speed this up to put an index on the location dimension, or is something else likely going on here?

    b) Is my logic wrong? I re-read it and it seems OK, so I'm a little miffed as to why it's not finding a record.

    Your help would be appreciated.

    Thanks.

    _____________________________________________________________________________MCITP: Business Intelligence Developer (2005)

  • OK, I've resolved the first issue of matching the keys by changing my lookup SQL:

    select *
    from
    (select * from [dbo].[location_dim]) as refTable
    where [refTable].[location] = ?
    AND ((? >= [refTable].[effective_start_dt] and ? <= [refTable].[effective_end_dt])
    or ( [refTable].[effective_end_dt] IS NULL)
    )

    However, performance is still pretty dreadful.

    I've added an index to the dimension table on location, effective_start_dt and effective_end_dt; without it, performance is dire. The lookup table has only 18,204 records, so even without an index I would expect things to be quicker.

    Using the 'default' lookup, my package takes 5 seconds in BIDS. When I change the caching SQL statement under the Advanced tab to the code above, the package takes approximately 27 seconds!

    _____________________________________________________________________________MCITP: Business Intelligence Developer (2005)
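For anyone comparing the two lookup predicates above, here is a minimal sketch that runs both against an in-memory SQLite table. It is only a stand-in for the SSIS Lookup: the two sample rows, the `location_key` column and the `keys` helper are invented for illustration, the derived table is dropped for brevity, and dates are stored as ISO text so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE location_dim ("
    " location_key INTEGER PRIMARY KEY,"  # hypothetical surrogate key column
    " location TEXT,"
    " effective_start_dt TEXT,"
    " effective_end_dt TEXT)"
)
conn.executemany(
    "INSERT INTO location_dim VALUES (?, ?, ?, ?)",
    [
        (1, "A1", "2008-01-01", "2008-06-30"),  # closed historical row
        (2, "A1", "2008-07-01", None),          # open-ended current row
    ],
)

# First attempt: asks for rows whose whole range lies on the pick date.
first_attempt = """
    SELECT location_key FROM location_dim
    WHERE location = ?
      AND ((effective_start_dt >= ? AND effective_end_dt <= ?)
        OR (effective_start_dt >= ? AND effective_end_dt IS NULL))
    ORDER BY location_key
"""

# Revised: asks for rows whose range contains the pick date.
revised = """
    SELECT location_key FROM location_dim
    WHERE location = ?
      AND ((? >= effective_start_dt AND ? <= effective_end_dt)
        OR effective_end_dt IS NULL)
    ORDER BY location_key
"""


def keys(sql, location, pick_date):
    """Run a lookup query, binding pick_date to every date placeholder."""
    params = (location,) + (pick_date,) * (sql.count("?") - 1)
    return [row[0] for row in conn.execute(sql, params)]


first_result = keys(first_attempt, "A1", "2008-09-01")
revised_result = keys(revised, "A1", "2008-09-01")
earlier_result = keys(revised, "A1", "2008-03-15")
print(first_result, revised_result, earlier_result)  # [] [2] [1, 2]
```

With this toy data the first predicate finds nothing for a pick date inside the current row's range, which is consistent with everything going to the error output. The revised predicate finds the current row, but for the earlier pick date it returns both rows, because the `IS NULL` branch does not check the start date. Whether that matters depends on whether pick dates can precede the current row's start date.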
