SSIS Lookup or Merge Join To Get Specific Dimension Key



Joe Salvatore

I need help with understanding an SSIS strategy that can support more complexity than a single natural key lookup/merge join to the data warehouse dimension in order to return a surrogate key.

I need to also constrain this lookup/merge join based on the dimension row's effective and expired dates. In other words three criteria are needed as follows:
1. DataStagingSource.ModifyDate < DataWarehouseDimension.RowExpiredDate AND
2. DataStagingSource.ModifyDate >= DataWarehouseDimension.RowEffectiveDate AND
3. DataStagingSource.NaturalKey = DataWarehouseDimension.NaturalKey
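
As a minimal sketch of what these three criteria select, the script below uses Python's standard-library sqlite3 module as a stand-in for SQL Server. The table layout, the SurrogateKey column name and the sample rows are assumptions made up for illustration; this is plain SQL logic, not SSIS configuration.

```python
import sqlite3

# In-memory database standing in for the warehouse (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DataWarehouseDimension (
    SurrogateKey     INTEGER PRIMARY KEY,
    NaturalKey       TEXT,
    RowEffectiveDate TEXT,   -- ISO dates compare correctly as text
    RowExpiredDate   TEXT
);
CREATE TABLE DataStagingSource (
    NaturalKey TEXT,
    ModifyDate TEXT
);
INSERT INTO DataWarehouseDimension VALUES
    (1, 'A', '2005-01-01', '2005-07-01'),
    (2, 'A', '2005-07-01', '9999-12-31'),
    (3, 'B', '2005-01-01', '9999-12-31');
INSERT INTO DataStagingSource VALUES
    ('A', '2005-03-15'),
    ('A', '2005-07-01'),
    ('B', '2006-02-09');
""")

# Equality on the natural key plus a half-open date range:
# effective date inclusive, expired date exclusive.
rows = conn.execute("""
SELECT s.NaturalKey, s.ModifyDate, d.SurrogateKey
FROM DataStagingSource AS s
JOIN DataWarehouseDimension AS d
  ON s.NaturalKey = d.NaturalKey
 AND s.ModifyDate >= d.RowEffectiveDate
 AND s.ModifyDate <  d.RowExpiredDate
ORDER BY s.NaturalKey, s.ModifyDate
""").fetchall()

for row in rows:
    print(row)
```

Because the range is half-open, a ModifyDate that falls exactly on a version boundary (2005-07-01 in the sample data) matches only the newer dimension row, so each source row gets at most one surrogate key as long as the versions do not overlap.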

I have really struggled with using the SSIS Lookup Transformation's Advanced Tab and Parameters.

ANY assistance with this problem is appreciated!!





Jamie Thomson

Joe,

The LOOKUP transformation only supports equality comparisons. I hope this will be addressed in the next version (along with a few other issues I have with the LOOKUP transform).

In the meantime, the MERGE JOIN component will be the way to go.

-Jamie



Jamie Thomson
http://sqlblog.com/blogs/jamie_thomson
Servaas G Winkelman

This may be too simple a solution, and you may already have tried it, but I see this as a workaround:

1) Do the lookup on

DataStagingSource.NaturalKey = DataWarehouseDimension.NaturalKey

and include the following columns in the lookup output:

  • DataWarehouseDimension.RowExpiredDate
  • DataWarehouseDimension.RowEffectiveDate

2) Feed that output into a Conditional Split, which contains the logic of

DataStagingSource.ModifyDate < DataWarehouseDimension.RowExpiredDate AND DataStagingSource.ModifyDate >= DataWarehouseDimension.RowEffectiveDate

On the output you should have only the records which apply to your conditions.
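
The two steps can be sketched in plain Python as follows. The sample rows and helper names are hypothetical. The sketch assumes the equality lookup keeps only the first reference row it sees per natural key, which, as far as I know, is how the SSIS Lookup behaves when the reference set contains duplicate keys. It then contrasts that with a join that returns every matching pair, which is what a Merge Join on the natural key would feed into the split.

```python
from datetime import date

# Hypothetical dimension with two versions of natural key "A".
dimension = [
    {"key": 1, "natural_key": "A",
     "effective": date(2005, 1, 1), "expired": date(2005, 7, 1)},
    {"key": 2, "natural_key": "A",
     "effective": date(2005, 7, 1), "expired": date(9999, 12, 31)},
]
staging = [
    {"natural_key": "A", "modify_date": date(2005, 3, 15)},
    {"natural_key": "A", "modify_date": date(2005, 8, 1)},
]


def in_range(src, dim):
    """Date condition from step 2 (effective inclusive, expired exclusive)."""
    return dim["effective"] <= src["modify_date"] < dim["expired"]


# Step 1: equality lookup that keeps only the first row per natural key
# (assumption: "first" here is simply list order).
first_match = {}
for dim in dimension:
    first_match.setdefault(dim["natural_key"], dim)

# Step 2: the split on the date condition.
kept, rejected = [], []
for src in staging:
    dim = first_match.get(src["natural_key"])
    if dim is not None and in_range(src, dim):
        kept.append((src["modify_date"], dim["key"]))
    else:
        rejected.append(src["modify_date"])

# Contrast: join on the natural key returning every pair, then split.
joined = [
    (src["modify_date"], dim["key"])
    for src in staging
    for dim in dimension
    if src["natural_key"] == dim["natural_key"] and in_range(src, dim)
]

print(kept)
print(rejected)
print(joined)
```

With these sample rows the second source row is rejected by the first-match version, because the only dimension row it was paired with is the expired one. The workaround is therefore safe only when each natural key has a single row in the lookup's reference set.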

Hope this is useful.

- Servaas


Julian Kuiters
Sounds similar to something I wanted to do a long time ago. I quickly knocked up an example today, so here's an article I wrote for you:

SSIS Lookup with value range
http://www.juliankuiters.id.au/article.php/ssis-lookup-with-range



Let me know if it doesn't work for you.




Julian Kuiters
juliankuiters.id.au
Catherine Eibner
I resolved a similar issue by using a parameterised Lookup, but there are a few things to look out for if you do this:

1. SSIS really doesn't like parameters in subqueries, so make sure your parameters are in the outer query.
2. In the Columns tab, you need to map each of the columns used in the Advanced tab's parameter mapping, even though the Columns tab mapping is ignored at run time.

See here for more details: http://blog.cybner.com.au/2008/03/working-with-complex-lookups-in-ssis.html


Kindest Regards,

Catherine Eibner
cybner.com.au
robinwesley
Julian, thanks. This was very useful.
CozyRoc
There is a solution based on the third-party commercial CozyRoc SSIS+ library. CozyRoc has implemented a data flow destination script that creates a memory-efficient range dictionary object. The dictionary object can then be used in the CozyRoc Lookup Plus component. For more information and a demonstration of how to use the script, see:

http://www.cozyroc.com/script/range-dictionary-destination

---
SSIS Tasks Components Scripts Services | http://www.cozyroc.com/


dave-dj
Julian Kuiters (2/9/2006)
Sound similar to something I wanted to do a long time ago. I quickly knocked up an example today, so here's an article I wrote for you:

SSIS Lookup with value range
http://www.juliankuiters.id.au/article.php/ssis-lookup-with-range



let me know if it doesn't work for you.

Hi Julian,

I'm just trying to apply something very similar. What I am trying to achieve is an SSIS package that I can use for a daily run, as well as for date ranges, picking up the relevant location SCD record for a given 'pick date'.


My source contains the location and a 'pick_date' that I need to use in a location_dim lookup.

I've configured my Lookup SSIS task SQL Statement as:

select * from
(select * from [dbo].[location_dim]) as refTable
where [refTable].[location] = ?
and ( ( [refTable].[effective_start_dt] >= ? and [refTable].[effective_end_dt] <= ? )
or ( [refTable].[effective_start_dt] >= ? and [refTable].[effective_end_dt] IS NULL )
)



A couple of things have happened. The package now takes 12 minutes to run (previously it took about 20 seconds!).

Also, it doesn't pair up any values; everything is redirected to the error output.

So:
a) Is the best way to speed this up to put an index on the location dimension, or is it likely something else is going on here?
b) Is my logic wrong? I re-read it and it seems OK, so I'm a little miffed as to why it's not finding a record.

Your help would be appreciated.

thanks.

_____________________________________________________________________________
MCITP: Business Intelligence Developer (2005)
dave-dj
OK, I've resolved the first issue of matching the keys by changing my lookup SQL:


select *
from

(select * from [dbo].[location_dim]) as refTable

where [refTable].[location] = ?
AND ( (? >= [refTable].[effective_start_dt] and ? <= [refTable].[effective_end_dt])
or ( [refTable].[effective_end_dt] IS NULL)
)
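
One thing worth checking in the predicate above: the IS NULL branch is not tied to the start date, so the open-ended (current) row matches every pick date for that location. The sketch below, again using Python's sqlite3 as a stand-in for SQL Server, compares the posted predicate with a tightened variant. The location_key column and the sample rows are hypothetical, and the tightened query is only one possible rewrite, not something taken from the linked article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location_dim (
    location_key       INTEGER PRIMARY KEY,  -- hypothetical surrogate key
    location           TEXT,
    effective_start_dt TEXT,
    effective_end_dt   TEXT                  -- NULL marks the current row
);
INSERT INTO location_dim VALUES
    (1, 'X', '2007-01-01', '2007-12-31'),
    (2, 'X', '2008-01-01', NULL);
""")

# Predicate as posted: the NULL branch ignores the start date.
posted = """
SELECT location_key FROM location_dim
WHERE location = ?
  AND ((? >= effective_start_dt AND ? <= effective_end_dt)
       OR effective_end_dt IS NULL)
ORDER BY location_key
"""

# Tightened variant: the start date always applies, and a NULL end
# date is treated as "no upper bound".
tightened = """
SELECT location_key FROM location_dim
WHERE location = ?
  AND ? >= effective_start_dt
  AND (effective_end_dt IS NULL OR ? <= effective_end_dt)
ORDER BY location_key
"""

pick_date = "2007-06-15"
params = ("X", pick_date, pick_date)

posted_keys = [r[0] for r in conn.execute(posted, params)]
tightened_keys = [r[0] for r in conn.execute(tightened, params)]

print(posted_keys)
print(tightened_keys)
```

With this sample data the posted predicate returns both the historical row and the current row for a 2007 pick date, while the tightened variant returns only the historical row.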



However, performance is still pretty dreadful.

I've added an index to the dimension table on location, effective_start_dt and effective_end_dt, without which performance is dire. The lookup table is only 18,204 records, so even without an index I would expect things to be quicker.

Using the 'default' lookup, my package takes 5 seconds in BIDS. When I change the Cache SQL statement, under the Advanced tab, to the code above, the package takes approximately 27 seconds!
