Data Flow task for "Gradual Lookup Cache"?

  • I suspect I know the answer: "No!" but just in case, let me ask...

    Is it possible to configure a data flow Lookup task to "gradually" build its cache?

    Typically, in my lookup, I want to check for the existence of a related row.

    If the related row does not exist, I redirect to an OLE DB Command that creates the missing row, followed by another Lookup with no caching to return the new row, then merge this back with the original flow.
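
    For concreteness, the per-row statements behind that redirect path might look something like this (just a sketch - dbo.DimRelated, NaturalKey and RelatedKey are made-up names, not from my actual package):

    -- OLE DB Command: create the missing row (runs once per redirected row)
    INSERT INTO dbo.DimRelated (NaturalKey) VALUES (?);

    -- Second, non-cached Lookup: fetch the key for the row just created
    SELECT RelatedKey FROM dbo.DimRelated WHERE NaturalKey = ?;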

    Unfortunately, this generates lots of OLE DB Command executions (and secondary lookups) if a new value appears many times in my flow.

    QUESTION: Is it possible to configure the initial Lookup so that it checks the database again each time it encounters a missing row?

    At the moment, it looks like my Lookup just gives up on a value if it does not find it the first time.

    I think I have seen this achieved with some sort of "home-made" lookup in a Script Component, but I wondered whether I am missing something with the Lookup task.

    Thanks

    Eric

  • I presume this is because you may have multiple occurrences of the 'new' lookup item in your source data and you don't want to risk creating something which is already there?

    So in the following scenario, what do you want to happen to row 2?

    Source row 1 not found in target.

    Source row 2 not found in (original version of) target, but same key as Source row 1.

    If there is no risk of duplicate keys in the source data, it would be safe to use a fully cached lookup which does not get refreshed (and no need to use an OLE DB Command either - just INSERT them).
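
    To make the risk concrete (with made-up names dbo.LookupTable and KeyCol, assuming a primary key on KeyCol): if source rows 1 and 2 both carry KeyCol = 42 and the cache was loaded before either insert, an unguarded per-row insert fails on the second row:

    INSERT INTO dbo.LookupTable (KeyCol) VALUES (42); -- row 1: succeeds
    INSERT INTO dbo.LookupTable (KeyCol) VALUES (42); -- row 2: primary key violation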


    Yes... but I am trying to avoid putting my (Fact) data through a staging table, so I would like to drop the rows straight into the destination; therefore, I need the Lookup key value straight away...

    If going through a staging table, then indeed it gets a lot easier.

  • Eric Mamet (3/20/2014)


    Yes... but I am trying to avoid putting my (Fact) data through a staging table, so I would like to drop the rows straight into the destination; therefore, I need the Lookup key value straight away...

    If going through a staging table, then indeed it gets a lot easier.

    OK. Can you split this into two phases then:

    1) Data flow to insert missing lookup rows.

    2) Main data flow using a cached Lookup.

    The data source for (1) would just be something like

    select distinct keycol, <otherdata>

    from source

    and obviously it could also use a cached lookup and redirect unmatched rows to a direct INSERT into the lookup table.
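
    If the source is a table, phase (1) could even collapse into a single set-based statement - a sketch, with made-up names dbo.SourceTable and dbo.LookupTable standing in for your actual objects:

    INSERT INTO dbo.LookupTable (KeyCol, OtherData)
    SELECT DISTINCT s.KeyCol, s.OtherData
    FROM dbo.SourceTable AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.LookupTable AS l
                      WHERE l.KeyCol = s.KeyCol);

    The DISTINCT plus the NOT EXISTS guard handles both duplicate source keys and rows already present in the lookup table.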


    Funny you would say that - this is exactly what I tried yesterday...

    I was working on an IIS log file, and reading the file looked very quick, so I adopted that approach.

    I therefore read the file twice but that's no problem.

    Also, the fact that it's a "static" file means that the data does not change between the two reads.

    It actually works very well.

    Thanks
