Exclude rows from the pipeline

  • I am looking for ideas on how to exclude rows from the pipeline. I have incoming rows that I do not want to continue down the pipeline based on whether the value of one of the incoming columns exists in a table on a different server. At present I have a Data Flow that consists of the following:

    1. OLE DB Source gets rows from a database on SERVER1

    2. A Lookup Transformation pointed to a database on SERVER2 uses a column to lookup data in SERVER2, and overwrites the same column with the lookup value it found. Not all column values will be found, so in cases when it is not found it would normally error, but I have the error output set to Ignore Failure so when no value is found it overwrites the column with an empty string. This is how we identify rows we do not want to keep, rows from SERVER1 with no key value in SERVER2.

    3. Then a Conditional Split looks for non-empty values in the column and sends them down the pipeline, the rest go nowhere.

    4. The fourth step in the Data Flow is to push the rows that survived the Conditional Split into a flat file.

    It works, but it feels like a hack. Any suggestions on how else to handle this scenario?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • While it feels hacky I think that is the way to do it.. About the only other way I could think about would be to create a table in tempdb copy the key values from the key table into it. Even with millions of rows this usually goes pretty quick. Then you can do an exists or not exists check.. Its about the same number of steps..

    CEWII

  • I like your idea about bringing SERVER2's data into tempdb on SERVER1. That should scale up just fine too. Thanks for having a look.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I agree with Elliott that you're using a best practice scenario there with what's available. For more complex decisions I'll occasionally use an asynchronous script component that will simply toss the row based on complex decision logic (mostly because expressions become utterly unreadable after a certain nesting level of if logic), but you'd still need to bring the lookup in to check your values.

    I personally prefer a second column for decision making like this in case I eventually want to dump excepted records into another datastore, but that's just preference.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (4/3/2012)


    I agree with Elliott that you're using a best practice scenario there with what's available. For more complex decisions I'll occasionally use an asynchronous script component that will simply toss the row based on complex decision logic (mostly because expressions become utterly unreadable after a certain nesting level of if logic), but you'd still need to bring the lookup in to check your values.

    I personally prefer a second column for decision making like this in case I eventually want to dump excepted records into another datastore, but that's just preference.

    I like the idea you mentioned of having the extra column for decision making. I would go that route in the future if loading to a database or persisting the results some other way. In this case I decided to overlay the same column since I was headed for a flat file and did not want to have to deal with a change to the signature of the pipeline downstream.

    Thanks for the confirmation. It helps to hear I that I did not veer too far off the grid. The Script Component method could some in handy down the line for me too for more complex checks. Thanks.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply