Data Flow filtering puzzle

  • Is there a way, in the SSIS data flow, to retain in the data flow only the records that have the max value in a certain field, grouped by other fields?

    I have a contract number and version nb, of which there may be multiple records, and for each distinct contract nb and version nb, I only want the record with max(sequence nb) for that contract nb and version nb.

    So if coming into my data flow I have

    contract nb, version nb, sequence nb

    10000, 001,0

    10000,001,1

    10000,001,2

    10001,001,0

    I want to end up with 2 records:

    10000,001,2

    10001,001,0

    In Transact-SQL I would express it

    Select cntrc_nb, ver_nb, max(seq_nb)

    from table

    group by cntrc_nb, ver_nb

    Is there a way to do that in the data flow? I looked at conditional split, but haven't figured out how to do it.

    Thanks,

    Holly

  • You should be able to use an aggregate component. Let me know if you need help configuring this.

    Kindest Regards,

    Frank Bazan

  • hey frank can i know how can we do that using aggragate function ...bcoz we need to mention the distinct nid and the max of the other two columns so how can we do that suing aggregatio plz let me know

    Thanks,
    Chinna

    Its the Journey which gives you Happiness not the Destination-- Dan Millman

  • Thank you, Frank. I see how to configure the Aggregate component, I think, and that allows me to select the correct sequence nb. But I don't see how to get the rest of the fields from the source component back into the data flow. In addition to the fields I mentioned, there are a number of text fields, which I can't bring into the aggregate component, because they aren't relevant to the grouping.

    I wanted to use a merge join, but the Source component only has one output, so I don't see how to get the remaining fields back into the flow.

    Holly

  • From what I've understood in the original question you wanted the equivalent of T-SQL's:-

    SELECT DISTINCT contract nb, version nb, MAX(sequence nb).

    You drop an aggregate component on to the dataflow. Connect the pipeline to it. Then check all 3 columns. In the column list on the bottom of the component editor choose the required operation for each column. In this case you want to GROUP BY for contract nb, version nb and MAXIMUM for sequence nb.

    Note that you will only see the MAX, MIN and other mathematical functions with an number datatype.

    Hope this helps.

    Kindest Regards,

    Frank Bazan

  • Sorry I posted my last before I saw your next post...

    In which case what you have to do is a multicast. You aggregate one stream and then merge it back to the non aggregated pipeline.

    In order to do this you just need to make sure that the joining columns are present in both streams. In your case you should be able to join back using your 2 text fields (one of them was contract nb, can't remember the other). You may need to add a sort or two in there as well before you can merge.

    Kindest Regards,

    Frank Bazan

  • Perfect!!! I had to put a Sort in both the flows and then used a Merge Join to get them back together.

    I really do appreciate your help. I might have figured that out after much time, after all they say monkeys could write Shakespeare given enough time (I don't believe that for a minute!). But thanks to your help I have it working in a time frame my employer likes.

    Holly

  • If joining columns are not present in both streams, you can choose to ignore the columns. It will be considered as nulls instead.

    --------------------------------------------------------------------------------------
    Save our world, its all we have! A must watch video Pale Blue Dot[/url]

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply