Get first row for a key value that is duplicated

  • I was unable to find a scenario that was similar to mine. I am forced to open a new topic. Please help me if you can.

    I have two data sources(table) from the same database. The first table is at the customer level and the second at the account level.

    Customer ID AgeSex

    1 25M

    2 35F

    3 45M

    4 60F

    Customer ID Account ID Balance

    1 1A50

    1 1B20

    2 2A3

    2 2B40

    2 2C25

    3 3A34

    4 4A23

    4 4B67

    In my output I need CustomerID Age Sex AccountID and Balance. But I need just the first row for each customer from the account table. So the output looks like this -

    Customer ID AgeSexAccount IDBalance

    1 25M1A50

    2 35F2A3

    3 45M3A34

    4 60F4A23

    I was thinking of creating a count number that increases for each row and then resets at the next Customer ID. Then I can output just the rows that have count<2. I tried several data transformations but failed and I'm at my wits end.

    Has anybody come across something that is close to this problem? Any help is appreciated.

    Thanks.

  • This could be solved outside of SSIS as well...

    With a TSQL subselect, you could get the first Account for a CustomerID

    Something like so

    select whateveryouneedhere

    from customers as c

    inner join accounts as a

    on c.customer_id = a.customer_id

    inner join (

    select customer_id, min(account_id) as account_id from accounts group by customer_id

    ) as minaccounts

    on a.customer_id = minaccounts.customerid

    and a.account_id = minaccounts.account_id

    ~PD

  • Thanks. The db is DB2 and I need to have this done in SSIS as we are not allowed to customize on the db side.

  • So you cannot write custom TSQL select statements on the DB side? Odd

  • Try working with the aggregate transform, havent personally used min and max, but it should be pretty standard.

    Conceptually something like following:

    a) Use a sort transform and sort by customer ID

    b) Use a aggregate transform to get the min(account ID)

    c) Use a merge join to get all static for customer joined to aggregation on customer and account

    Something like that

    Careful though, sorts chew insane amounts of memory when you have a big dataset.

    Good luck!

    ~PD

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply