New to SSIS with Update problem in etl

  • Hi. I am relatively new to SSIS and am creating a very simple Update/Insert using data flow tasks instead of T-SQL. I have table A and table B. Table A is the source and B is the target. I bring both tables in using OLEDB connections, then I use a conversion task to change table A's key field to nvarchar instead of varchar. The same key field in Table B is nvarchar. I don't pass through the original varchar value, only the nvarchar value for the key and the description. When I sort each table by its key field.

    After the sorts, I use a MergeJoin task and Table A is the left part of the left join and B is the right part and the join field is the key on both tables. So I have the key field on Table A checked and also the description field on Table A as pass through. I have the key field on Table V checked and the description in that table is set to pass through.

    Following the MergeJoin I add a simple Audit to add lineage data and follow this with CONDITIONALSPLIT.

    In the conditional split, I have an Update output and an Input output. Update is on !Isnull(B's key) && upper(description from A) <> upper(description from B). I only want to update if I have to update the description.

    For my Input path, it is simply Isnull(Table B's key).

    Finally I have an OLEDB command for the update path. It is very simple:

    Update TableB

    Set Description = ?

    I mapped the Table A nvarchar description to Param0.

    When I run the update part of the flow, every single record in Table B is updated with the last value of the record in Table A that isn't a new record. So the description is the same in Table B for every record !!!

    I know I am mapping the Table A description to Param 0. I have looked at Resolve Mapping and I can't figure out what I should be doing on that screen. It's new to me.

    Anyway, any assistance is greatly appreciated. If more information is required, please let me know what you need.

  • Maybe you should add a WHERE clause to your UPDATE statement?

    Update TableB

    Set Description = ?

    WHERE TableB.Key = ?

    In the second parameter you map the key that needs to be updated.

    The OLE DB Command updates TableB row by row which is incredibly slow.

    Also, the MERGE JOIN requires sorted input and the SORT component in SSIS is a blocking component (meaning all data has to be read into memory). All this means your package will be slow for larger data sets.

    What I usually do is a reading TableA, fetching data from TableB using a LOOKUP and then use the conditional split to find out if a row is an insert or an update. Inserts are directly inserted into tableB using an OLE DB destination with fast load.

    Updates are insterted into a staging table. After the data flow, an Execute SQL Statement is used to update tableB using the data from tableB.

    Something like this:

    UPDATE B

    SET ...

    FROM TableB B

    JOIN myStagingTable stg ON B.Key = stg.Key

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply