I see a lot of questions on the forums about updating existing data in SSIS. When the update is dynamic and based on elements within your data flow, a common mistake is to use the OLE DB Command within the data flow:
The above poorly designed update has us retrieving data from some source and then running an update on our target database based on one or more values in the source. Does it work? Sure. The problem appears when you touch more than a handful of rows: the OLE DB Command executes its statement once for every row of data in your pipeline. So if our source retrieves 100,000 rows into the data flow, the downstream UPDATE command will be executed 100,000 times! An operation like that can bring even the most capable server to its knees.
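To illustrate what the OLE DB Command is doing behind the scenes, here is a sketch of the parameterized statement it would fire once per pipeline row (the table and column names here are illustrative, borrowed from the staging example later in this post):

```sql
-- Executed once for EVERY row in the data flow.
-- The ? placeholders are mapped to pipeline columns
-- (UpdatedInvoiceAmount and InvoiceID) in the OLE DB Command editor.
UPDATE Invoices
SET InvoiceAmount = ?
WHERE InvoiceID = ?;
```

With 100,000 rows in the pipeline, that is 100,000 separate round trips to the server, each with its own parse, plan lookup, and log write.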
A better solution is to stage your data. Using this method, you retrieve the data from the source and write it to a staging table in the destination database. You can then use an Execute SQL task to run your update in a set-based manner.
Set up your data flow as shown above to pull in the data to a staging table, then you can run a single UPDATE statement:
UPDATE i
SET InvoiceAmount = st.UpdatedInvoiceAmount
FROM Invoices i
INNER JOIN StagedData st ON i.InvoiceID = st.InvoiceID;
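In practice the staging pattern usually includes one more Execute SQL task, placed before the data flow, to clear out rows left over from the previous load. A minimal sketch, assuming the staging table is named StagedData as above:

```sql
-- Run before the data flow so stale rows from the last load
-- aren't applied again (assumes TRUNCATE rights on StagedData).
TRUNCATE TABLE StagedData;
```

If your account lacks TRUNCATE permissions, a plain DELETE FROM StagedData works as well, at the cost of logging each deleted row.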
The advantage here is that you're executing the expensive UPDATE statement once per load rather than once for each row affected.
Of course, some environments explicitly disallow the use of staging tables in destination systems. If storage or access restrictions keep you from using this method, you may have to fall back on the row-by-row update, but be aware that it's going to be a bottleneck.