I see a lot of questions on the forums about updating existing data in SSIS. When the update is dynamic and is based on elements within your data flow, a common mistake is to use the Ole DB Command within the data flow pane:
The above poorly designed update has us retrieving data from some source, and then running an update on our target database based on one or more values in the source. Does it work? Sure. The problem occurs when you touch more than a handful of rows. When you bring in the Ole Db command into the picture in a data flow, you’ll be firing the statement in this control once for every row of data in your pipeline. So let’s say our source retrieves 100,000 rows into the data flow: the downstream UPDATE command will be executed 100,000 times! Such operations could bring the most capable server to its knees.
A better solution could include staging your data. Using this method, you’ll retrieve the data from the source and write it into a staging table in the destination database. You can then use an Execute SQL task to run your update in a more organic manner.
Set up your data flow as shown above to pull in the data to a staging table, then you can run a single UPDATE statement:
UPDATE i
SET InvoiceAmount = st.UpdatedInvoiceAmount
FROM Invoices i
INNER JOIN StagedData st ON i.InvoiceID = st.InvoiceID
The advantage here is that you’re executing the expensive UPDATE statement once for each table rather than once for each row affected.
Of course, there are some situations that explicitly disallow the use of staging tables in destination systems. If storage or access restrictions keep you from using this method, you may have to use the row-by-row insert, so be aware that it's going to be a bottleneck.



Subscribe to this blog
Briefcase
Print
Posted by Anonymous on 10 September 2009
Pingback from Dew Drop – September 10, 2009 | Alvin Ashcraft's Morning Dew
Posted by yogesh_t_r on 19 July 2012
We use a Temp table instead of a dedicated staging table. Its Brief and to the point thx.