• Hi,

    I've not noticed anyone else suggest this alternative, so here goes:

    Use cache files to slim down your data.

    Try to avoid putting data into staging tables if at all possible. It is costly in I/O and adds extra steps the data has to pass through. That said, in some circumstances it is unavoidable. The T-SQL MERGE is also not that efficient, so restrict it to the rows you actually need to touch. Ideally avoid it altogether, since MERGE typically ends up scanning the whole target table during the operation.

    Before loading data, create a cache file based upon the natural key of the target table; this is what defines a unique record under your business rules. Include the surrogate key as well, if one is available. Within your data flow, refer to this cache file with a Lookup transformation: rows that match are updates, rows that don't are inserts. Send your non-matches (inserts) straight into the target table and direct your matches into a staging table. Then create a stored procedure to batch-update your target table from that staging table (I do batches of 1000 rows at a time). Since you now know that every record in the staging table is an update, you can skip the MERGE statement entirely and just use UPDATE.
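
    To make that concrete, here is a rough sketch. The table and column names (dbo.DimCustomer as the target, dbo.stg_DimCustomer as the staging table, CustomerCode as the natural key, CustomerKey as the surrogate key) are placeholders for illustration, so substitute your own. The query feeding the Cache Transform only needs the natural key plus the surrogate key of the target:

        -- Source query for the Cache Transform / cache file:
        -- natural key plus surrogate key of the target table.
        SELECT CustomerCode, CustomerKey
        FROM   dbo.DimCustomer;

    And the batched update procedure could look something like the sketch below. Rather than counting out exactly 1000 rows, it walks the surrogate key in ranges of @BatchSize, which keeps each UPDATE small without needing a cursor:

        CREATE PROCEDURE dbo.usp_UpdateDimCustomerFromStaging
            @BatchSize INT = 1000
        AS
        BEGIN
            SET NOCOUNT ON;

            -- Range of surrogate keys present in the staging table
            -- (remember: every staging row is a known update).
            DECLARE @MinKey BIGINT = (SELECT MIN(CustomerKey) FROM dbo.stg_DimCustomer);
            DECLARE @MaxKey BIGINT = (SELECT MAX(CustomerKey) FROM dbo.stg_DimCustomer);

            WHILE @MinKey <= @MaxKey
            BEGIN
                -- Each pass touches at most @BatchSize key values,
                -- keeping the transaction and log usage small.
                UPDATE tgt
                SET    tgt.CustomerName = stg.CustomerName,
                       tgt.Email        = stg.Email,
                       tgt.ModifiedDate = GETDATE()
                FROM   dbo.DimCustomer     AS tgt
                JOIN   dbo.stg_DimCustomer AS stg
                       ON stg.CustomerKey = tgt.CustomerKey  -- surrogate key captured by the Lookup
                WHERE  stg.CustomerKey >= @MinKey
                  AND  stg.CustomerKey <  @MinKey + @BatchSize;

                SET @MinKey = @MinKey + @BatchSize;
            END
        END

    Once the procedure has run, you can truncate the staging table ready for the next load.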

    This keeps as much of the computation in memory (RAM) as possible, which is far more efficient than repeatedly writing the data to disk and working on it there.