TF Upsert Destination taking a lot of time

  • Hi,

    I am trying to update/insert data into a destination table through the Pragmatic Works Upsert Destination task. It is extremely slow, loading about 15 records at a time. One of the tables has about 700,000 records and 213 columns, and that is just one example; I have 11 tables to load, which is taking a lot of time, and a few of them have around 5 million rows.

    Do you guys have any better solution to load the data?

  • I have a suggestion:

    Ditch the Upsert Component and instead do this:

    1. Add an Execute SQL Task above your Data Flow to truncate a new staging table* in the target database, or in a database on the same instance as the target database, e.g. a dedicated staging database.

    2. In your Data Flow, load the incoming rows into the staging table using a native OLE DB Destination Component with Fast Load enabled.

    3. Use an Execute SQL Task after your Data Flow to issue SQL that will first UPDATE rows that exist in both the staging table and the target table, then INSERT rows from your staging table into your target table that do not already exist in the target table.** (See the sketch after the footnotes below.)

    * This approach assumes you are in control of the target database (or another database on the same instance) and can create a staging table there to use for this type of routine.

    ** You could do the same using a MERGE statement instead of the separate UPDATE and INSERT statements. Your mileage may vary.
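
    To make step 3 concrete, here is a minimal T-SQL sketch of the UPDATE-then-INSERT pattern. The names (stg.MyTable, dbo.MyTable, the key column Id, and the column SomeColumn) are placeholders only; substitute your own key and the full column list.

        -- Step 1 equivalent: clear the staging table before the Data Flow runs
        TRUNCATE TABLE stg.MyTable;

        -- (the Data Flow then loads stg.MyTable via an OLE DB Destination with Fast Load)

        -- Step 3a: update rows that already exist in the target
        UPDATE tgt
        SET    tgt.SomeColumn = src.SomeColumn   -- repeat for each column you carry over
        FROM   dbo.MyTable AS tgt
               INNER JOIN stg.MyTable AS src
                       ON src.Id = tgt.Id;

        -- Step 3b: insert rows that do not yet exist in the target
        INSERT INTO dbo.MyTable (Id, SomeColumn)
        SELECT src.Id, src.SomeColumn
        FROM   stg.MyTable AS src
        WHERE  NOT EXISTS (SELECT 1
                           FROM   dbo.MyTable AS tgt
                           WHERE  tgt.Id = src.Id);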

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hey, thank you very much 🙂

    I did indeed follow this method. I loaded the data into a staging table and then wrote a MERGE stored procedure, called from an Execute SQL Task, which updates/inserts the destination table (roughly along the lines of the sketch below). A task that used to take 25 hours now takes 43 minutes.
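
    For anyone finding this later, the procedure follows roughly this shape; the names (dbo.usp_Upsert_MyTable, dbo.MyTable, stg.MyTable, Id, SomeColumn) are placeholders rather than the real tables, which have far more columns.

        -- Rough shape of a MERGE-based upsert procedure (placeholder names)
        CREATE PROCEDURE dbo.usp_Upsert_MyTable
        AS
        BEGIN
            SET NOCOUNT ON;

            MERGE dbo.MyTable AS tgt
            USING stg.MyTable AS src
                ON tgt.Id = src.Id
            WHEN MATCHED THEN
                UPDATE SET tgt.SomeColumn = src.SomeColumn   -- one assignment per column
            WHEN NOT MATCHED BY TARGET THEN
                INSERT (Id, SomeColumn)
                VALUES (src.Id, src.SomeColumn);
        END;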

  • Thanks for posting back. Seeing improvements like that winds my clock.

    You must be processing a lot of data for it to continue to take 43 minutes. If this is well within your needs and you must move on to other things, I fully understand. However, if you want a hand improving on the 43 minutes, post some of the longer-running code here or in a new thread and we will see if there are some tuning opportunities.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Well, there are around 5M rows to process.

  • Would it benefit you to improve on the 43 minutes? 5 million rows is more than your average data load, but it is possible to process that many rows a lot faster than 43 minutes, depending on what is being done. Sometimes a design decision can shave hours off, like the change you just made from row-by-row processing in SSIS to a set-based process rooted in T-SQL in the database. Other times redesigning or tuning the T-SQL can shave hours off. Have you profiled the process end to end to find out which step is taking the longest? Can you lay out what the process is doing end to end for us?
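
    If you have not profiled it yet, one low-tech way is to log the start and end time of each step into a small audit table; the names below (dbo.EtlStepLog, dbo.usp_Upsert_MyTable) are placeholders, just to illustrate the idea.

        -- Simple step timing: record start/end times per step (placeholder names)
        CREATE TABLE dbo.EtlStepLog
        (
            StepName   varchar(100) NOT NULL,
            StartedAt  datetime2    NOT NULL,
            FinishedAt datetime2    NULL
        );

        -- Wrap each step (staging load, MERGE, index rebuild, ...) like this:
        INSERT INTO dbo.EtlStepLog (StepName, StartedAt)
        VALUES ('Merge dbo.MyTable', SYSDATETIME());

        EXEC dbo.usp_Upsert_MyTable;   -- the step being timed

        UPDATE dbo.EtlStepLog
        SET    FinishedAt = SYSDATETIME()
        WHERE  StepName = 'Merge dbo.MyTable'
          AND  FinishedAt IS NULL;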

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Orlando Colamatteo (1/23/2016)

    Would it benefit you to improve on the 43 minutes? 5 million rows is more than your average data load, but it is possible to process that many rows a lot faster than 43 minutes, depending on what is being done. Sometimes a design decision can shave hours off, like the change you just made from row-by-row processing in SSIS to a set-based process rooted in T-SQL in the database. Other times redesigning or tuning the T-SQL can shave hours off. Have you profiled the process end to end to find out which step is taking the longest? Can you lay out what the process is doing end to end for us?

    Very true. 5M rows in 43 minutes leaves the impression there is room for improvement. In your MERGE, are the join columns indexed? It might help to build or rebuild an index on that staging table after a load, for example:
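
    (Placeholder names again, and assuming Id is the column the MERGE joins on and the staging table is otherwise a heap.)

        -- Rebuild the join-column index after the Fast Load so the load itself stays fast
        IF EXISTS (SELECT 1
                   FROM   sys.indexes
                   WHERE  name = 'IX_stg_MyTable_Id'
                     AND  object_id = OBJECT_ID('stg.MyTable'))
            DROP INDEX IX_stg_MyTable_Id ON stg.MyTable;

        CREATE CLUSTERED INDEX IX_stg_MyTable_Id
            ON stg.MyTable (Id);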

