Improving SSIS Update Performance

  • brettk (10/31/2013)


    Dunno if it's worth mentioning, but the downside to the tmp table + update option is the lack of SSIS-level, row-by-row error reporting (which makes sense given you're operating set-based). In cases like that, my general approach is to create a preload table that SSIS dumps into, that *has the same schema as the target table*. That way, most common errors (think null column violations, column widths exceeded, etc) are caught when populating the preload table and can be handled the usual way. Then, when doing the update, you can be confident that the staged data will load correctly.

    Kinda sorta.. 🙂

    If you let SQL handle the exceptions, or rather let SQL notify SSIS of the exception, then yes. But, since SSIS is a pipeline, you can filter/redirect errant rows before they reach the destination.
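
    For reference, the staged-update pattern under discussion boils down to something like the following T-SQL sketch; dbo.DimCustomer and dbo.DimCustomer_Preload are hypothetical names, with the preload table sharing the target's schema so the SSIS data flow can still redirect errant rows before they reach it:

    ```sql
    -- Set-based update from a preload table that mirrors the target's schema.
    -- Row-level problems (NULL violations, truncation, etc.) surface while SSIS
    -- populates dbo.DimCustomer_Preload, where they can be redirected as usual.
    UPDATE tgt
    SET    tgt.CustomerName = pre.CustomerName,
           tgt.City         = pre.City,
           tgt.ModifiedDate = pre.ModifiedDate
    FROM   dbo.DimCustomer AS tgt
    JOIN   dbo.DimCustomer_Preload AS pre
           ON pre.CustomerKey = tgt.CustomerKey;
    ```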

  • Yes, that is the dream. 🙂

    In reality, ETLs, being software, are buggy. To anticipate that, while obviously I work to catch bad data before it ever gets loaded, I also make sure to catch errors at insert/update time and incorporate them into my staging area archive accordingly. That way I have a final safety net in case things go horribly wrong.

    This also ensures that a bug resulting in a bad row getting through doesn't cause the entire load to fail, as would be the case if the error was thrown during a straight bulk update of the target dimension/fact table.

  • Leo Peysakhovich (10/31/2013)


    Thanks, nice article. Few comments:

    1. Threads can interfere with each other if there are records on the same pages, and this will be a reality unless the split is done by the surrogate key.

    2. I would go even further and create a differential update: first compare the data to find which records really need to be updated, then update only those records. This technique minimizes the interaction time. I have a similar set of processes running daily that need to do a simple update/insert. The total is about 100,000,000 records, but in reality only a few thousand change. The initial process was done exactly the way you describe in the article. When the differential load was implemented, the interaction time (i.e. the unavailability of the table to the application) was reduced from 5-10 minutes to a few seconds. I understand this requires additional work, but I think in most cases it is the proper way to go.

    Thanks Leo. I fully agree with your 2nd point and in fact it's something which should always be done. In this article, I simply removed the 'checking' part to focus only on the update section.
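
    For illustration, here is a minimal T-SQL sketch of such a differential update, assuming hypothetical table and column names; only rows whose values actually differ are touched:

    ```sql
    -- Differential update: only rows where at least one column value differs.
    -- The EXISTS/EXCEPT form also handles NULLs correctly, unlike a chain of <> comparisons.
    UPDATE tgt
    SET    tgt.CustomerName = src.CustomerName,
           tgt.City         = src.City
    FROM   dbo.DimCustomer AS tgt
    JOIN   dbo.DimCustomer_Staging AS src
           ON src.CustomerKey = tgt.CustomerKey
    WHERE  EXISTS (SELECT src.CustomerName, src.City
                   EXCEPT
                   SELECT tgt.CustomerName, tgt.City);
    ```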

    Regards
    Brian Ellul
    ----------------------------------------------------------------------------------
    Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
    - Albert Einstein -

  • Thanks to all for your comments 🙂

    Regards
    Brian Ellul
    ----------------------------------------------------------------------------------
    Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
    - Albert Einstein -

  • brettk (10/31/2013)


    Yes, that is the dream. 🙂

    +1

    Great article, thanks for the share.

    One more point to mention is if you are updating too many rows at a time.

    SQL Server creates an implicit transaction for each separate query (if a transaction is not already started). Therefore, all changes made by the query must be persisted in the transaction log, which makes your log HUGE and takes A LOT OF TIME to complete. Splitting the update into batches of, say, 1000 rows at a time (this number is not fixed, it's just like mentioning salt in cooking recipes :-P) will boost the performance tremendously.
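
    A minimal sketch of that batching idea in T-SQL, using hypothetical table names and the (adjustable) 1000-row batch size:

    ```sql
    -- Update in small batches so each implicit transaction stays short
    -- and the transaction log does not balloon.
    DECLARE @rows INT = 1;

    WHILE @rows > 0
    BEGIN
        UPDATE TOP (1000) tgt
        SET    tgt.City = src.City
        FROM   dbo.DimCustomer AS tgt
        JOIN   dbo.DimCustomer_Staging AS src
               ON src.CustomerKey = tgt.CustomerKey
        WHERE  tgt.City <> src.City;   -- only rows that still need the change

        SET @rows = @@ROWCOUNT;
    END;
    ```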

  • Jeff Moden (10/31/2013)


    Gosh... still more than 2 minutes using the "fastest" option. How many rows were in the source?

    From the article:

    "The source has been limited to 1 million rows ..."

    Also, the original time of 12 minutes at approximately 99K executions per minute works out to approximately 1,188,000 executions.

  • Sweet article. I am currently figuring out what the best option is for the company where I work.

    The SSIS package in use is not for a large data warehouse structure or anything like that; it is just for the BI solution within a failover cluster with two SQL Servers.

    However...

    One table in the database is 30 million+ rows. The SSIS package currently creates a temp table, dumps all the data into that temp table, deletes the old table at the new location, and renames the temp table. (Yes, it is a delete-and-replace package and not an UPDATE package.)

    The keys etc. are also created in the temp table after the insert of rows is done. My question: the last option you name is what I use at the moment, but that is for updating results. Is option 2 perhaps better for the complete creation of a new table?
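
    For context, the load-then-rename swap described above amounts to something like the following T-SQL sketch (dbo.FactSales and dbo.FactSales_Staging are hypothetical names; the staging table is fully loaded and indexed before the swap):

    ```sql
    -- Swap the freshly built staging table in for the live table.
    -- The rename itself is a metadata operation, so the table is only
    -- unavailable for the duration of this short transaction.
    BEGIN TRANSACTION;

    EXEC sp_rename 'dbo.FactSales', 'FactSales_Old';
    EXEC sp_rename 'dbo.FactSales_Staging', 'FactSales';

    COMMIT TRANSACTION;

    DROP TABLE dbo.FactSales_Old;   -- discard the previous copy once the swap has committed
    ```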

  • I wonder if there is much benefit to using BOTH concurrency AND insert/update. As in "multiple concurrent inserts and updates". Then SQL Server could thread each UPDATE separately. I wouldn't even attempt this without partitioning the destination table in a way that matched the separation of threads (meaning, something other than a meaningless modulus).
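
    If one were to try that, a partition layout aligned with the per-thread key ranges might look roughly like this sketch (hypothetical names and boundary values):

    ```sql
    -- Partition the destination so each concurrent SSIS thread updates its own
    -- key range and threads never touch the same partition.
    CREATE PARTITION FUNCTION pfCustomerRange (INT)
        AS RANGE RIGHT FOR VALUES (250000, 500000, 750000);   -- four ranges, one per thread

    CREATE PARTITION SCHEME psCustomerRange
        AS PARTITION pfCustomerRange ALL TO ([PRIMARY]);

    -- The destination table would then be created ON psCustomerRange(CustomerKey),
    -- and each thread's UPDATE restricted to a single key range.
    ```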

  • I was in the same situation when I tried to update with the OLE DB Command component. After some days I went with option 4, which gave me a tremendous result in both time and performance. I think option 4 is mainly well suited for bulk updates.

    All in all, this is a nice article.

  • andrew.baines (10/31/2013)


    I've been using the UPSERT component from Pragmatic Works; it would be interesting to see that in the comparison.

    We've similarly been using the Pragmatic Works tools and find they are a quick and easy performance improver with virtually no re-coding effort.
