Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Improving SSIS Update Performance


Improving SSIS Update Performance

Author
Message
brian118
brian118
SSC-Addicted
SSC-Addicted (428 reputation)SSC-Addicted (428 reputation)SSC-Addicted (428 reputation)SSC-Addicted (428 reputation)SSC-Addicted (428 reputation)SSC-Addicted (428 reputation)SSC-Addicted (428 reputation)SSC-Addicted (428 reputation)

Group: General Forum Members
Points: 428 Visits: 587
Comments posted to this topic are about the item Improving SSIS Update Performance

Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -
natty_nooy
natty_nooy
Forum Newbie
Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)

Group: General Forum Members
Points: 3 Visits: 52
Fantastic article,

Thanks
andrew.baines
andrew.baines
Forum Newbie
Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)

Group: General Forum Members
Points: 3 Visits: 26
I've been using the UPSERT component from Pragmatic Works, would be interesting to see that in the comparison.
Koen Verbeeck
Koen Verbeeck
SSC-Insane
SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)

Group: General Forum Members
Points: 24082 Visits: 13268
Nice article, nice to see some alternatives to the "insert in staging table and then do an update".
However, it's a we can't review the UPDATE statement used. The article says the UPDATE is performed using the surrogate key, so I can imagine this partitions nicely within the conditional split and you avoid locking issues on the destination table.
I can think about other scenarious though where the UPDATE statement isn't that straight forward and where multiple threads may interfere with each other.

Anyway, very nice article and thanks for the effort.


How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?
My blog at SQLKover.

MCSE Business Intelligence - Microsoft Data Platform MVP
Paul Hernández
Paul Hernández
SSC Veteran
SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)

Group: General Forum Members
Points: 292 Visits: 661
Hi Brian,

first I would like to thank you for the article.

I think this week or the last there was also an interesting discussion of the performance of the merge command, which is also a tool to be considered.
Even when it is very nice and comfortable to design packages with the visual component, connectors and dropdown list (almost only mouse clicks), most of the time scripts are needed to reach better performance. And that is always the problem with the DFT components and processing row by row.
I know the SQL scripts are more difficult to document, and a SSIS Package is harder to understand (when someone else has to modify something) but I always try to use the merge command or the temporary table solution.

Kind Regards,

Paul Hernández
Leo Peysakhovich
Leo Peysakhovich
SSC-Enthusiastic
SSC-Enthusiastic (195 reputation)SSC-Enthusiastic (195 reputation)SSC-Enthusiastic (195 reputation)SSC-Enthusiastic (195 reputation)SSC-Enthusiastic (195 reputation)SSC-Enthusiastic (195 reputation)SSC-Enthusiastic (195 reputation)SSC-Enthusiastic (195 reputation)

Group: General Forum Members
Points: 195 Visits: 343
Thanks, nice article. Few comments:
1. Threads can interfere with each other if there are records on the same pages. And this will be a reality unless split is done by the surrogate key.
2. I would go even further in the changes and create differential update: a. first set the comparison of data and find which records really need to be updated. Then do an update only for those records. This technique allows minimize the interaction time. I have similar set of processes running daily and required to do the simple update/insert. Amount of records is about 100,000,000 in total. But in reality only few thousands are changing. Initial process was done exactly the way you describe in article. When differential load was implemented interaction time (e.g. unavalability of the table for the application) was reduced from 5-10 minutes to the few seconds. I do understand that this required an additional work but I think in most cases this is the most proper way to go.



sneumersky
sneumersky
SSCrazy
SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)

Group: General Forum Members
Points: 2374 Visits: 487
My favorite data loading aids:

1. The Data Loading Performance Guide White Paper
2. SSIS Operational and Tuning Guide White Paper

There are so many factors that can affect performance, but the simple concept of adding threads is well illustrated here and may give someone another weapon in their ETL arsenal. Thank you for the contribution.
banreaxe
banreaxe
Grasshopper
Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)

Group: General Forum Members
Points: 16 Visits: 86
"ETL load"
extract transform load load
Phil Parkin
Phil Parkin
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14988 Visits: 20169
Nice article. Your numbering system needs alignment (starts off zero-based and moves to one-based in the results table).

I would be interested to see whether the final 'staging table' option would run even faster if it was indexed just before issuing the UPDATE.

Another technique I have seen used is INSERTing all rows (inserts and updates) directly into a flat view of the table (create view as select * from table) which contains an INSTEAD OF INSERT trigger which performs a MERGE. The good thing about this is that, by tuning batch commit size in the SSIS destination, you can achieve a near-continuous data flow. Might be worth a test ...


Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

Please surround any code or links you post with the appropriate IFCode formatting tags. It helps readability a lot.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (69K reputation)SSC Guru (69K reputation)SSC Guru (69K reputation)SSC Guru (69K reputation)SSC Guru (69K reputation)SSC Guru (69K reputation)SSC Guru (69K reputation)SSC Guru (69K reputation)

Group: General Forum Members
Points: 69973 Visits: 40733
Gosh... still more than 2 minutes using the "fastest" option. How many rows were in the source?

Heh... and I have to admit this is the first time I've ever seen fully intentional RBAR in an SSIS package. If all the other packages contain the same type of problem, it may vary well be worth spending a couple of extra minutes on each to fix them. I'd also likely use a different tool. For example I can only guess that the reason why someone wrote the original package as RBAR is to allow "healthy" rows to be imported even if "erroneous" rows exist and are rejected by keys on the destination table. BULK INSERT and BCP both allow for "bad" rows to be sequestered in a separate file while the good rows are all loaded. If a BCP format file is used correctly (not required though) and depending on, of course, the number of FKs and other constraints on the target table, both BULK INSERT and BCP are capable of loading millions of rows per minute including some data validation in the process.

And nice article. Thanks for taking the time to write it and publish it.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search