SQLServerCentral is supported by Red Gate Software Ltd.
Improving SSIS Update Performance
Posted Wednesday, October 30, 2013 9:52 PM


SSC Veteran


Group: General Forum Members
Last Login: Yesterday @ 6:02 AM
Points: 237, Visits: 534
Comments posted to this topic are about the item Improving SSIS Update Performance

Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -
Post #1510014
Posted Thursday, October 31, 2013 2:14 AM
Forum Newbie


Group: General Forum Members
Last Login: Wednesday, November 27, 2013 1:41 PM
Points: 1, Visits: 52
Fantastic article,

Thanks
Post #1510056
Posted Thursday, October 31, 2013 2:23 AM
Forum Newbie


Group: General Forum Members
Last Login: Tuesday, November 19, 2013 12:10 PM
Points: 1, Visits: 14
I've been using the UPSERT component from Pragmatic Works; it would be interesting to see that in the comparison.
Post #1510057
Posted Thursday, October 31, 2013 2:37 AM


SSChampion


Group: General Forum Members
Last Login: Today @ 1:26 AM
Points: 12,227, Visits: 9,198
Nice article, nice to see some alternatives to the "insert in staging table and then do an update".
However, it's a pity we can't review the UPDATE statement used. The article says the UPDATE is performed using the surrogate key, so I can imagine this partitions nicely within the conditional split and you avoid locking issues on the destination table.
I can think of other scenarios, though, where the UPDATE statement isn't that straightforward and where multiple threads may interfere with each other.

Anyway, very nice article and thanks for the effort.
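As an illustration of the partitioning idea above, here is a minimal sketch of how a conditional split can fan rows out to parallel update streams by surrogate key (the table, column, and parameter names are hypothetical):

```sql
-- Conditional Split: route each row to one of four outputs using
-- expressions on the surrogate key, e.g.
--   Output 0:  SK_Customer % 4 == 0
--   Output 1:  SK_Customer % 4 == 1
--   (and so on for outputs 2 and 3)
-- Each output feeds its own OLE DB Command running the same
-- parameterised statement, so no two streams update the same key:
UPDATE dbo.DimCustomer
SET    FirstName = ?,
       LastName  = ?,
       Email     = ?
WHERE  SK_Customer = ?;
```

Because the split is on the surrogate key, each stream touches a disjoint set of keys; rows that share a data page can still block one another, as later posts point out.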




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1510060
Posted Thursday, October 31, 2013 3:41 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Thursday, April 10, 2014 5:48 AM
Points: 107, Visits: 436
Hi Brian,

first I would like to thank you for the article.

I think this week or last there was also an interesting discussion of the performance of the MERGE command, which is another tool to be considered.
Even though it is very nice and comfortable to design packages with the visual components, connectors and dropdown lists (almost only mouse clicks), most of the time scripts are needed to reach better performance. And that is always the problem with the DFT components and row-by-row processing.
I know the SQL scripts are more difficult to document, and an SSIS package is harder to understand (when someone else has to modify something), but I always try to use the MERGE command or the temporary-table solution.
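For reference, a minimal sketch of the staging-table-plus-MERGE pattern mentioned above (all table and column names are hypothetical):

```sql
-- Bulk-load the incoming rows into a staging table first, then apply
-- inserts and updates in one set-based statement:
MERGE dbo.DimCustomer AS tgt
USING staging.Customer AS src
    ON tgt.CustomerCode = src.CustomerCode
WHEN MATCHED THEN
    UPDATE SET tgt.FirstName = src.FirstName,
               tgt.Email     = src.Email
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerCode, FirstName, Email)
    VALUES (src.CustomerCode, src.FirstName, src.Email);
```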

Kind Regards,



Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1510078
Posted Thursday, October 31, 2013 6:15 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Thursday, April 03, 2014 6:02 AM
Points: 138, Visits: 259
Thanks, nice article. A few comments:
1. Threads can interfere with each other if there are records on the same pages, and this will be a reality unless the split is done by the surrogate key.
2. I would go even further and create a differential update: first compare the data to find which records really need to be updated, then update only those records. This technique minimises the interaction time. I have a similar set of processes running daily that need to do a simple update/insert. There are about 100,000,000 records in total, but in reality only a few thousand change. The initial process was done exactly the way you describe in the article. When the differential load was implemented, the interaction time (i.e. the unavailability of the table to the application) dropped from 5-10 minutes to a few seconds. I do understand that this requires additional work, but I think in most cases it is the proper way to go.
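A sketch of that differential update in T-SQL (table and column names are hypothetical): the EXCEPT inside EXISTS compares the column lists and handles NULLs correctly, so only rows that actually changed are written:

```sql
UPDATE tgt
SET    tgt.FirstName = src.FirstName,
       tgt.Email     = src.Email
FROM   dbo.DimCustomer  AS tgt
JOIN   staging.Customer AS src
  ON   src.SK_Customer = tgt.SK_Customer
-- Write a row only when at least one compared column differs:
WHERE  EXISTS (SELECT src.FirstName, src.Email
               EXCEPT
               SELECT tgt.FirstName, tgt.Email);
```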



Post #1510140
Posted Thursday, October 31, 2013 8:12 AM
SSCrazy


Group: General Forum Members
Last Login: Thursday, March 27, 2014 2:39 PM
Points: 2,141, Visits: 486
My favorite data loading aids:

1. The Data Loading Performance Guide White Paper
2. SSIS Operational and Tuning Guide White Paper

There are so many factors that can affect performance, but the simple concept of adding threads is well illustrated here and may give someone another weapon in their ETL arsenal. Thank you for the contribution.
Post #1510194
Posted Thursday, October 31, 2013 8:35 AM
Grasshopper


Group: General Forum Members
Last Login: Wednesday, April 16, 2014 12:31 PM
Points: 14, Visits: 79
"ETL load"
extract transform load load
Post #1510212
Posted Thursday, October 31, 2013 8:52 AM


SSCarpal Tunnel


Group: General Forum Members
Last Login: Yesterday @ 8:23 PM
Points: 4,832, Visits: 11,197
Nice article. Your numbering system needs alignment (starts off zero-based and moves to one-based in the results table).

I would be interested to see whether the final 'staging table' option would run even faster if it was indexed just before issuing the UPDATE.

Another technique I have seen used is INSERTing all rows (inserts and updates) directly into a flat view of the table (create view as select * from table) which contains an INSTEAD OF INSERT trigger which performs a MERGE. The good thing about this is that, by tuning batch commit size in the SSIS destination, you can achieve a near-continuous data flow. Might be worth a test ...
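A sketch of the trigger-backed view technique described above (the view, trigger, table, and column names are all hypothetical):

```sql
-- A flat view over the target table for SSIS to insert into:
CREATE VIEW dbo.DimCustomer_Loader
AS
SELECT CustomerCode, FirstName, Email
FROM   dbo.DimCustomer;
GO
CREATE TRIGGER trg_DimCustomer_Loader
ON dbo.DimCustomer_Loader
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Each batch the SSIS destination commits arrives here as one
    -- "inserted" set, which is upserted in a single MERGE:
    MERGE dbo.DimCustomer AS tgt
    USING inserted AS src
        ON tgt.CustomerCode = src.CustomerCode
    WHEN MATCHED THEN
        UPDATE SET tgt.FirstName = src.FirstName,
                   tgt.Email     = src.Email
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerCode, FirstName, Email)
        VALUES (src.CustomerCode, src.FirstName, src.Email);
END;
```

The SSIS destination then points at the view rather than the base table, and tuning the batch commit size controls how much work each trigger invocation performs.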



Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1510234
Posted Thursday, October 31, 2013 9:01 AM


SSC-Dedicated


Group: General Forum Members
Last Login: Yesterday @ 9:45 PM
Points: 36,013, Visits: 30,300
Gosh... still more than 2 minutes using the "fastest" option. How many rows were in the source?

Heh... and I have to admit this is the first time I've ever seen fully intentional RBAR in an SSIS package. If all the other packages contain the same type of problem, it may very well be worth spending a couple of extra minutes on each to fix them. I'd also likely use a different tool. For example, I can only guess that the reason someone wrote the original package as RBAR was to allow "healthy" rows to be imported even if "erroneous" rows exist and are rejected by keys on the destination table. BULK INSERT and BCP both allow "bad" rows to be sequestered in a separate file while the good rows are all loaded. If a BCP format file is used correctly (it isn't required, though) and depending, of course, on the number of FKs and other constraints on the target table, both BULK INSERT and BCP are capable of loading millions of rows per minute, including some data validation in the process.
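To illustrate the error-sequestering behaviour described above, a hedged sketch of BULK INSERT (the file paths, terminators, and table name are hypothetical):

```sql
BULK INSERT dbo.DimCustomer
FROM 'C:\loads\customers.dat'
WITH (
    FIELDTERMINATOR = '|',
    ROWTERMINATOR   = '\n',
    ERRORFILE       = 'C:\loads\customers_bad.dat', -- rejected rows land here
    MAXERRORS       = 1000,  -- keep loading past individual bad rows
    TABLOCK                  -- allows a minimally logged bulk load
);
```

The good rows commit; anything that fails a constraint or conversion is written to the error file for inspection and reprocessing.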

And nice article. Thanks for taking the time to write it and publish it.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1510245