SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Anatomy of an Incremental Load


Anatomy of an Incremental Load

Author
Message
kenambrose
kenambrose
SSC Veteran
SSC Veteran (279 reputation)SSC Veteran (279 reputation)SSC Veteran (279 reputation)SSC Veteran (279 reputation)SSC Veteran (279 reputation)SSC Veteran (279 reputation)SSC Veteran (279 reputation)SSC Veteran (279 reputation)

Group: General Forum Members
Points: 279 Visits: 665
Understood Scott. And I think your contribution will likely help me in my work in the future.

And I personally feel that all technical examples offered in venues such as this should have volume performance limitations investigated and noted in the article, as this issue can be one of the "deal breakers" for an ETL solution, and an unpleasant disappointment if users make incorrect assumptions.

And as I mentioned previously however, I also expect such performance limitations would NOT be an issue for many ETL problems, although as you have experienced, will be for some.

Best,
Ken



Andy Leonard
Andy Leonard
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1124 Visits: 1095
Hi mescotty22,

This article was originally published in early 2008. The purpose of the article was to introduce and demonstrate one way of loading data incrementally using SSIS. It's not the only way, and this design pattern is not the ultimate solution for all incremental loading strategies. For me, this (or something very similar to this) is the usual starting point.

If this doesn't perform for me as desired, I tinker with the design. How and where I tinker... depends.

There were some really cool suggestions about approaches to making this solution scale. Some can be found earlier in this forum near this post (http://www.sqlservercentral.com/Forums/FindPost491603.aspx). Some are things I've tried before and some were things I'd never tried.

Everyone is working in different environments. 100 million rows is a lot - sometimes. Data shape comes into play with scale, as does hardware. I currently work on servers with 24 to 96 CPUs and 256 to 1,024 GB RAM. Some 100M row loads on these systems are, frankly, no big deal.

We're all working in different environments with different requirements. My intent with this article was to help folks get started understanding incremental loads with SSIS.

:{> Andy

Andy Leonard
Data Philosopher, Enterprise Data & Analytics
Carlos Sacristan
Carlos Sacristan
Grasshopper
Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)

Group: General Forum Members
Points: 15 Visits: 357
Hi Andy,

good article.

I didn't read all the posts, so maybe someone write about it, but did you know TableDifference component (http://www.sqlbi.eu/Default.aspx?tabid=74)? I used it a couple of times and it's more easy than the way you show us.
jpratt-797544
jpratt-797544
SSC Journeyman
SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)

Group: General Forum Members
Points: 85 Visits: 79
A very interesting article, which I have bookmarked for future reference. I, too, find SSIS difficult to learn coming from a VB/.NET/T-SQL background. I ended up building an extensible VB.NET task engine with process-specific task handlers instantiated by reflection and data processing written in T-SQL stored procedures rather than trying to fight with SSIS to handle my very specific external EDI file import, error handling, reprocessing, and reporting requirements.

One issue I have is that incremental loads are NOT always faster. I had one case with one step in an ETL process where an incremental load of the last 30 days of data, with aggregation occurring when selecting from the data source, took 3 minutes (~4,300 rows). FYI, the process pulls data from a remote server to the data warehouse.

I was unhappy with that result, so I tried a TNL (Truncate-N-Load) method for all data from 8/1/06 to present (~128,000 rows). It took 10 seconds. The TNL method time blew me away versus the incremental method's time. It wasn't what I expected, but I guess we live and learn.
qiyuef
qiyuef
Grasshopper
Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)

Group: General Forum Members
Points: 16 Visits: 103
If you have million rows in source table. it may take longer time to do lookup and conditional split comparison than simply batch load all data into destination table
tskelley
tskelley
SSC-Enthusiastic
SSC-Enthusiastic (157 reputation)SSC-Enthusiastic (157 reputation)SSC-Enthusiastic (157 reputation)SSC-Enthusiastic (157 reputation)SSC-Enthusiastic (157 reputation)SSC-Enthusiastic (157 reputation)SSC-Enthusiastic (157 reputation)SSC-Enthusiastic (157 reputation)

Group: General Forum Members
Points: 157 Visits: 1175
Hello Andy, thanks for a great overview using SSIS. I especially like the detailed explanation with screenshots. We currently use a stored procedure to handle our comparisons and upserts, once the data is loaded using SSIS. One of these steps is a stored proc to ensure the lookup tables are populated with any new values before updating the main table. Any particular thoughts on handling this or would we perform "mini" processes similar to this step?



jcraigue@cox.net
jcraigue@cox.net
Grasshopper
Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)

Group: General Forum Members
Points: 12 Visits: 34
Um, just ooc - I didn't see any discussion of just doing this in SQL?
Why use SSIS at all?
rcooke
rcooke
Grasshopper
Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)

Group: General Forum Members
Points: 12 Visits: 17
i prefer to use the merge join transformation combined with a conditional split. For more complex situations use the konesans checksum algorithm transformation to compare staging table vs fact/dimension table.

regards

Robert Cooke
Andy Leonard
Andy Leonard
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1124 Visits: 1095
There are use cases for SSIS that are not served well by using T-SQL. The example used in the article is not a good use case for SSIS over T-SQL - at least not all of the time.

SSIS is suited for moving data between instances of SQL Server on different physical servers or between SQL Server and another database platform. SSIS can also perform better than T-SQL on certain large data transfers between SQL Server databases.

As is often the case, the correct answer is "it depends". I encourage developers to test and see which will perform better: T-SQL or SSIS. That's what I do.

:{> Andy

Andy Leonard
Data Philosopher, Enterprise Data & Analytics
cyril.capistrano
cyril.capistrano
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 36
I respectfully disagree that incremental loads are faster than the truncate method. The problem with an incremental load is the amount of resources it uses on the SSIS side to manage joins and lookups. The need for SSIS to load the dataset into memory before it can perform a comparison is great. In addition to the resources required to pull up the data in SQL, the amount of resources needed by the SSIS machine to manage the comparisons may even be greater. Part of this is the need to ensure string comparisons are done with trimmed fields, re-sorting, and data conversions that are performed to ensure the comparisons are accurate. I've yet to see a comment somewhere where the accuracy of the comparisons are put to question but they do occur if these are not taken to consideration. This is due also to the fact that the collation of the SQL server is different from the collation of the Windows Server machine that runs the SSIS package. From my experience, we have processed, logged, validated, and loaded about 60 million records in 2.5 hours using the truncated method (with drop/re-create specific indexes), as opposed to the incremental that takes about 6-8 hours.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search