SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Anatomy of an Incremental Load


Anatomy of an Incremental Load

Author
Message
fuller.artful
fuller.artful
SSC Journeyman
SSC Journeyman (81 reputation)SSC Journeyman (81 reputation)SSC Journeyman (81 reputation)SSC Journeyman (81 reputation)SSC Journeyman (81 reputation)SSC Journeyman (81 reputation)SSC Journeyman (81 reputation)SSC Journeyman (81 reputation)

Group: General Forum Members
Points: 81 Visits: 165
Hi Andy,

I appreciate your quick responses. Perhaps I misread the following sentence, but I don't think so:

"Your source may be a table or view, it may be a query. For those of us who like to squeeze every cycle possible out of an application, using a query for relational database sources will provide better performance than specifying a table or view name."

Arthur

Arthur Fuller
cell: 647-710-1314

Only two businesses refer to their clients as users: drug-dealing and software development.
-- Arthur Fuller
Matt Miller (4)
Matt Miller (4)
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12435 Visits: 18576
Arthur -

Picking the name of the table or view from the drop down is equivalent to select * from MyTableOrView. In the same way that that's bad form in T-SQL - it's not a great idea if you intend to get every ounce of performance out of SSIS.

Andy -

like I mentioned the last time (when I apparently got a sneak preview of the articleSmile ), very nice, in-depth article. Just curious about one thing: during the correlate part when setting up the look-up, I would have naturally gone towards NOT setting the Lookup task to ignore the error, and to treat the "lookup failures" as the new records (meaning - set it up to "redirect the rows").

Is there any major pros and cons about setting it up this way or the other (I know that there are a million ways to set up various things in here). For example - in a case like that - is it better to have a conditional split that handles ALL conditions, or two conditional splits, each on a smaller set of the data?

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Christian Buettner-167247
Christian Buettner-167247
Hall of Fame
Hall of Fame (3.6K reputation)Hall of Fame (3.6K reputation)Hall of Fame (3.6K reputation)Hall of Fame (3.6K reputation)Hall of Fame (3.6K reputation)Hall of Fame (3.6K reputation)Hall of Fame (3.6K reputation)Hall of Fame (3.6K reputation)

Group: General Forum Members
Points: 3607 Visits: 3889
Hello,

As already mentioned by others, I also doubt that this is a method that works for importing data from a different server.
The problems with data coming from a different server start with bad access paths.
I am still looking for this "ultimate" technology that will allow me to efficiently join data from two different servers without doing mostly full table scans in the remote database. Currently I am only achieving this through dynamic SQL. Please give me a hint if you know of a better method:-)

Thanks for sharing your knowledge!

Best Regards,

Chris Büttner
Andy Leonard
Andy Leonard
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1160 Visits: 1095
Great comments. I wrote during lunch instead of responding - apologies. I will respond to the questions this evening or tomorrow.

One thing I noticed in my browser, and maybe this is only my browser doing this, I cannot see the Changed Rows detection condition expression. Here it is:

( (NameStyle != Dest_NameStyle) || (Title != Dest_Title) || (FirstName != Dest_FirstName) || (MiddleName != Dest_MiddleName) || (LastName != Dest_LastName) || (Suffix != Dest_Suffix) || (EmailAddress != Dest_EmailAddress) || (EmailPromotion != Dest_EmailPromotion) || (Phone != Dest_Phone) || (ModifiedDate != Dest_ModifiedDate) ) || ( IsNull(Title) || IsNull(MiddleName) || IsNull(Suffix) )

Back to work...

:{> Andy

Andy Leonard
Data Philosopher, Enterprise Data & Analytics
Linda Wenglikowski
Linda Wenglikowski
Valued Member
Valued Member (71 reputation)Valued Member (71 reputation)Valued Member (71 reputation)Valued Member (71 reputation)Valued Member (71 reputation)Valued Member (71 reputation)Valued Member (71 reputation)Valued Member (71 reputation)

Group: General Forum Members
Points: 71 Visits: 175
Great article Andy!

I found a unique way to reduce the number of rows I had to check, when I've got a SQL Server database as my source. I just add a "timestamp" column to each source table. Of course, this isn't really a timestamp its a varbinary that's database row version. Then I check for the minimum active row version, and pull only the rows that have been updated or inserted since my last incremental load.

This helps us reduce the number of rows we have to compare down to 40K or so to see if they were inserted or update. This is a great time saver when you would have to compare 100s of millions rows we have in our source database.

Thanks again - Linda
kevin.l.williams
kevin.l.williams
Hall of Fame
Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)

Group: General Forum Members
Points: 3419 Visits: 1323
Hi Andy,

Thanks for a very informative article!

Some things I found while walking through this.

The Changed Rows detection condition expression failed for me. The error was that the expression returned a null. I think this is because some of the columns being tested contained nulls. If this is true then what would be the best way to handle it? I thought of this (isnull(NameStyle, ‘’) != isnull(Dest_NameStyle, ‘’))). I thought that this would also eliminate your need for the isnull check at the end.

When I tested I also noticed that the second time I ran it, new rows were inserted even though I didn’t delete any from my destination. I think this is because the destination is being keyed with an identity column. The rows I inserted didn’t have the same contactid so the next time the package ran it found the same rows missing. What would be the best way of dealing with this? The only way I could think of is to do it the same way you do the updates so you could set the identity insert on in the sql statement.

-Kevin



Rick Todd
Rick Todd
SSC Veteran
SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)

Group: General Forum Members
Points: 292 Visits: 441
cmcklw:
I'm doing my compare based on the Business Key of the table, as opposed to something I'm generating internally (like the Surrogate Keys I'm generating with an IDENTITY column). Is this possible in your situation?

HTH,
Rick


Rick Todd
kevin.l.williams
kevin.l.williams
Hall of Fame
Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)

Group: General Forum Members
Points: 3419 Visits: 1323
Rick Todd (2/11/2008)
cmcklw:
I'm doing my compare based on the Business Key of the table, as opposed to something I'm generating internally (like the Surrogate Keys I'm generating with an IDENTITY column). Is this possible in your situation?

HTH,
Rick


Yes. I think that would work but I wanted to know the best way to deal with a table that does not have a Business Key and the only key is an identity column.

Thanks,

-Kevin



Rick Todd
Rick Todd
SSC Veteran
SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)SSC Veteran (292 reputation)

Group: General Forum Members
Points: 292 Visits: 441
cmcklw (2/11/2008)
Rick Todd (2/11/2008)
cmcklw:
I'm doing my compare based on the Business Key of the table, as opposed to something I'm generating internally (like the Surrogate Keys I'm generating with an IDENTITY column). Is this possible in your situation?

HTH,
Rick


Yes. I think that would work but I wanted to know the best way to deal with a table that does not have a Business Key and the only key is an identity column.

Thanks,

-Kevin


I guess we'll wait for Andy or someone else to weigh in, but I can't see how you can use any of this logic if you don't have a Business Key to make sure you're matching up the right rows.


Rick Todd
Matt Miller (4)
Matt Miller (4)
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12435 Visits: 18576
cmcklw -

There has to be some element that matches what you're importing to what's already in there. Meaning - there would need to be some way for you to say on a row-by-row basis, that this row from this table matches up with THIS row in this other table. Without having some basis to be able to cross-reference them, how could you figure that out?

Whether it's an identity field, a name field, etc... or several fields really isn't important, but that "natural key" (interestingly enough - there's a bit of a brawl going on on another thread about "natural keys") would be what you need to code into the condition. Meaning - is there or is there NOT a match? And if this is a mismatch anywhere in that natural key, then that would entail a NEW row versus updating an existing row with new data.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search