
Anatomy of an Incremental Load
Posted Monday, February 11, 2008 6:51 AM
SSC Rookie


Group: General Forum Members
Last Login: Friday, March 08, 2013 10:18 AM
Points: 47, Visits: 165
Hi Andy,

I appreciate your quick responses. Perhaps I misread the following sentence, but I don't think so:

"Your source may be a table or view, it may be a query. For those of us who like to squeeze every cycle possible out of an application, using a query for relational database sources will provide better performance than specifying a table or view name."

Arthur


Arthur Fuller
cell: 647-710-1314

Only two businesses refer to their clients as users: drug-dealing and software development.
-- Arthur Fuller
Post #453829
Posted Monday, February 11, 2008 7:20 AM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 2:12 PM
Points: 7,084, Visits: 14,685
Arthur -

Picking the name of the table or view from the drop-down is equivalent to select * from MyTableOrView. Just as that's bad form in T-SQL, it's not a great idea if you intend to get every ounce of performance out of SSIS.
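To make the point concrete, here is a minimal sketch of what you might type into the OLE DB Source instead of picking the table from the drop-down (the column list is an assumption, borrowed from the AdventureWorks Person.Contact table the article uses):

```sql
-- Name only the columns the data flow actually consumes,
-- rather than letting the drop-down behave like SELECT *.
SELECT ContactID,
       FirstName,
       LastName,
       EmailAddress,
       ModifiedDate
FROM   Person.Contact;
```

This keeps unneeded columns out of the pipeline buffers entirely, which is where the performance difference comes from.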

Andy -

As I mentioned last time (when I apparently got a sneak preview of the article :) ), very nice, in-depth article. Just curious about one thing: during the correlate part, when setting up the lookup, I would naturally have gone toward NOT setting the Lookup task to ignore the error, and instead treating the "lookup failures" as the new records (that is, setting it up to "redirect the rows").

Are there any major pros and cons to setting it up one way or the other (I know there are a million ways to set up various things in here)? For example, in a case like that, is it better to have one conditional split that handles ALL conditions, or two conditional splits, each on a smaller set of the data?


----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #453845
Posted Monday, February 11, 2008 7:44 AM
SSCrazy


Group: General Forum Members
Last Login: Tuesday, April 15, 2014 7:19 AM
Points: 2,814, Visits: 3,851
Hello,

As others have already mentioned, I also doubt that this is a method that works for importing data from a different server.
The problems with data coming from a different server start with bad access paths.
I am still looking for that "ultimate" technology that will let me efficiently join data from two different servers without doing mostly full table scans in the remote database. Currently I only achieve this through dynamic SQL. Please give me a hint if you know of a better method.
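For what it's worth, one pattern along the lines of the dynamic SQL mentioned above is to push the filtering to the remote side with OPENQUERY, so the linked server only ships back the rows of interest instead of scanning the whole remote table locally. A sketch, where "RemoteServer" is a hypothetical linked server name:

```sql
-- The inner query runs entirely on the remote server, so only the
-- filtered rows cross the wire; the join then happens locally.
SELECT r.ContactID, r.ModifiedDate
FROM   OPENQUERY(RemoteServer,
         'SELECT ContactID, ModifiedDate
          FROM   AdventureWorks.Person.Contact
          WHERE  ModifiedDate >= ''20080201''') AS r
JOIN   dbo.Contact AS l
  ON   l.ContactID = r.ContactID;
```

When the predicate is only known at run time, the OPENQUERY string itself has to be built with dynamic SQL, which is presumably the approach referred to above.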

Thanks for sharing your knowledge!


Best Regards,
Chris Büttner
Post #453853
Posted Monday, February 11, 2008 11:18 AM


Old Hand


Group: General Forum Members
Last Login: Wednesday, April 16, 2014 1:39 PM
Points: 388, Visits: 1,034
Great comments. I wrote during lunch instead of responding - apologies. I will respond to the questions this evening or tomorrow.

One thing I noticed in my browser, and maybe this is only my browser doing this, I cannot see the Changed Rows detection condition expression. Here it is:

( (NameStyle != Dest_NameStyle)
  || (Title != Dest_Title)
  || (FirstName != Dest_FirstName)
  || (MiddleName != Dest_MiddleName)
  || (LastName != Dest_LastName)
  || (Suffix != Dest_Suffix)
  || (EmailAddress != Dest_EmailAddress)
  || (EmailPromotion != Dest_EmailPromotion)
  || (Phone != Dest_Phone)
  || (ModifiedDate != Dest_ModifiedDate) )
|| ( IsNull(Title) || IsNull(MiddleName) || IsNull(Suffix) )

Back to work...

:{> Andy


Andy Leonard
CSO, Linchpin People
Follow me on Twitter: @AndyLeonard
Post #453985
Posted Monday, February 11, 2008 12:14 PM
SSC Rookie


Group: General Forum Members
Last Login: Tuesday, November 19, 2013 2:03 PM
Points: 45, Visits: 161
Great article Andy!

I found a handy way to reduce the number of rows I have to check when I've got a SQL Server database as my source. I just add a "timestamp" column to each source table. Of course, this isn't really a timestamp - it's a varbinary row version that the database maintains automatically. Then I check for the minimum active row version, and pull only the rows that have been updated or inserted since my last incremental load.

This helps us reduce the number of rows we have to compare down to 40K or so to see if they were inserted or updated. That is a great time saver when you would otherwise have to compare the hundreds of millions of rows we have in our source database.
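The approach described above might be sketched like this (table, column, and watermark names are assumptions for illustration; "rv" is the rowversion/timestamp column SQL Server bumps on every insert or update):

```sql
-- Fetch the watermark saved by the previous load.
DECLARE @last binary(8);
SELECT @last = LastRowVersion
FROM   dbo.LoadWatermark
WHERE  TableName = 'Person.Contact';

-- Pull only rows touched since the last load, stopping short of the
-- minimum active row version so rows from in-flight transactions
-- are picked up on the next run instead of being missed.
SELECT ContactID, FirstName, LastName, EmailAddress, ModifiedDate
FROM   Person.Contact
WHERE  rv > @last
  AND  rv < MIN_ACTIVE_ROWVERSION();
```

After a successful load, the watermark table would be updated with the boundary used in this extract.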

Thanks again - Linda
Post #454011
Posted Monday, February 11, 2008 1:59 PM


Hall of Fame


Group: General Forum Members
Last Login: Friday, March 28, 2014 11:01 AM
Points: 3,047, Visits: 1,229
Hi Andy,

Thanks for a very informative article!

Some things I found while walking through this.

The Changed Rows detection condition expression failed for me. The error was that the expression returned a null. I think this is because some of the columns being tested contained nulls. If that is true, what would be the best way to handle it? I thought of something like (isnull(NameStyle, '') != isnull(Dest_NameStyle, '')), which I think would also eliminate the need for the IsNull check at the end.
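One caveat worth noting: in the SSIS expression language (unlike T-SQL), ISNULL() takes a single argument and returns a Boolean, so the two-argument replacement form suggested above would need the conditional operator instead. A null-safe comparison for one string column might look like this (a sketch, not the article's expression):

```
(ISNULL(Title) ? "" : Title) != (ISNULL(Dest_Title) ? "" : Dest_Title)
```

Repeating that pattern per column substitutes an empty string for NULL on both sides before comparing, which avoids the expression itself evaluating to null.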

When I tested, I also noticed that the second time I ran it, new rows were inserted even though I hadn't deleted any from my destination. I think this is because the destination is keyed with an identity column. The rows I inserted didn't have the same ContactID, so the next time the package ran it found the same rows missing. What would be the best way of dealing with this? The only way I could think of is to do it the same way you do the updates, so you could set IDENTITY_INSERT ON in the SQL statement.
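That workaround might be sketched like this (table and column names are assumptions): preserving the source's identity values on insert means a re-run finds the rows by the same ContactID and no longer treats them as missing.

```sql
-- IDENTITY_INSERT requires an explicit column list that includes
-- the identity column.
SET IDENTITY_INSERT dbo.Contact ON;

INSERT INTO dbo.Contact (ContactID, FirstName, LastName, EmailAddress)
SELECT src.ContactID, src.FirstName, src.LastName, src.EmailAddress
FROM   staging.NewContacts AS src;

SET IDENTITY_INSERT dbo.Contact OFF;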

-Kevin



Post #454063
Posted Monday, February 11, 2008 2:09 PM
SSC Veteran


Group: General Forum Members
Last Login: Tuesday, March 11, 2014 2:39 PM
Points: 216, Visits: 422
cmcklw:
I'm doing my compare based on the Business Key of the table, as opposed to something I'm generating internally (like the Surrogate Keys I'm generating with an IDENTITY column). Is this possible in your situation?

HTH,
Rick



Rick Todd
Post #454070
Posted Monday, February 11, 2008 2:41 PM


Hall of Fame


Group: General Forum Members
Last Login: Friday, March 28, 2014 11:01 AM
Points: 3,047, Visits: 1,229
Rick Todd (2/11/2008)
cmcklw:
I'm doing my compare based on the Business Key of the table, as opposed to something I'm generating internally (like the Surrogate Keys I'm generating with an IDENTITY column). Is this possible in your situation?

HTH,
Rick


Yes. I think that would work but I wanted to know the best way to deal with a table that does not have a Business Key and the only key is an identity column.

Thanks,

-Kevin



Post #454089
Posted Monday, February 11, 2008 2:44 PM
SSC Veteran


Group: General Forum Members
Last Login: Tuesday, March 11, 2014 2:39 PM
Points: 216, Visits: 422
cmcklw (2/11/2008)
Rick Todd (2/11/2008)
cmcklw:
I'm doing my compare based on the Business Key of the table, as opposed to something I'm generating internally (like the Surrogate Keys I'm generating with an IDENTITY column). Is this possible in your situation?

HTH,
Rick


Yes. I think that would work but I wanted to know the best way to deal with a table that does not have a Business Key and the only key is an identity column.

Thanks,

-Kevin


I guess we'll wait for Andy or someone else to weigh in, but I can't see how you can use any of this logic if you don't have a Business Key to make sure you're matching up the right rows.



Rick Todd
Post #454090
Posted Monday, February 11, 2008 2:59 PM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 2:12 PM
Points: 7,084, Visits: 14,685
cmcklw -

There has to be some element that matches what you're importing to what's already in there. Meaning: you need some way to say, on a row-by-row basis, that this row from this table matches up with THIS row in that other table. Without some basis for cross-referencing them, how could you figure that out?

Whether it's an identity field, a name field, etc., or several fields together really isn't important, but that "natural key" (interestingly enough, there's a bit of a brawl going on in another thread about natural keys) is what you need to code into the condition. Meaning: is there or is there NOT a match? A mismatch anywhere in that natural key would mean a NEW row, versus updating an existing row with new data.
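The match-or-new decision described above might look like this in plain T-SQL (using EmailAddress as a hypothetical business key; the real key would depend on the table):

```sql
-- Rows that match on the business key get updated in place...
UPDATE d
SET    d.FirstName = s.FirstName,
       d.LastName  = s.LastName
FROM   dbo.Contact AS d
JOIN   staging.Contact AS s
  ON   s.EmailAddress = d.EmailAddress;

-- ...and rows with no key match are the new rows to insert.
INSERT INTO dbo.Contact (FirstName, LastName, EmailAddress)
SELECT s.FirstName, s.LastName, s.EmailAddress
FROM   staging.Contact AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.Contact AS d
                   WHERE  d.EmailAddress = s.EmailAddress);
```

In the SSIS version discussed in the article, the Lookup plays the role of the join on the business key and the conditional split routes the two outcomes.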


----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #454098