Level 3 of the Stairway to Integration Services - Adding Rows in Incremental Loads

Question

Level 3 of the Stairway to Integration Services - Adding Rows in Incremental Loads

Andy Leonard

SSChampion

Points: 10134
More actions
August 20, 2011 at 8:29 pm

#241923

Comments posted to this topic are about the item Level 3 of the Stairway to Integration Services - Adding Rows in Incremental Loads
Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply

raghav.kaura SSC Enthusiast Points: 103 More actions · Answer 1

How do you handle a scenario when data exists in the target table but has been deleted from the source table. Also how will this handle the load in case email id gets updated in the source.

Samuel Vella SSCrazy Eights Points: 8055 More actions · Answer 2

raghav.kaura (9/15/2011)
Also how will this handle the load in case email id gets updated in the source.

If your source system uses an attribute like email as a key column and that attribute can be changed then you have a very bad source system or you're using the wrong column as a key onto the source data.

NYProjectLeader SSC Veteran Points: 295 More actions · Answer 3

Below figure 20, there is a statement that reads, "In SSIS 2008 and SSIS 2008 R2, the Lookup Transformation provides this built-in output to catch records in the Lookup table (the dbo.Contact destination table, in this case) that do not exist in the source (the Person.Contact table) - it's the Lookup No Match Output."

Shouldn't this be the other way around? " In SSIS 2008 and SSIS 2008 R2, the Lookup Transformation provides this built-in output to catch records in the Source table (the dbo.Contact destination table, in this case) that do not exist in the lookup (the Person.Contact table) - it's the Lookup No Match Output."

S Paquin SSC Journeyman Points: 83 More actions · Answer 4

I just finished reading your SSIS articles and they are an excellent introduction. I just can not wait to see other articles to handle updates an deletes. I hope I will not have to wait too long...

Is there any way to simulate the same as Replication and keep a log of insert, update and deletes in the source so that only those manipulations are handled in the destination ? It seems to me it is a waste of resources to have to do lookups periodically to detect changes between source and destination when SQLServer already does that job of tracking changes. Any suggestions?

Keep the good work.

Stephane

Andy Leonard SSChampion Points: 10134 More actions · Answer 5

Hi Raghav,

You're getting ahead of me. Step 5...

:{>

Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

Andy Leonard SSChampion Points: 10134 More actions · Answer 6

Hi NYSystemAnalyst,

That's a poorly-worded sentence. My apologies. The No Match output exists in 2008 and R2.

:{>

Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

NYProjectLeader SSC Veteran Points: 295 More actions · Answer 7

Thanks. This is a great series of articles and look forward to reading more.

Andy Leonard SSChampion Points: 10134 More actions · Answer 8

Thanks Stephane,

There is a way to create audit trails of record states, but that topicis beyond the scope of this series. I can get you started offline. Email me at andy.leonard@gmail.com.

:{>

Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

ajenawsu Mr or Mrs. 500 Points: 558 More actions · Answer 9

This is a great (and timely) article. I'm looking forward to the next installment. I watched your SQL Lunch presentation on re-executable T-sql and it has changed my life. Thanks.

JJ B SSCertifiable Points: 5100 More actions · Answer 10

You have great skill in explaining things. Thanks for taking the time to share your information in a way that is actually helpful!

Andy Leonard SSChampion Points: 10134 More actions · Answer 11

Hi JJ,

I have an excellent editor, Steve Jones.

:{>

Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

rannoune SSC Rookie Points: 26 More actions · Answer 12

Hi Andy,

Thanks for this great article!!! I have a question; what if the Person.Contacts table is located on an another server, what would you do? We definitely don't want to get all rows as it impacts performance and may have timeout issues for huge data. How can we join Person.Contact and dbo.Contact when they are located on different servers, then only select the rows we need to insert from Person.Contact?

Thanks again!

Rob de Vos SSChasing Mays Points: 646 More actions · Answer 13

Andy I'am enjoing the series but I have a small problem with the 'Update'.

I'ts working, but it only updates about 700 records each run, I can't find any option to change this behavior.

The only thing I can imagine is that I use a very limited virtual server and that this is caused by the Full Cache option of the Lookup.

But I see the correct number of records at each Data Flow.

I do see 2 warnings, but I don't think they have any thing to do with the update problem:

[SSIS.Pipeline] Warning: Warning: Could not open global shared memory to communicate with performance DLL; data flow performance counters are not available. To resolve, run this package as an administrator, or on the system's console.

and

[SSIS.Pipeline] Warning: The output column "LkUp_BusinessEntityID" (131) on output "Lookup Match Output" (103) and component "Lookup" (101) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.

Rob

Brandon Forest SSCommitted Points: 1977 More actions · Answer 14

I agree that the second method is better than the first, but the duration of each method can only be measured correctly if you clear the procedure cache for the previous dataset. 😉