Level 3 of the Stairway to Integration Services - Adding Rows in Incremental Loads

  • Comments posted to this topic are about the item Level 3 of the Stairway to Integration Services - Adding Rows in Incremental Loads

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • How do you handle a scenario where data exists in the target table but has been deleted from the source table? Also, how will this handle the load in case the email id gets updated in the source?

  • raghav.kaura (9/15/2011)


    Also how will this handle the load in case email id gets updated in the source.

    If your source system uses an attribute like email as a key column and that attribute can be changed, then you either have a very bad source system or you're using the wrong column as a key into the source data.

  • Below figure 20, there is a statement that reads, "In SSIS 2008 and SSIS 2008 R2, the Lookup Transformation provides this built-in output to catch records in the Lookup table (the dbo.Contact destination table, in this case) that do not exist in the source (the Person.Contact table) - it's the Lookup No Match Output."

    Shouldn't this be the other way around? " In SSIS 2008 and SSIS 2008 R2, the Lookup Transformation provides this built-in output to catch records in the Source table (the dbo.Contact destination table, in this case) that do not exist in the lookup (the Person.Contact table) - it's the Lookup No Match Output."
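    In set terms, the No Match output carries rows from the data-flow source that find no match in the Lookup's reference table. A rough T-SQL equivalent (table and column names assumed from the article's AdventureWorks example) would be:

    ```sql
    -- Rows in the data-flow source (Person.Contact) that have no match
    -- in the Lookup reference (dbo.Contact) -- i.e. the rows that would
    -- travel down the "Lookup No Match Output" and be inserted:
    SELECT s.ContactID, s.FirstName, s.LastName, s.EmailAddress
    FROM Person.Contact AS s
    LEFT JOIN dbo.Contact AS d
      ON d.EmailAddress = s.EmailAddress   -- the Lookup's mapped column
    WHERE d.EmailAddress IS NULL;          -- no match found: new rows
    ```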

  • I just finished reading your SSIS articles and they are an excellent introduction. I can't wait to see the upcoming articles that handle updates and deletes. I hope I will not have to wait too long...

    Is there any way to simulate the same thing as Replication and keep a log of inserts, updates, and deletes in the source, so that only those manipulations are handled in the destination? It seems to me it is a waste of resources to do lookups periodically to detect changes between source and destination when SQL Server already does that job of tracking changes. Any suggestions?

    Keep up the good work.

    Stephane

  • Hi Raghav,

    You're getting ahead of me. Step 5...

    :{>

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • Hi NYSystemAnalyst,

    That's a poorly-worded sentence. My apologies. The No Match output exists in 2008 and R2.

    :{>

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • Thanks. This is a great series of articles and I look forward to reading more.

  • Thanks Stephane,

    There is a way to create audit trails of record states, but that topic is beyond the scope of this series. I can get you started offline. Email me at andy.leonard@gmail.com.
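    For reference, SQL Server's built-in Change Tracking feature is one starting point for Stephane's idea. A minimal sketch (database and table names assumed, not from the article):

    ```sql
    -- Enable Change Tracking at the database level, then per table:
    ALTER DATABASE SourceDb
      SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

    ALTER TABLE Person.Contact ENABLE CHANGE_TRACKING;

    -- On each incremental load, pull only the rows changed since the
    -- last synchronization version (persist @last_sync between loads):
    DECLARE @last_sync bigint = 0;
    SELECT ct.ContactID,
           ct.SYS_CHANGE_OPERATION   -- 'I', 'U', or 'D'
    FROM CHANGETABLE(CHANGES Person.Contact, @last_sync) AS ct;
    ```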

    :{>

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • This is a great (and timely) article. I'm looking forward to the next installment. I watched your SQL Lunch presentation on re-executable T-SQL and it has changed my life. Thanks.

  • You have great skill in explaining things. Thanks for taking the time to share your information in a way that is actually helpful!

  • Hi JJ,

    I have an excellent editor, Steve Jones.

    :{>

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • Hi Andy,

    Thanks for this great article!!! I have a question: what if the Person.Contact table is located on another server, what would you do? We definitely don't want to pull all rows, as that impacts performance and may cause timeouts for large data sets. How can we join Person.Contact and dbo.Contact when they are located on different servers, and then select only the rows we need to insert from Person.Contact?

    Thanks again!
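    One common pattern for this situation (not from the article; the linked server name and staging table are assumed) is to pull only the key column from the remote server into a local staging table, then do the set comparison locally:

    ```sql
    -- 1) Stage just the key column from the remote source. A linked
    --    server is shown here; an SSIS data flow with a source query
    --    that selects only the key column accomplishes the same thing.
    SELECT EmailAddress
    INTO dbo.Stage_SourceKeys
    FROM REMOTESRV.AdventureWorks.Person.Contact;  -- linked server name assumed

    -- 2) Join locally to find the keys missing from the destination;
    --    only those rows are then fetched in full and inserted.
    SELECT s.EmailAddress
    FROM dbo.Stage_SourceKeys AS s
    LEFT JOIN dbo.Contact AS d
      ON d.EmailAddress = s.EmailAddress
    WHERE d.EmailAddress IS NULL;
    ```

    Staging the narrow key set keeps the cross-server transfer small, while the expensive join runs entirely on the destination server.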

  • Andy, I'm enjoying the series but I have a small problem with the 'Update'.

    It's working, but it only updates about 700 records each run; I can't find any option to change this behavior.

    The only thing I can imagine is that I'm using a very limited virtual server and that this is caused by the Full Cache option of the Lookup.

    But I see the correct number of records at each Data Flow.

    I do see 2 warnings, but I don't think they have anything to do with the update problem:

    [SSIS.Pipeline] Warning: Warning: Could not open global shared memory to communicate with performance DLL; data flow performance counters are not available. To resolve, run this package as an administrator, or on the system's console.

    and

    [SSIS.Pipeline] Warning: The output column "LkUp_BusinessEntityID" (131) on output "Lookup Match Output" (103) and component "Lookup" (101) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.

    Rob

  • I agree that the second method is better than the first, but the duration of each method can only be measured fairly if you clear the procedure cache between runs. 😉
