Deleting Rows in Incremental Loads – Level 5 of the Stairway to Integration Services

  • Comments posted to this topic are about the item Deleting Rows in Incremental Loads – Level 5 of the Stairway to Integration Services

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • As you're now using a staging table, you could do all three ADD/DELETE/UPDATE operations at once with an SQL MERGE. Are you going to cover that in the next part of the series?

  • Delete src

    From Person.Contact src

    Join StageDeletes stage

    On stage.Email = src.EmailAddress

    I think it should be deleted from dbo.Contact

  • David Data (11/4/2011)


    As you're now using a staging table, you could do all three ADD/DELETE/UPDATE operations at once with an SQL MERGE. Are you going to cover that in the next part of the series?

    ^^^ THIS ^^^, yes please! I believe the MERGE function still must be contained within a script task / execute T-SQL task (no SSIS component built for it...yet...) but its so powerful its worth the extra overhead 🙂

    Peace.

    - Savage

  • Hi David,

    You are correct, but I do not plan to cover the Merge statement in this series.

    Andy

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • Hi Erin,

    You are correct - my intent was to delete the rows from the Destination table (dbo.Contact), not the source (Person.Contact).

    Good catch and my bad!

    Andy

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • I recently was surprised to find importing a large number of wide rows it was faster to use a SSIS merge of two sorted inputs with a conditional split rather than a lookup diverting no matches to a no match connection. Have you seen a general rule when it is optimal to use the merge rather than a lookup?

  • The final test query should also be

    Use AdventureWorks

    go

    Select Count(*) As RecCount

    From dbo.Contact

    Where FirstName = 'Andy'

    And LastName = 'Leonard'

    If you run the query before executing the package you get a count of 1. After executing the package the count is 0.

  • Hi Davide,

    How can one use the three ADD/DELETE/UPDATE operations at once with SQL MERGE?

    I would really like to have a clue of how it works.

    Thanks

  • SQL 2008 BOL "ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_1devconc/html/952595a6-cf1d-4ff5-8927-66f9090cf79d.htm". MS finally did something nice. It has it's limitations and place but doesn't everything. I use it quite a bit, but would really like to see it added to SSIS. Enjoy 🙂

  • Hi Andy,

    Great series (integration services)! It was very helpful -- easy to follow and extremely relevant. I just have one thing I believe is an error. It was on the last part (delete rows). The final query to check if rows were deleted went against the person.contact table. However, the row was actually added to the dbo.contact table. So I believe the last query should also go against the dbo.contact table. Just an Fyi. Thanks for your great articles!

    Michelle

  • Andy, for the TSQL step to apply updates and Deletes to the destination database, would you recommend adding indexes to the stage tables that match the indexes on the production tables (I'm assuming that in reality the join will be on some unique (or close to) identifier rather than an email address) to help the optimiser and therefore speed the package processing time?

  • Hi Jonathan,

    If it would improve performance, sure. I am unsure how much performance improvement could be realized unless there are a lot of staged rows, though. But as with many things related to SQL Server, I am certain the answer is "it depends."

    :{>

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • Andy,

    Thank you for the great post. This tutorial helped me a lot. I think your delete script is deleting from the source table instead of the destination table. The script should have been:

    [font="Courier New"]Delete dest

    From dbo.Contact dest

    Join StageDeletes stage

    On stage.Email = dest.Email[/font]

    The table should have been dbo.Contact instead of Person.Contact. And the test script should change to:

    Use AdventureWorks

    go

    Select Count(*) As RecCount

    From dbo.Contact

    Where FirstName = 'Andy'

    And LastName = 'Leonard'

  • Hi Andy,

    Appreciate your good stuffs.. I know it is very late comment to this post. However I like to share my opinion and get your thoughts as well.

    1. From your scenario, my view of the incremental data means the data that has been amented/inserted since certain time. according to this logic, the extracted source data in this article is not incremental one. it is FULL SET. if my assumption / definition is not correct, please advise me.

    2. Lookup transformation with "cache" will degrade the performance for huge amounts of data due to memory usage. my opinion is that it can not be used for transactional tables (or fact tables). But surely it helps for small lookups table ( or dimension tables)

    3. Lookup transformation with "no cache" again degrade a lot when we use the FULL DATA SET as it make millions of calls to the database

    4. my option is "use the MERGE JOIN " route for delete operation by selecting only key colunms and compare them to identify the missing ones..

    to be honest , I am a LOOKUP TRANSFORAMTION lover, but no other go for DELETE OPERATION as I am aware of. Requesting you provide me any other route/ logic

    looking forward to hear your thoughts..

    Thanks again for such a wonderful series,

    Have a good day!!!

    K.Singanan

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply