Incremental Loading using SSIS

  • Hi Everyone,

    I created a ssis package to insert new records/update existing records into destination table using Look up transformation.

    Package works perfectly fine but it is updating thousands of existing records even if they are not changed.

    Instead of updating all existing records, is there a way to update just changed rows?

    Thanks.

  • Yes, read the Stairway to Integration Services lesson 4: http://www.sqlservercentral.com/articles/Stairway+Series/76390/

  • Oh my... all that just to do a simple "upsert".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you all.

    I added another lookup transformation on "Matched output". That resolved the issue.

  • Just to expand on Jeff’s “upsert“ comment for future reference.

    This pattern can be replaced with a single “Upsert” on the source and destination table. I find the merge is more complicated to grasp than this simple pattern. Make your choice on what is most important for you: Speed, Environment, Ease of comprehension, Support.

  • larskandersen (12/15/2015)


    Just to expand on Jeff’s “upsert“ comment for future reference.

    This pattern can be replaced with a single “Upsert” on the source and destination table. I find the merge is more complicated to grasp than this simple pattern. Make your choice on what is most important for you: Speed, Environment, Ease of comprehension, Support.

    I was thinking how easy it is to use a condition INSERT/UPDATE as 2 separate statements in T-SQL. IMHO, that gives you all of the important things you mentioned.

    And, no... I wouldn't use MERGE (T-SQL) on a bet. There doesn't seem to be a performance advantage and too few seem to be able to support it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/15/2015)


    larskandersen (12/15/2015)


    Just to expand on Jeff’s “upsert“ comment for future reference.

    This pattern can be replaced with a single “Upsert” on the source and destination table. I find the merge is more complicated to grasp than this simple pattern. Make your choice on what is most important for you: Speed, Environment, Ease of comprehension, Support.

    I was thinking how easy it is to use a condition INSERT/UPDATE as 2 separate statements in T-SQL. IMHO, that gives you all of the important things you mentioned.

    And, no... I wouldn't use MERGE (T-SQL) on a bet. There doesn't seem to be a performance advantage and too few seem to be able to support it.

    T-SQL is often times the secret sauce for an SSIS/ETL solution. However, there are a lot of BI developers who simply won't move data using T-SQL tasks, even when you point out that it's easier to code and 10x as fast.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (12/15/2015)


    Jeff Moden (12/15/2015)


    larskandersen (12/15/2015)


    Just to expand on Jeff’s “upsert“ comment for future reference.

    This pattern can be replaced with a single “Upsert” on the source and destination table. I find the merge is more complicated to grasp than this simple pattern. Make your choice on what is most important for you: Speed, Environment, Ease of comprehension, Support.

    I was thinking how easy it is to use a condition INSERT/UPDATE as 2 separate statements in T-SQL. IMHO, that gives you all of the important things you mentioned.

    And, no... I wouldn't use MERGE (T-SQL) on a bet. There doesn't seem to be a performance advantage and too few seem to be able to support it.

    T-SQL is often times the secret sauce for an SSIS/ETL solution. However, there are a lot of BI developers who simply won't move data using T-SQL tasks, even when you point out that it's easier to code and 10x as fast.

    I guess that makes me even with those folks because I don't use SSIS for the very reasons you've stated. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Using SSIS may be intuitive to some folks. When the preference somehow makes T-SQL a poor choice in their mind in all conceivable cases is where good solutions are missed. Clearly the converse position is just as absurd. Sometimes performance just isn't as important as consistency in design or purity of the target data platform. Sometimes performance is paramount and technology preferences and implementation details take a backseat to squeezing every millisecond out of every execution. Those capable of employing either technique, all data movement using SSIS or a hybrid approach with T-SQL, are better positioned to meet any requirement in a multitude of environments. Pick the right technique for the situation.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • there are a lot of BI developers who simply won't move data using T-SQL tasks, even when you point out that it's easier to code and 10x as fast.

    Really? As long as it's accurate, I go for the fast one. If you mean the difference between inserting using a TSQL task and using a data flow task with a table or stored procedure as a source, I have found with large volumes of data that the data flow task is much faster. It also doesn't flood the transaction log since it can be set to batch. I recently replaced a number of TSQL tasks that although easier to maintain, were demonstrably slower than data flow tasks I replaced them with.

  • RonKyle (12/16/2015)


    there are a lot of BI developers who simply won't move data using T-SQL tasks, even when you point out that it's easier to code and 10x as fast.

    Really? As long as it's accurate, I go for the fast one. If you mean the difference between inserting using a TSQL task and using a data flow task with a table or stored procedure as a source, I have found with large volumes of data that the data flow task is much faster. It also doesn't flood the transaction log since it can be set to batch. I recently replaced a number of TSQL tasks that although easier to maintain, were demonstrably slower than data flow tasks I replaced them with.

    It's been a long time so I'm not fully qualified to make a judgment anymore and maybe it was a bit of ineptitude on the part of the person that wrote the SSIS packages but one of the jobs I took on about 8 years back was to convert all the large import/export jobs to T-SQL because the SSIS packages were too slow.

    You DO have a very interesting note in the quote above. You said that the transaction log doesn't get flooded. I'll have to look into that more on the SSIS side. I used to avoid that by using a staging database in the simple recovery model and only the final "upsert" of data to the ultimate target database would do any logging.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I used to avoid that by using a staging database

    I use those, too, though I'm glad to hear I'm not the only one. I've been told by other BI developers that it's an unnecessary database. The only data in want in the warehouse is clean data. It's not just the transaction log, it's the backup as well.

  • Staging database and T-SQL for almost everything except for data movement.

    It kind of makes SSIS a control flow system but it gives me maximum flexibility for checking out what's going on, making changes to the process without having to modify the package, etc.

    Correct me if I'm wrong but don't some data flow transformations change from set base processing to row based which is why they perform so poorly in comparison to T-SQL?

  • Correct me if I'm wrong but don't some data flow transformations change from set base processing to row based which is why they perform so poorly in comparison to T-SQL?

    I think you're thinking of the update tool. The data flow inserts rows, and does so very efficiently if set up correctly, which I admit isn't the most straightforward thing.

  • RonKyle (12/17/2015)


    Correct me if I'm wrong but don't some data flow transformations change from set base processing to row based which is why they perform so poorly in comparison to T-SQL?

    I think you're thinking of the update tool. The data flow inserts rows, and does so very efficiently if set up correctly, which I admit isn't the most straightforward thing.

    SSIS is great for data flows. It's when the transformations come in that things can slow down dramatically. My flows typically have a source, a row count, and a destination into a staging database. Then stored procedures take it from there.

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply