Incremental load solution to track the changes

  • Loading the data 70 Million records from IBM DB2 to SQL Server
    Problem –
    1.       Need to have incremental data solution
    2.       Source DB2 does not have any flag to determine the newly inserted or updated records, Even though we insisted their team to have some flag to track the changes but they are not willing to maintain any flag or date or timestamp.

     we are doing truncate and reload as  we can’t identify the changes are newly inserted records  and the job is taking 3 to 4 hour to complete, So in this case how to design a solution to handle this situation.

    we are using SQL server & SSIS 2016

  • ramrajan - Friday, October 13, 2017 1:26 AM

    Loading the data 70 Million records from IBM DB2 to SQL Server
    Problem –
    1.       Need to have incremental data solution
    2.       Source DB2 does not have any flag to determine the newly inserted or updated records, Even though we insisted their team to have some flag to track the changes but they are not willing to maintain any flag or date or timestamp.

     we are doing truncate and reload as  we can’t identify the changes are newly inserted records  and the job is taking 3 to 4 hour to complete, So in this case how to design a solution to handle this situation.

    we are using SQL server & SSIS 2016

    Given those constraints, I do believe that you are properly snookered. Truncate/Reload is your best option. Your challenge is to make the load part of that happen as quickly as possible, using all the optimisation tricks you can find.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • ramrajan - Friday, October 13, 2017 1:26 AM

    Loading the data 70 Million records from IBM DB2 to SQL Server
    Problem –
    1.       Need to have incremental data solution
    2.       Source DB2 does not have any flag to determine the newly inserted or updated records, Even though we insisted their team to have some flag to track the changes but they are not willing to maintain any flag or date or timestamp.

     we are doing truncate and reload as  we can’t identify the changes are newly inserted records  and the job is taking 3 to 4 hour to complete, So in this case how to design a solution to handle this situation.

    we are using SQL server & SSIS 2016

    As Phil said, unless you have some sort of primary key, you're snookered.

    As  for the time it takes for a full reload, what method are you using?  My recommenation would be to have the DB2 boys export a TSV  (Tab Separated Values) file to a common area and then you do a BULK INSERT with "Minimally Logging" in play.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • We have Primary Key in Both source and target tables but how only this field help to identify the changes ?

  • ramrajan - Monday, October 16, 2017 5:04 AM

    We have Primary Key in Both source and target tables but how only this field help to identify the changes ?

    It helps you identify insertions and deletions, but it does not help you identify changes (updates).

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin - Monday, October 16, 2017 5:43 AM

    ramrajan - Monday, October 16, 2017 5:04 AM

    We have Primary Key in Both source and target tables but how only this field help to identify the changes ?

    It helps you identify insertions and deletions, but it does not help you identify changes (updates).

    But, you can't identify the changes caused by an UPDATE unless you can identify individual rows and you need some form of unique key to do that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ramrajan - Friday, October 13, 2017 1:26 AM

    Loading the data 70 Million records from IBM DB2 to SQL Server
    Problem –
    1.       Need to have incremental data solution
    2.       Source DB2 does not have any flag to determine the newly inserted or updated records, Even though we insisted their team to have some flag to track the changes but they are not willing to maintain any flag or date or timestamp.

     we are doing truncate and reload as  we can’t identify the changes are newly inserted records  and the job is taking 3 to 4 hour to complete, So in this case how to design a solution to handle this situation.

    we are using SQL server & SSIS 2016

    If the source data precisely matches the target table AND the Primary Key is the same for both, then using HASHBYTES to establish a single row comparator may be the answer.  It could also be that a simple EXCEPT may be the ticket.  You'd have to give each a try to see if it would beat the 3 to 4 hour problem.

    However, if they are, in fact, providing all of the data every time, then the full reload may, indeed, be the simplest and fastest answer.

    It's a real shame that the DB2 team is being uncooperative (or doesn't know how to fix the problem).  Their simple addition of a "Last Modified On" column to the data would make everyone's life a whole lot easier.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Monday, October 16, 2017 5:49 AM

    Phil Parkin - Monday, October 16, 2017 5:43 AM

    ramrajan - Monday, October 16, 2017 5:04 AM

    We have Primary Key in Both source and target tables but how only this field help to identify the changes ?

    It helps you identify insertions and deletions, but it does not help you identify changes (updates).

    But, you can't identify the changes caused by an UPDATE unless you can identify individual rows and you need some form of unique key to do that.

    Jeff, I think that the OP saw this comment:

    As Phil said, unless you have some sort of primary key, you're snookered.


    and from it, deduced that you were suggesting that having a primary key means that an incremental load is possible.

    While technically true, via a full MERGE, it is likely to be slower than the full truncate/reload option because of the absence of any fast way of identifying updated rows in the source. Worth testing, maybe, just to be sure.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply