Incremental load solution to track the changes

Question

Incremental load solution to track the changes

ramrajan

SSCommitted

Points: 1737
More actions
October 13, 2017 at 1:26 am

#352120

Loading the data 70 Million records from IBM DB2 to SQL Server
Problem –
1. Need to have incremental data solution
2. Source DB2 does not have any flag to determine the newly inserted or updated records, Even though we insisted their team to have some flag to track the changes but they are not willing to maintain any flag or date or timestamp.
we are doing truncate and reload as we can’t identify the changes are newly inserted records and the job is taking 3 to 4 hour to complete, So in this case how to design a solution to handle this situation.
we are using SQL server & SSIS 2016

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 246695 More actions · Answer 1

ramrajan - Friday, October 13, 2017 1:26 AM
Loading the data 70 Million records from IBM DB2 to SQL Server
Problem â€“
1. Need to have incremental data solution
2. Source DB2 does not have any flag to determine the newly inserted or updated records, Even though we insisted their team to have some flag to track the changes but they are not willing to maintain any flag or date or timestamp.
we are doing truncate and reload as we canâ€™t identify the changes are newly inserted records and the job is taking 3 to 4 hour to complete, So in this case how to design a solution to handle this situation.
we are using SQL server & SSIS 2016

Given those constraints, I do believe that you are properly snookered. Truncate/Reload is your best option. Your challenge is to make the load part of that happen as quickly as possible, using all the optimisation tricks you can find.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 2

ramrajan - Friday, October 13, 2017 1:26 AM
Loading the data 70 Million records from IBM DB2 to SQL Server
Problem â€“
1. Need to have incremental data solution
2. Source DB2 does not have any flag to determine the newly inserted or updated records, Even though we insisted their team to have some flag to track the changes but they are not willing to maintain any flag or date or timestamp.
we are doing truncate and reload as we canâ€™t identify the changes are newly inserted records and the job is taking 3 to 4 hour to complete, So in this case how to design a solution to handle this situation.
we are using SQL server & SSIS 2016

As Phil said, unless you have some sort of primary key, you're snookered.

As for the time it takes for a full reload, what method are you using? My recommenation would be to have the DB2 boys export a TSV (Tab Separated Values) file to a common area and then you do a BULK INSERT with "Minimally Logging" in play.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ramrajan SSCommitted Points: 1737 More actions · Answer 3

We have Primary Key in Both source and target tables but how only this field help to identify the changes ?

Phil Parkin SSC Guru Points: 246695 More actions · Answer 4

ramrajan - Monday, October 16, 2017 5:04 AM
We have Primary Key in Both source and target tables but how only this field help to identify the changes ?

It helps you identify insertions and deletions, but it does not help you identify changes (updates).

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 5

Phil Parkin - Monday, October 16, 2017 5:43 AM
ramrajan - Monday, October 16, 2017 5:04 AM
We have Primary Key in Both source and target tables but how only this field help to identify the changes ?
It helps you identify insertions and deletions, but it does not help you identify changes (updates).

But, you can't identify the changes caused by an UPDATE unless you can identify individual rows and you need some form of unique key to do that.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 6

ramrajan - Friday, October 13, 2017 1:26 AM
Loading the data 70 Million records from IBM DB2 to SQL Server
Problem â€“
1. Need to have incremental data solution
2. Source DB2 does not have any flag to determine the newly inserted or updated records, Even though we insisted their team to have some flag to track the changes but they are not willing to maintain any flag or date or timestamp.
we are doing truncate and reload as we canâ€™t identify the changes are newly inserted records and the job is taking 3 to 4 hour to complete, So in this case how to design a solution to handle this situation.
we are using SQL server & SSIS 2016

If the source data precisely matches the target table AND the Primary Key is the same for both, then using HASHBYTES to establish a single row comparator may be the answer. It could also be that a simple EXCEPT may be the ticket. You'd have to give each a try to see if it would beat the 3 to 4 hour problem.

However, if they are, in fact, providing all of the data every time, then the full reload may, indeed, be the simplest and fastest answer.

It's a real shame that the DB2 team is being uncooperative (or doesn't know how to fix the problem). Their simple addition of a "Last Modified On" column to the data would make everyone's life a whole lot easier.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Phil Parkin SSC Guru Points: 246695 More actions · Answer 7

Jeff Moden - Monday, October 16, 2017 5:49 AM
Phil Parkin - Monday, October 16, 2017 5:43 AM
ramrajan - Monday, October 16, 2017 5:04 AM
We have Primary Key in Both source and target tables but how only this field help to identify the changes ?
It helps you identify insertions and deletions, but it does not help you identify changes (updates).
But, you can't identify the changes caused by an UPDATE unless you can identify individual rows and you need some form of unique key to do that.

Jeff, I think that the OP saw this comment:

As Phil said, unless you have some sort of primary key, you're snookered.

and from it, deduced that you were suggesting that having a primary key means that an incremental load is possible.

While technically true, via a full MERGE, it is likely to be slower than the full truncate/reload option because of the absence of any fast way of identifying updated rows in the source. Worth testing, maybe, just to be sure.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.