Incremental Load using SQL SERVER 2008

  • I have to perform incremental load using ssis. The workflow and architecture is as follows:

    I have an etl process in place which will pull data from Server_A (oracle application) and dump the data to Server_B ( also called staging server SQL SERVER 2008). Once data is dumped, then business logic is applied and the using the etl process transformed data is dumped to the target tables(SQL SERVER). ETL process runs to create the csv's of the target tables and is send to the 3rd party who maintains the datawarehouse using the csv's.

    Now I have to pull the delta only from Server_A to Server_B and have to maintain the history also. I have checked the Server_A tables. there are no modified columns or flag columns on which I can rely and calculate the delta. I cannot use SCD on Server_A neither I can use CDC.

    I tried finding the delta from Server_A by using the checksum equivalent in oracle. But the problem is that I am not able to create the history.

    Other approach is to dump all the tables to staging and then apply checksum. but this will hit the performance and moreover client doesnot approves this architecture.

    I have in all around 70 tables in Server_A which I transfers to Server_B when ever I am refereshing the data.

    Can you please help me in building the logic for calculating the delta?

    Thanks

    Rahul Sahay

    Email: rahulsahay123@gmail.com

  • It sounds as though your hands are tied and you don't really have good options. The only thing you haven't ruled out is to do a comparison of the data as you load it from Oracle to your staging server and don't write the data that has not changed to your staging target. I would use the hashbytes function rather than a checksum because it is less likely to have a colision of data. Also make sure you trim both your source and target when comparing the data.

  • But when I am passing only delta to the staging server then I am not able to build the history. Because checksum in sqlserver and hashfunction in oracle cannot be compared. Henced I have to dump everything onto staging and calculate the delta.

    Now client is not approving this approach.

    They wants delta to come from server A. But problem with this approach is history building. Example table1 in server A has values 1, 2, 3. And table2 has value 1, 2. Henced by the end of pass 1 delta is 3. Now when I am running the package for 2nd time then delta 3 should be appended to table2 otherwise we will always get cumulative delta.

    Since server A is an oracle application ie legacy system, I have only select rights. I dont have access for creation or insertion.

  • If there is no field on the table to identify the last time a record was changed, how do they expect you to define the delta.

    Your options:

    Full snapshot of the table for yesterday and today and compare records based on primary key - slow and takes a lot of space if the tables are large.

    Set a trigger on the source table to write to a changed record log table and use this as the source for the delta extract

    If you know that records are only modified within (say) 5 days of their creation date you could pull all records that were created in the last 5 days and treat that as a delta and let the 3rd party handle the SCD issue.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply