• shannonjk (12/5/2012)


    So I will keep this concise as I can; I have a source as400 system I am extracting data from for a Data Warehouse. The standard method is to load data from this system into staging tables, then do further ETL from there into the Data Warehouse. The current method to load the staging tables is to wipe all data (truncate) and load fresh so it captures deletions, updates, and additions.

    Problem is, the as400 is now in a hosted environment, and the data is growing quite large so loading these 30Gb of data.

    So my question is....is there a way to directly query the data at the source for changes, updates, and deletions and handle this dynamically without wiping and loading every night?

    This is a question about the capabilities of the data/database on your source system. Do you have Modified Date and Create Date fields on all of your tables that are correct & trustworthy in the source data? Do you just mark a record with a deleted flag? or does it actually get deleted from the db? (Not a problem either way, you just need to handle the two situations differently).

    I don't know your source system; but if you can identify the changes accurately on your source system, then we can talk about ETL steps for incrementally loading/updating your staging tables.

    HTH,

    Rob