October 10, 2006 at 10:33 am
I am assigned with a task to develop a warehouse with feeds from SQL Server as well as Oracle.
Whats the best way to acquire deltas from the sources and applying it to the warehouse? is there any white paper on this?
thanks
October 10, 2006 at 11:45 pm
I think you will need this book "The Data Warehouse ETL Toolkit" co-authored by Ralph Kimball. You can purchase it from Amazon.
October 11, 2006 at 9:59 pm
Have a look at "Project Real". This is a demonstration project developed using SQL 2005 with white papers, code and sample data. The project is intended to demonstrate best practices with SSIS and may help you down this path.
The project is actually based on real data from Barnes and Noble so it does represent how you can write ETL processes that work effectively in large environments.
Extracting delta's from your source system requires a method to find the records that have changed since the warehouse was last loaded from that system. A column that stores that date and time that a record was last changes/updated/inserted would be ideal. An issue that is not as easily handled is deletion of records from the source system. You may need to find another way if this occurs such as tracking deletions (perhaps with a trigger in the source system) or extracting the primary keys of all records in the source system - any record in the warehouse that is not in that list must have been deleted.
The Ralph Kimball book is definitely worth a read - it covers the how data warehouse load process and should give you a lot of ideas.
Viewing 3 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply