A new series sponsored by Actuality Business Intelligence on data warehousing. In part 1, the data flow in SSIS packages are used to profile the source data and determine how it should be handled in the process.
ETL processing, generally involves copying/moving, transforming, cleaning the records/transactions from one or multiple sources. Most of the batch processing or warehousing projects involve such data processing in millions on daily/weekly basis. Typically, there is a Staging area and production area. Records are cleaned, transformed, filtered and verified from staging to production area. This demands SQL Set theory based queries, parallel processing with multiple processors/CPU. The article focuses on need of SQL Set theory approach and parallel processing while processing large volume of ETL records using programming approach.
These 34 subsystems cover the crucial extract, transform and load architecture components required in almost every dimensional data warehouse environment. Understanding the breadth of requirements is the first step to putting an effective architecture in place.