Parallel Processing of Large Volume ETL Jobs
ETL processing, generally involves copying/moving, transforming, cleaning the records/transactions from one or multiple sources. Most of the batch processing or warehousing projects involve such data processing in millions on daily/weekly basis. Typically, there is a Staging area and production area. Records are cleaned, transformed, filtered and verified from staging to production area. This demands SQL Set theory based queries, parallel processing with multiple processors/CPU. The article focuses on need of SQL Set theory approach and parallel processing while processing large volume of ETL records using programming approach.
2007-11-08
7,034 reads