Data movement is a fundamental piece of a data engineer’s duties, and recently I’ve been thinking about the art of data movement. What are some of the most important pieces that a data engineer needs to think about when confronted with data ingestion? There is of course data exporting as well, and in that case, […]
A new series sponsored by Actuality Business Intelligence on data warehousing. In part 1, the data flow in SSIS packages are used to profile the source data and determine how it should be handled in the process.
ETL processing, generally involves copying/moving, transforming, cleaning the records/transactions from one or multiple sources. Most of the batch processing or warehousing projects involve such data processing in millions on daily/weekly basis. Typically, there is a Staging area and production area. Records are cleaned, transformed, filtered and verified from staging to production area. This demands SQL Set theory based queries, parallel processing with multiple processors/CPU. The article focuses on need of SQL Set theory approach and parallel processing while processing large volume of ETL records using programming approach.