Hi Folks,

    I understand what is desired. Please don't get me wrong: set-based processing has been around for a long, long time. What is important here is to note the following:

    1) The size of the source and target doesn't matter - whether big or small, set-based processing is important.

    2) Using cursors and iterating over rows runs much slower than using "array" or block-style commands, regardless of whether the load is written in an ETL tool (SSIS), SQL stored procedures, or another ETL engine technology (see the first sketch after this list).

    3) Speed is important when moving data to the warehouse from the staging area.

    4) Road-blocks to performance are often set up by the data modeling architecture (i.e., requiring business rules to be applied to the data when loading from stage to warehouse).

    5) Business rule processing can "break" the set-based architectural approach.

    6) Moving business rules downstream (hence putting RAW data in the data warehouse) allows the EDW to be compliant and auditable; it also opens up many more opportunities to apply set-based, block-style operations rather than row-by-row processing (see the second sketch after this list).

    7) When dealing with RAW data, the set-based INSERT commands can be targeted directly at index matches, which in turn allows for maximum parallelism and maximum partitioning (again, see the second sketch after this list).
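
    To make point 2) concrete, here is a minimal T-SQL sketch of the two styles. The table and column names (stage.Customer, dw.Customer, CustomerKey, CustomerName, LoadDate) are made up for illustration, so treat this as a pattern, not a drop-in script:

        -- Row-by-row: a cursor over the staging table, one INSERT per fetched row.
        DECLARE @CustomerKey INT, @CustomerName VARCHAR(100), @LoadDate DATETIME;

        DECLARE cur CURSOR FAST_FORWARD FOR
            SELECT CustomerKey, CustomerName, LoadDate FROM stage.Customer;

        OPEN cur;
        FETCH NEXT FROM cur INTO @CustomerKey, @CustomerName, @LoadDate;
        WHILE @@FETCH_STATUS = 0
        BEGIN
            INSERT INTO dw.Customer (CustomerKey, CustomerName, LoadDate)
            VALUES (@CustomerKey, @CustomerName, @LoadDate);  -- one statement per row

            FETCH NEXT FROM cur INTO @CustomerKey, @CustomerName, @LoadDate;
        END
        CLOSE cur;
        DEALLOCATE cur;

        -- Set-based: one INSERT ... SELECT moves the whole block in a single statement,
        -- letting the engine pick a parallel plan instead of paying per-row overhead.
        INSERT INTO dw.Customer (CustomerKey, CustomerName, LoadDate)
        SELECT s.CustomerKey, s.CustomerName, s.LoadDate
        FROM stage.Customer AS s;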
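
    And for points 6) and 7), a rough sketch of what a raw, set-based stage-to-warehouse load can look like once the business rules are deferred downstream. Again, the names (stage.CustomerRaw, dw.CustomerRaw) and the choice of LoadDate as the partitioning/index key are assumptions for the example:

        -- Raw, set-based load: no business rules applied, so the good, the bad, and
        -- the ugly data all land in the EDW and stay auditable.
        DECLARE @BatchDate DATE = '2009-06-01';   -- hypothetical batch/partition key value

        INSERT INTO dw.CustomerRaw (CustomerKey, CustomerName, SourceSystem, LoadDate)
        SELECT s.CustomerKey, s.CustomerName, s.SourceSystem, s.LoadDate
        FROM stage.CustomerRaw AS s
        LEFT JOIN dw.CustomerRaw AS t
               ON t.CustomerKey = s.CustomerKey   -- match on the indexed key only
              AND t.LoadDate    = s.LoadDate      -- partition-aligned predicate
        WHERE s.LoadDate = @BatchDate
          AND t.CustomerKey IS NULL;              -- insert only rows not already loaded

        -- The business rules move downstream: a later set-based step (or a view) can
        -- clean, conform, and filter on the way to the marts without touching the raw layer.

    Because the join and the filter line up with the index and the partition key, the engine is free to run the insert in parallel, partition by partition - which is what 7) is getting at.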

    I've used many of these techniques on data warehouses ranging in size from 3 MB to 3 petabytes with great success. Again, you don't have to have a large data warehouse to see the benefits of set-based processing; however, you DO need to move the business-rule processing out of the way and allow the good, the bad, and the ugly data INTO the data warehouse - otherwise, you miss out on compliance and auditability.

    Hope this helps,

    Dan Linstedt

    DanL@DanLinstedt.com