• Hello,

    While I agree with your approach (the basic concepts are sound), I disagree with one thing: processing the data on the way into the data warehouse. I've been teaching and discussing the nature of data warehouses for over 15 years now, and today's data warehouse has become a system of record, due in part to the need for compliance.

    This means that the good, the bad, and the ugly data all need to make it into the data warehouse, regardless of what they look like. It also means that "processing" the data according to business rules and functions is now moved downstream (on the way out to the data marts, the cubes, the star schemas, etc.).
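    As a rough illustration of that split, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, sample rows, and the two "business rules" (valid date, non-empty amount) are all made up for the example; a real warehouse would use its own platform and its own rules. The point is only that the load step lands every row untouched, and the rules run later as a single set-based statement on the way out.

```python
import sqlite3

# Hypothetical source rows: good, bad, and ugly. Everything arrives as text.
SOURCE_ROWS = [
    ("1001", "2010-03-01", "250.00"),
    ("1002", "not-a-date", "99.50"),   # bad date
    ("1003", "2010-03-02", ""),        # missing amount
]

def load_raw(conn, rows):
    """Land every row unchanged; no business rules on the way in."""
    conn.execute(
        "CREATE TABLE raw_orders ("
        "order_id TEXT, order_date TEXT, amount TEXT, "
        "load_ts TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.executemany(
        "INSERT INTO raw_orders (order_id, order_date, amount) VALUES (?, ?, ?)",
        rows,
    )

def build_mart(conn):
    """Apply the (made-up) business rules downstream, in one set-based statement."""
    conn.execute(
        "CREATE TABLE mart_orders AS "
        "SELECT CAST(order_id AS INTEGER) AS order_id, "
        "       order_date, "
        "       CAST(amount AS REAL) AS amount "
        "FROM raw_orders "
        # SQLite's date() returns NULL for strings it cannot parse.
        "WHERE date(order_date) IS NOT NULL AND amount <> ''"
    )

conn = sqlite3.connect(":memory:")
load_raw(conn, SOURCE_ROWS)
build_mart(conn)
print(conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0])  # all 3 rows kept
print(conn.execute("SELECT order_id, amount FROM mart_orders").fetchall())
```

    The raw table still holds all three rows for audit purposes, while only the row that passes the rules reaches the mart.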

    This does a few things:

    1) Set-based processing is in use for all data across the entire warehouse

    2) All load routines are parallel, and can be partitioned if necessary

    3) Load performance should be upwards of 100,000 to 250,000 rows per second - making it easy to load 1 billion+ rows in 45 minutes or less (of course, this depends on the hardware)

    4) Restartability comes built in, as long as set-based logic is in place

    And so on. The bottom line is to move raw data into the warehouse; the other bottom line is that the architecture of the receiving tables in the warehouse is vitally important. This is where Data Vault modeling and methodology come into play.
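    For anyone who wants to sanity-check the throughput figures in item 3 against their own hardware, here is a back-of-the-envelope sketch. It assumes a constant aggregate load rate across all parallel streams, which real systems only approximate, and the function names are just for illustration.

```python
def load_minutes(rows, rows_per_second):
    """Minutes needed to load `rows` at a constant aggregate rate."""
    return rows / rows_per_second / 60

def required_rate(rows, minutes):
    """Aggregate rows per second needed to load `rows` within `minutes`."""
    return rows / (minutes * 60)

ROWS = 1_000_000_000  # one billion rows

for rate in (100_000, 250_000):
    print(f"{rate:,} rows/s -> {load_minutes(ROWS, rate):.1f} minutes")
# 100,000 rows/s -> 166.7 minutes
# 250,000 rows/s -> 66.7 minutes

print(f"45-minute target -> {required_rate(ROWS, 45):,.0f} rows/s")
# 45-minute target -> 370,370 rows/s
```

    In other words, the 45-minute mark for 1 billion rows corresponds to roughly 370,000 rows per second in aggregate, so it sits at the "upwards of" end of the range and depends on how far the parallel, partitioned load streams scale on the hardware at hand.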

    I'm frequently engaged to correct performance and tuning problems in large-scale systems. Since Microsoft is now there with SQL Server 2008 R2 (and since Microsoft is interested in the Data Vault Model), I would suggest you reexamine the way you load information (i.e., putting processing of the data upstream of the data warehouse).

    You can see more about this high-speed parallel approach at http://www.DataVaultInstitute.com (free to register), or on my LinkedIn profile at http://www.LinkedIn.com/in/dlinstedt

    Also, keep in mind this is not some fly-by-night idea. We've got a huge number of corporations following these principles with great success, including: JP Morgan Chase, SNS Bank, World Bank, ABN-AMRO, Daimler Auto, Edmonton Police Services, Dept. of Defense, US Navy, US Army, FAA, FDA, City of Charlotte NC, Tyson Foods, Nutreco, and many, many more.

    Cheers,

    Dan Linstedt