I'm not sure I entirely understand your argument for an ODS in this context. The only difference between that and the warehouse is that the ODS may not encompass the whole enterprise. Assuming that you would still want to have an enterprise-wide data warehouse, the ODS in this context seems to be redundant. I guess what I'm asking is how is it not a data warehouse
I'm also confused about what operational functions (updates etc) you would want to apply in the ODS. Surely any updates should be applied back at the source system.
Lastly if the data marts are being fed from the ODS (as illustrated in the diagram) then what is the data warehouse used for. Shouldn't the data warehouse be used as the one holy sacred supplier of all data for reporting?
Sorry I don't mean to be so negative, data warehouse and data mart design is always very open to interpretation. Thanks for the article it has given me some new angles to consider.
Hi Robert, thank you for your comments.
The ODS is not a data warehouse because ODS is in the third normal form, not in dimensional model. But yes you are right, ODS contains all the data in the enterprise, and data warehouse database also contains all the data in the enterprise. Because of the amount of data the ODS contains is the same as data warehouse, yes in this sense ODS is redundant. But the benefits I mentioned in the articles outweigh the cost of this redundancy.
About what operational functions (updates) that you would like to apply in the ODS, and why this updates are not applied back in the source system, please allow me to illustrate this case with an example. I hope it will clarify the situation. Referring to the diagram in the article, let's say for example you have 3 ERP systems: JDE, SAP and a propriatary system (custom developed in your company) called CGL. Say you have a business performance metric called strike rate, which is computed in the ODS, based on the information from all 3 ERP systems. This strike rate is by service centre, by service type and daily. So for example, for Birmingham, for Fiscal Year 2006 week 24, the strike rate for ServiceType1 are 93%, 92%, 96%, 95%, 91%, 94% (Monday to Saturday). And so on, we calculate the strike rate for every service centre, for every service type, for every day in the year. To calculate these percentages you need information not only from 1 ERP system, but from all 3.
Now, for this strike rate metric, your company also have strike rate target, which is by service centre, by service type. For example, for Birmingham, for ServiceType1, the strike rate is 95% whilst for Manchester, for ServiceType2, the strike rate is 92%. This information are not anywhere in the source system and we need to get it to the ODS. We could update the ODS directly, for example by building a web/windows .NET application. Or (this one is probably more preferable) by designing the new application to save the strike rate targets on a separate database, and we then ETL / import this information into staging, then into ODS.
About your last point, if the data marts are being fed from the ODS then what is the data warehouse used for. Yes we could feed the data marts from the data warehouse database (as the data warehouse database contains all the information on the ODS), but it is more efficient to feed them straight from ODS, especially if the data marts are fed daily.
Thank you for all your comments Roberts, they have been an eye opener to me too.
Thanks for you reply Vincent.
Really good article but I must disagree with your comment that the ODS is not a data warehouse because it is in 3rd normal form, not in dimensional model. Although storing data in a dimensional model is valid and may be the best model for supporting end-user reporting, it is not the end-all model for a data warehouse. The only reason your ODS is not the definative data warehouse is because you said it can be updated. This idea is not generally supported in the data warehouse... but as with anything there are exceptions to every rule i.e. Type-1 slowly changing dimensions.
I believe that different methods work for different environments and we must be flexible to real world environments yet frigid in staying with proven best practices... whether it's Kimball dimensional models, Inmons CIF and DW 2.0, or John Doe in Little Company USA's who made the data warehouse work within his real world limitations. Business practices change, methodologies change... and who knows, in a couple of years we may have a new method for building data warehouse system... the Rainardi Method
Anyway I digress, overall great article with a lot to think about and I'm looking forward to more.
Thanks David. Sometimes I wonder about that, why there are only 3 types of SCD, why don't we make a new type. Why only Kimball and Inmon method (and Barry Devlin), why don't we make a new method. Your question whether a data warehouse must be a dimensional has been in my mind for a long time now. I think dimensional is a better format to get the data out but normalised form is a better format for integration. I think real time ETL is really hot at the moment and it breaks the fundamental principles that a DW is a non-volatile store. Another trend is text analytics which developed very quickly in the last year or so which I think will be a hot potato in this 2007. What do you think?
With regards to the 3 types of SCD, I don't see another alternative to handling updates in a table. Either you 1) Replace the data, 2) Version the data in a new row, or 3) Version the columns that change in the same row. If there is another technique you were thinking of, I would love to hear it. I'm always open to hearing new ideas that may help out in the future.
For the warehouse here, we find that using the Kimball design for the warehouse as well as the 2 ODS's we have works just fine. Our reasoning for using an ODS vs putting the data in the warehouse was time based. We had limited time to get data into a storable format and add to that 90% of the data is junk in one particular column that is used for reference information. Using the ODS buys us time to analyze the junk so we can eventually create some rules in ETL to cleanse the data and land it in the DWH.