You are right Abdel, with micro-batch or near real time ETL, data warehouses now have new functions. It’s now used not only for analysis, but also for operational reporting and supporting daily business operations, which reduces/eliminates the needs for an ODS.
And you are right, we should only create ODS when we need one. ODS has associated additional costs both in development effort and operational overhead: more ETL processes to execute, longer loading times. If until today you have not dealt with ODS, that’s good — chances are you don’t need it. Many companies don’t use ODS; they only have a dimensional data warehouse.
The reasons why some companies use ODS are: (these vary from company to company)
1) The dimensional data warehouse is not updated frequently enough, so they created ODS which is refreshed more frequently. For example, the dimensional DW is updated once a day whilst the ODS is updated 4 times during working hours.
2) The dimensional data warehouse does not store data at the lowest level grain. It stores historical data at aggregate level, for example daily snapshots of account balances for the last 5 years. Therefore they created an ODS which stores data at the lowest level, for example the current version of all transactions.
3) The dimensional data warehouse is in denormalized structure and they need to run some report from a normalized data structure. So they created ODS which is in normalized data structure to support these reports.
4) To support operational and regulatory reporting, they need to maintain group-wide data such as standing data and operational targets. These are not available in the source systems as the source systems are company-wide (the group consists of several companies with different business systems). The ODS is not only used on read-only basis for reporting but also for maintaining this data. An example of such data is reporting structure and data conversion elements.
5) Integration. The ODS tables are very similar to their business systems, but it contains data from multiple business systems, and it’s better designed (it’s newer, more structured, more complete, more normalized, more stringent naming conventions). Hence for some reports they prefer to report from ODS then from individual business system or from the dimensional data warehouse.
6) Performance. Although the same data is available on the dimensional data warehouse, because of its normalized structure some reports execute faster when they are pointed to the ODS. The fact that the ODS stores only the latest/current version (no history) also helps the report performance, compared to the dimensional data warehouse which stores all versions and years of snapshot data. Of course the other way around is also true: some reports execute faster when they are pointed to the dimensional data warehouse instead of ODS.
About the term "dimensional data warehouse" above: I prefer to call it “dimensional data store” or DDS. Where as the term "data warehouse system" encompasses the whole thing: stage, ETL, ODS, DDS, DQ, control, audit, cubes and the front-end applications, as I described on this article: Business Intelligence or Data Warehouse[/url] and in my book[/url].
On the other hand, the term "data warehouse database" refers to either a relational DDS (as I mentioned here in this post) or a multidimensional data store (OLAP cubes). The data warehouse databases are shown in blue in the above article.
I appreciate that you may have different views Abdel and I am interested to learn from you. So I'd be grateful if you would post your views.