A traditional approach in data warehousing is to transform data from the stage
into a dimensional store (a.k.a. spatial store), i.e. in a star schema (or snow
flake) form. Ralph Kimball explained this in great detail in his book The
Data Warehouse Toolkit. A traditional approach in building an operational
data store (ODS) is to transform data from the stage (or straight from source)
into a third normal form data model and store it there. Inmon explained ODS
concept in his book
Building Operational Data Store. Stage is an area where we placed files
or tables from the source system temporarily, before we load them into either
ODS or data warehouse.
Some data warehouse implementations use ODS as a middle layer. This ODS
also takes the 3rd normal form, but unlike the original Inmon & Claudia's
concept, which states that ODS only stores data for a short period of time, this
new ODS contains all the history. So in this sense, one could argue that it is
not accurate to call this middle layer an ODS.
This ODS concept would be particularly useful when we have several business
systems. The new layer unites the business systems and builds a single source
of data feeding the data warehouse. Say we have 3 systems: system A, system
B and system C. And all 3 of them have customer tables. In the ODS we "overlay"
those 3 tables on top of each other. Not only it is easier to feed the data
warehouse but also the ODS in practice would becomes a new integrated business
system. Initially it might be a read only system, but it could be easily developed
to take updates, for example by creating maintenance modules. This could be
done in an SOA way in .NET, or we could utilise software such as
Data warehouse contains data for the entire enterprise: finance, operation,
logistics, human resources, marketing, strategic planning, sales, purchasing,
etc. Data marts contains one specific subject area, for example: marketing.
I must stress here that the level of granularity in the data marts is not just
the roll up level (aggregates) but also contains the bottom/detail level,
just like the granularity of the data warehouse.
If we have several data marts in the enterprise, the ODS is particularly useful.
For example, we could build a finance mart complete with all its dimensions,
and have another mart for operational purpose. Both marts taking the data from
the ODS, and the enterprise warehouse is also taking data from the ODS.
Another example is a group of companies which has subsidiaries in several industries.
Each company needs to have their own enterprise warehouse but they also need
to have a group-wide warehouse. In this case both the smaller company warehouses
and the bigger group warehouse would be build from a group-wide ODS.
The rational of building the smaller warehouses directly from the ODS is two
fold: performance and consistency.
Flexibility is another distinct advantage of having ODS as a middle layer.
The data warehouse is not directly interfacing with the source system, so it
is naturally protected. It is a lot easier to rebuild the warehouse, whether
it is total rebuilt or only partially rebuilt. This is because everything is
maintained in the ODS. They keys, the facts, the history, etc. We could opt
to limit our operational warehouse to 12 months data only, and occasionally,
if some projects require further data history, we could build a 3 year warehouse
easily, just by changing 1 parameter in the control system.
The benefits do come at a price. As we have one more layer, the data warehouse
batch window increases. Whatever ETL tool we use, this means more code to maintain.
The metadata helps, but it still increases development time and maintenance
effort. It also increases disk space requirements. The size of the ODS is typically
not far from the size of the enterprise data warehouse, plus or minus 20%. Is it
worth it? It depends on the size of our warehouse, how many business systems
we have, whether there is a need of integration, and how many marts we have
to produce and maintain.
If we are talking about 3 fact tables and 10 dimensions, only 1 warehouse without
any marts, and from 1 source system, it is not worth it. If you have 3 source
systems (and therefore would appreciate any means of integration), you have 2
marts, and we are talking 10 fact tables with 50 dimensions, yes, it is well
worth it. Can we build the ODS layer later? Yes we can but it will be very costly.
We need to reposition the data firewall, the key generation, the metadata, just
to name a few. The work of adding an ODS layer could take almost the same effort
as the building the data warehouse in the first place. Having an ODS as a middle
layer is a decision that we need to make before we build the warehouse system.
4th March 2006