Data Warehouse Models

  • Someone from my company's strategy has said that the Data Warehouse model needs to be:

    - Atomic

    - 3rd NF

    - Relational

    But I disagree. I always considered DW models to be:

    - Dimensional

    - Data has to be very granular (represents 3NF atomic data)

    - Subject Oriented

    But the Strategist from the team came back to me and commented:

    dimensional layer is for user access. What if user doesn't need to see the the data structures, eg if using a dashboard. Should we build a dimensional model? If so what advantages does it bring? Could just build a single layer? The atomic? The dimensional ? What are the implications if the dashboard includes data from both the DW and a OLTP application?

    Can someone assist me with answering these questions

    Thanks

  • rka (4/14/2013)


    Someone from my company's strategy has said that the Data Warehouse model needs to be:

    - Atomic

    - 3rd NF

    - Relational

    But I disagree. I always considered DW models to be:

    - Dimensional

    - Data has to be very granular (represents 3NF atomic data)

    - Subject Oriented

    But the Strategist from the team came back to me and commented:

    dimensional layer is for user access. What if user doesn't need to see the the data structures, eg if using a dashboard. Should we build a dimensional model? If so what advantages does it bring? Could just build a single layer? The atomic? The dimensional ? What are the implications if the dashboard includes data from both the DW and a OLTP application?

    I think some of the disagreement boils down to the differences in architecture between the Kimball and Inmon "schools" and perhaps in some different use in terms.

    James Serra gave a good presentation on an effective BI architecture at this past Saturday's SQL Saturday in Chicago. The slides can be found here: http://www.slideshare.net/jamserra/data-warehouse-architecture-16065902

    It seems that the Inmon school wants what I'd refer to as a "clean repository" and Inmon would call a CIF (Corporate Information Factory) and others call the data warehouse. A 3rd normal form repository from which your data marts are built off of.

    The Kimball school goes directly to the data marts off of a dimensional star schema; but there is an emphasis on designing the bus matrix so you don't have a bunch of isolated silos.

    Our current setup is very much like you describe, though I think I'm going to move towards a more hybrid model that incorporates some of both schools. My thought is that with the new tabular capabilities in SQL Server 2012, it may be more straightforward for the analysts/power users to use PowerPivot off of a clean repository/CIF and continue our MDX/OLAP development with our dimensional data marts.

    I'm don't think I answered all of your questions, but there's a few things to think about.

    HTH,

    Rob

  • rka,

    Your strategist is quite right that a dimensional model is not a requirement or a complete solution for a data warehouse. Dimensional models are designed very specifically for data marts supporting an OLAP-style presentation tier (such as SSAS multidimensional) but they are less well suited to things like ad-hoc analysis, dashboards, "static" reports and downstream data integration. A normalized, atomic and subject-oriented data warehouse serving as a hub for dependent data marts is a more versatile and scalable approach than dimensional marts on their own.

    A normalized data warehouse also reduces time and cost to develop, support and maintain the data warehouse environment because it ensures that presentation tier requirements are decoupled from the data sources themselves and doesn't suffer any inbuilt "bias" for a particular pattern of current requirements.

  • rka (4/14/2013)


    Someone from my company's strategy has said that the Data Warehouse model needs to be:

    - Atomic

    - 3rd NF

    - Relational

    But I disagree. I always considered DW models to be:

    - Dimensional

    - Data has to be very granular (represents 3NF atomic data)

    - Subject Oriented

    Both are correct.

    Your strategist's definition is aligned with Inmon's definition of a Data Warehouse while your own definition is aligned with Kimball's definition.

    Over the years, most DWH have developed as a collection of dimensional (Kimballistic) datamarts making this the "standard" from most people out there.

    Either way, both models are as good as the DWH Architect designing a particular DWH is.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • I'm really resisting the urge to yell "INCOMING!!!!!!"

    I've read statements from both Inmon and Kimball that their respective takes on datawarehousing (or whatever you want to call it) are a lot closer than most people think and most of the flamewars are the results of people not completely comprehending their approaches.

    There are advantages and challenges with each method and as PaulB said, the success/failure of a given DWH has more to do with the skills of the architect than anything else.

    I'd look at it as an opportunity to challenge your own ideas about building a DWH and learn why the company uses the approach they do. Conversely, there is also an opportunity to show where some of Kimball's ideas might benefit. Regardless of outcome, it isn't personal, it's just business.

    ____________
    Just my $0.02 from over here in the cheap seats of the peanut gallery - please adjust for inflation and/or your local currency.

  • I'm always very wary of 'strategists'. Most 'strategists' I've dealt with (about 2.5 in total) are only good at one thing and that's generating enough FUD to keep themselves in a job.

    But ignoring my cynical view and assuming your strategist actually wants to add value, I make these comments:

    I work with a 'Enterprise Data Warehouse' for lack of another term. It contains:

    1. Replicas of source systems, used for operational or 'detailed' reporting

    2. Above this there is an 'ODS' layer which adds centralised reference / master grouping data to the replicas

    3. Above this there are a couple of 'datamarts' which are basically star schemas.

    4. Above this there are cubes

    5. At the top we have a BI tool running reports from each of these four layers

    Every level has certain types of business rules in it. These rules clean, conform and combine data. The higher you go the 'richer' the data. The lower you go the more 'operational' the data is.

    I can tell you all of our dashboards run from cubes, as that is the richest, most consistent layer to get data from. Most of our KPI's cannot actually run from source system data, as there is not enough info or context. But we do drill from the dashboard all the way back to your source systems and that is of great value.

    Even though your dashboard does not obviously have dimensions and measures in it, that's effectively what they are built from, unless they are purely operational KPI's (as opposed to cross system KPI's)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply