OLAP versus Datamart STAR schema

  • New to data warehousing. Just curious why one would choose to use an OLAP cube (STAR schema) versus creating a datamart STAR schema. BI tools should be able to read both, but wouldn't the datamart be more scalable, support multiple sources, and be better suited for enterprise use? Also, there are licensing costs involved with the cubes that could be avoided by building the STAR schema in the datamart.

    Thank you for your insights....

  • The main advantage of cubes is that they can serve as offline storage. So if you lose the connection to your data server, you would still be able to analyze the business reports based on the (local) cubes.

    Another advantage is performance. A cube stores multidimensional, pre-calculated business metrics, so runtime slicing and dicing (depending on the BI tool) is very fast.
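The pre-aggregation idea described above can be sketched outside of any cube engine. Below is a minimal, hypothetical example (the schema, table, and column names are invented for illustration, not taken from the thread) that materializes a business metric into a summary table at load time, so runtime slicing reads a handful of pre-calculated rows instead of scanning the fact table:

```python
# Sketch of "pre-calculated business metrics" in a relational star schema.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Tiny star schema: one fact table, one dimension.
cur.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (product_key INTEGER, amount REAL);
""")
cur.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "Books"), (2, "Books"), (3, "Toys")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [(1, 10.0), (1, 5.0), (2, 20.0), (3, 7.5)])

# "Cube-like" pre-aggregation: compute the metric per category once, at load time.
cur.execute("""
CREATE TABLE agg_sales_by_category AS
SELECT d.category, SUM(f.amount) AS total_amount
FROM fact_sales f JOIN dim_product d USING (product_key)
GROUP BY d.category
""")

# Runtime slicing now reads the small summary table, not the fact rows.
total = cur.execute(
    "SELECT total_amount FROM agg_sales_by_category WHERE category = 'Books'"
).fetchone()[0]
print(total)  # 35.0
```

A cube engine does this kind of pre-computation across many dimension combinations automatically; in a plain star schema you choose and maintain the aggregate tables yourself.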

  • lance.a.williams (12/16/2011)


    New to data warehousing. Just curious why one would choose to use an OLAP cube (STAR schema) versus creating a datamart STAR schema. BI tools should be able to read both, but wouldn't the datamart be more scalable, support multiple sources, and be better suited for enterprise use? Also, there are licensing costs involved with the cubes that could be avoided by building the STAR schema in the datamart.

    No reason for "OR" here... how about having your data warehouse built as a collection of domain-focused star-schema (a.k.a. "dimensional modeling based") datamarts, and also providing the business community with reporting tools that allow for cubes?

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • Thank you for the reply. If I provided STAR schema data marts for each business domain, and BI tools that connected directly to the datamart for reporting and analysis, then what would be the need of adding OLAP cubes to the architecture?

    If I don't need OLAP cubes per se, then is there some benefit that would justify the additional cost and the effort of loading data from a datamart to an OLAP cube?

    Thank you for your insights

  • Dev (12/17/2011)


    The main advantage of cubes is that they can serve as offline storage. So if you lose the connection to your data server, you would still be able to analyze the business reports based on the (local) cubes.

    Another advantage is performance. A cube stores multidimensional, pre-calculated business metrics, so runtime slicing and dicing (depending on the BI tool) is very fast.

    Regarding performance: with a STAR schema datamart, would I be able to store pre-calculated business metrics, or is this a functionality found in OLAP cubes that differentiates them from a STAR schema DB?

    thank you

  • lance.a.williams (12/18/2011)


    Dev (12/17/2011)


    The main advantage of cubes is that they can serve as offline storage. So if you lose the connection to your data server, you would still be able to analyze the business reports based on the (local) cubes.

    Another advantage is performance. A cube stores multidimensional, pre-calculated business metrics, so runtime slicing and dicing (depending on the BI tool) is very fast.

    Regarding performance: with a STAR schema datamart, would I be able to store pre-calculated business metrics, or is this a functionality found in OLAP cubes that differentiates them from a STAR schema DB?

    thank you

    If you are doing runtime slice & dice, prefer cubes. You can certainly achieve it with a relational star schema, but relational structures can't beat cubes on this ground.

  • lance.a.williams (12/18/2011)


    Thank you for the reply. If I provided STAR schema data marts for each business domain, and BI tools that connected directly to the datamart for reporting and analysis, then what would be the need of adding OLAP cubes to the architecture?

    If I don't need OLAP cubes per se, then is there some benefit that would justify the additional cost and the effort of loading data from a datamart to an OLAP cube?

    Thank you for your insights

    I have already given you the answer... Offline Data Analysis

  • lance.a.williams (12/18/2011)


    Thank you for the reply. If I provided STAR schema data marts for each business domain, and BI tools that connected directly to the datamart for reporting and analysis, then what would be the need of adding OLAP cubes to the architecture?

    If I don't need OLAP cubes per se, then is there some benefit that would justify the additional cost and the effort of loading data from a datamart to an OLAP cube?

    These are very good questions indeed - let me answer to the best of my knowledge, even at the risk of starting a quasi-religious holy war around the issue.

    A proper dimensional (a.k.a. star-schema) data warehouse includes core FACT and DIMension tables, but it also has to include what I like to call a "delivery layer". This "delivery layer" may be something as basic as a set of cataloged views and queries that take care of basic business questions and requirements; it may also be a set of pre-calculated, aggregated tables, and/or it may be a set of cubes built by the reporting tool of choice.

    In my experience a well-planned "delivery layer" performs nicely.

    I have nothing against cubes, but I have seen the reporting tool + cubes solution used as a somewhat elegant way of throwing the "reporting issue" over the fence and restricting the responsibility of the DBA to the core tables of the data warehouse.

    Last but not least, I've seen both strategies co-existing without killing each other.
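The "cataloged views" flavor of the delivery layer described above can be sketched in the same spirit. In this hypothetical example (schema and names are invented for illustration), report writers query a view layered over the core star schema rather than hitting the fact and dimension tables directly:

```python
# Sketch of a "delivery layer": core star schema below, cataloged view on top.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Core layer: fact and dimension tables.
cur.executescript("""
CREATE TABLE dim_date   (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (date_key INTEGER, amount REAL);
INSERT INTO dim_date   VALUES (1, 2010), (2, 2011);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 40.0), (2, 60.0);
""")

# Delivery layer: a cataloged view answering a standard business question.
cur.execute("""
CREATE VIEW v_sales_by_year AS
SELECT d.year, SUM(f.amount) AS total_amount
FROM fact_sales f JOIN dim_date d USING (date_key)
GROUP BY d.year
""")

# Report writers hit the view, never the core tables directly.
rows = cur.execute(
    "SELECT year, total_amount FROM v_sales_by_year ORDER BY year"
).fetchall()
print(rows)  # [(2010, 100.0), (2011, 100.0)]
```

Swapping the view for a pre-aggregated table (or a cube) changes the performance profile but not the consumers, which is the point of keeping a distinct delivery layer.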

  • Dev (12/19/2011)


    lance.a.williams (12/18/2011)


    Dev (12/17/2011)


    The main advantage of cubes is that they can serve as offline storage. So if you lose the connection to your data server, you would still be able to analyze the business reports based on the (local) cubes.

    Another advantage is performance. A cube stores multidimensional, pre-calculated business metrics, so runtime slicing and dicing (depending on the BI tool) is very fast.

    Regarding performance: with a STAR schema datamart, would I be able to store pre-calculated business metrics, or is this a functionality found in OLAP cubes that differentiates them from a STAR schema DB?

    thank you

    If you are doing runtime slice & dice, prefer cubes. You can certainly achieve it with a relational star schema, but relational structures can't beat cubes on this ground.

    Thank you for the clarification

  • PaulB-TheOneAndOnly (12/19/2011)


    lance.a.williams (12/18/2011)


    Thank you for the reply. If I provided STAR schema data marts for each business domain, and BI tools that connected directly to the datamart for reporting and analysis, then what would be the need of adding OLAP cubes to the architecture?

    If I don't need OLAP cubes per se, then is there some benefit that would justify the additional cost and the effort of loading data from a datamart to an OLAP cube?

    These are very good questions indeed - let me answer to the best of my knowledge, even at the risk of starting a quasi-religious holy war around the issue.

    A proper dimensional (a.k.a. star-schema) data warehouse includes core FACT and DIMension tables, but it also has to include what I like to call a "delivery layer". This "delivery layer" may be something as basic as a set of cataloged views and queries that take care of basic business questions and requirements; it may also be a set of pre-calculated, aggregated tables, and/or it may be a set of cubes built by the reporting tool of choice.

    In my experience a well-planned "delivery layer" performs nicely.

    I have nothing against cubes, but I have seen the reporting tool + cubes solution used as a somewhat elegant way of throwing the "reporting issue" over the fence and restricting the responsibility of the DBA to the core tables of the data warehouse.

    Last but not least, I've seen both strategies co-existing without killing each other.

    Thank you - looking at this question as the "delivery layer" for a proper dimensional DW makes more sense. What that delivery layer contains may vary based on cost, performance needs, user preference for BI tools, offline data analysis needs, etc., but the data in the delivery layer is still consistent, as it is refreshed from the same source (the DW). Correct?

  • lance.a.williams (12/19/2011)


    PaulB-TheOneAndOnly (12/19/2011)


    lance.a.williams (12/18/2011)


    Thank you for the reply. If I provided STAR schema data marts for each business domain, and BI tools that connected directly to the datamart for reporting and analysis, then what would be the need of adding OLAP cubes to the architecture?

    If I don't need OLAP cubes per se, then is there some benefit that would justify the additional cost and the effort of loading data from a datamart to an OLAP cube?

    These are very good questions indeed - let me answer to the best of my knowledge, even at the risk of starting a quasi-religious holy war around the issue.

    A proper dimensional (a.k.a. star-schema) data warehouse includes core FACT and DIMension tables, but it also has to include what I like to call a "delivery layer". This "delivery layer" may be something as basic as a set of cataloged views and queries that take care of basic business questions and requirements; it may also be a set of pre-calculated, aggregated tables, and/or it may be a set of cubes built by the reporting tool of choice.

    In my experience a well-planned "delivery layer" performs nicely.

    I have nothing against cubes, but I have seen the reporting tool + cubes solution used as a somewhat elegant way of throwing the "reporting issue" over the fence and restricting the responsibility of the DBA to the core tables of the data warehouse.

    Last but not least, I've seen both strategies co-existing without killing each other.

    Thank you - looking at this question as the "delivery layer" for a proper dimensional DW makes more sense. What that delivery layer contains may vary based on cost, performance needs, user preference for BI tools, offline data analysis needs, etc., but the data in the delivery layer is still consistent, as it is refreshed from the same source (the DW). Correct?

    Yes, that's correct: everything has to come from the same source, and that source is the DWH.

    The whole idea of the data warehouse is to have a central repository from which you can answer business questions - sometimes people do it against aggregate (and other) tables, sometimes they do it against cubes, and it is not rare to see people hitting the base fact and dim tables to answer a particular question. At the end of the day, if the business asks for granular data or for an aggregation nobody thought of before, there is little choice but to hit the core tables.

  • At the end of the day, if the business asks for granular data or for an aggregation nobody thought of before, there is little choice but to hit the core tables.

    At the cost of performance 😀

    @lance: OLAP can't go to this level (so it's out of the competition :-P). Its scope is limited to a few dimensions and a few metrics. I won't mark it as a limitation, because CUBEs are designed this way.

  • Dev (12/20/2011)


    At the end of the day, if the business asks for granular data or for an aggregation nobody thought of before, there is little choice but to hit the core tables.

    At the cost of performance 😀

    Not necessarily - I've seen well-performing queries against a billion-row datamart; it all depends on the indexing strategy.

  • PaulB-TheOneAndOnly (12/20/2011)


    Dev (12/20/2011)


    At the end of the day, if the business asks for granular data or for an aggregation nobody thought of before, there is little choice but to hit the core tables.

    At the cost of performance 😀

    Not necessarily - I've seen well-performing queries against a billion-row datamart; it all depends on the indexing strategy.

    Actually it is an 'it depends' type of question 😛, but my comment was about hitting the core tables. You have to accept that they can't be as fast as pre-aggregated tables (or materialized views).

  • Dev (12/20/2011)


    PaulB-TheOneAndOnly (12/20/2011)


    Dev (12/20/2011)


    At the end of the day, if the business asks for granular data or for an aggregation nobody thought of before, there is little choice but to hit the core tables.

    At the cost of performance 😀

    Not necessarily - I've seen well-performing queries against a billion-row datamart; it all depends on the indexing strategy.

    Actually it is an 'it depends' type of question 😛, but my comment was about hitting the core tables. You have to accept that they can't be as fast as pre-aggregated tables (or materialized views).

    If a valid materialized view exists and fulfills the needs of the query, it doesn't matter that the query points to the base core tables; Oracle will rewrite the query on the fly and hit the materialized view.

    Either way - your argument is like saying that it is better to be young, rich and healthy than old, poor and plagued with disease; everybody will agree on that but it was not related to the matter at hand 😀

