Best practices with mutually exclusive Dimensions

  • I am designing a Data Warehouse for store retail sales (FactRetailSales). The granularity will be at the Invoice line item level.

    On an Invoice there can be products and/or services. (i.e. One Invoice could have Brake pads [Product]; Tires [Product]; Brake installation [Service]; Tire balance [Service])

    At the Invoice Line Item level, products and services are mutually exclusive. A line item can only be one or the other, never both.

    Products and Services only share a [Description] attribute, but Products have [Manufacturer], [Brand], [Category], [Sub-Category] attributes.

    Which is best practice?

    A) To create DimProduct and DimService dimensions:

    With this method, each Invoice Line Item would point to only one of the two dimensions. For a Product, there would be a link to the appropriate line in DimProduct and a "not applicable" (N/A) row in DimService. And vice versa for a Service Line Item.

    This would allow for the DimProduct and DimService to remain pure to their attributes, but there would be millions of links to either the Product or Service "N/A" row (one for every Line Item)

    or B) Create one DimProductService dimension:

    In this case, while each line item would point to a valid dimension record, all Services in the dimension table would have an "N/A" equivalent for each of the Product specific attributes.

    I understand that the answer to these types of questions is often "depends". I'd appreciate any insights into performance/ease of use trade-offs that might exist when either of these methods are put into practice. Or if there is an alternative I haven't thought of.

    Thanks.

  • My instinct would be your option (A) because, from what you have said, Products are Services are different entities. What other attributes beside Description does a Service have and what do you want to slice your RetailSales Measure by? If you want to slice by Service distinctly from slicing by Products then that would also point to having separate dimensions.

    Also, if you creating Hierarchies (and you should to improve performance and to increase the usefulness to business users), it will be easier to create meaninigful hierarchies with separate Product and Service dimensions.

    Regards

    Lempster

  • You're dealing with type-specific stars. (see chapter 13 of Star Schema - The complete reference).

    There are 3 options:

    * create one dimension to support both. (option B) This is a good choice of most attributes are the same. This is not a good choice of products differ too much from services, giving you way too much columns of which half of them is NULL each time.

    * create a core star and custom stars. The core dimension will hold all attributes that are in common with products and services. (most likely Name, SKU, surrogate key). The custom dimension (DimProduct and DimService) will store the type specific attributes. Important is that an item in the DimProduct dimension will have the same surrogate key as in the core dimension. That way you can link them together. The fact table links only to the core dimension. You also create a core fact table (with facts relating to products and services, allowing you to do cross-analysis) and custom fact tables, with specific facts relating to either product or service. The advantage of this model is that you can do a type specific analysis pretty easily (for example using only the service dimenions and custom fact), or you can do cross-analysis on both types using the core tables. Cross-analysis on type-specific attributes will of course be difficult. This option also can get quite messy if there are slowly changing dimensions type 2.

    * a generic approach, where you store the dimension info as value/type pairs into the dimension table. Very flexible, you can store everything you want but hard to query.

    Option A doesn't seem directly to be a good idea, as this requires two surrogates keys in the fact table. However, only one of those two can be filled in at the time. This can lead to issues against the grain of the fact table.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (1/14/2014)


    Option A doesn't seem directly to be a good idea, as this requires two surrogates keys in the fact table. However, only one of those two can be filled in at the time. This can lead to issues against the grain of the fact table.

    I take your point Koen and your knowledge in this area is better than mine (:-)), but as long as there are surrogate keys that define 'Not Applicable' or 'Has Happened Yet' etc., then this isn't a problem is it? I agree that it would not be good to have NULL values.

    For example, in the Kimball Data Warehouse Toolkit book, they cover the concept of a Promotions dimension in relation to Retail Sales; there is not always going to be a promotion in place and so the Sales fact table would include whatever surrogate key denotes 'no promotion in place', but it doesn't affect the grain.

    Regards

    Lempster

  • Lempster (1/14/2014)


    Koen Verbeeck (1/14/2014)


    Option A doesn't seem directly to be a good idea, as this requires two surrogates keys in the fact table. However, only one of those two can be filled in at the time. This can lead to issues against the grain of the fact table.

    I take your point Koen and your knowledge in this area is better than mine (:-)), but as long as there are surrogate keys that define 'Not Applicable' or 'Has Happened Yet' etc., then this isn't a problem is it? I agree that it would not be good to have NULL values.

    For example, in the Kimball Data Warehouse Toolkit book, they cover the concept of a Promotions dimension in relation to Retail Sales; there is not always going to be a promotion in place and so the Sales fact table would include whatever surrogate key denotes 'no promotion in place', but it doesn't affect the grain.

    Regards

    Lempster

    That is correct, but the example in the Data Warehouse Toolkit is an optional dimension and it doesn't take part in the grain. In your case, the product or service dimension is in fact part of the grain: "The fact table measures the order dollars for a customer on a specific day on an invoice for a product or a service."

    It's the or in your grain declaration you should worry about.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I would go with whatever is easier for the customer to query and possibly for you to abstract for the customer--although it is always tempting to go with developer convenience. Overall, I am in agreement with Koen.

    The most important thing is to know the pros and cons to each approach.

    Kimball on Resisting The Generic Dimension Approach:

    http://www.kimballgroup.com/2006/09/07/design-tip-83-abstract-generic-dimensions/

    Lastest Kimball Thinking on Heterogeneous Schemas:

    http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/supertype-subtype-heterogeneous/

    Advice for the OLAP Layer:

    http://www.cscjournals.com/docs/IJCSR10-01-01-009.pdf

    Careful when referencing the toolkit as well....make sure you also subscribe to design tips as well as, from time to time, the group will "overrule" or "modernize" their thinking from earlier editions of their books. Putting a datetime column in a FACT table, when time of day can be an important fact--such as a call center, is an example.

  • One final thought: Will you need to be doing a lot of cross-product analysis, or will they mostly be Line Of Business or Product Type specific?

  • sneumersky (1/21/2014)


    Careful when referencing the toolkit as well....make sure you also subscribe to design tips as well as, from time to time, the group will "overrule" or "modernize" their thinking from earlier editions of their books. Putting a datetime column in a FACT table, when time of day can be an important fact--such as a call center, is an example.

    Thanks for the tip about the tips - I have just set up my subscription. 🙂

  • No problem.

    There are also other schools of thought in data warehousing as well. You can check out Bill Inmon's site(s) as well with respect to the Corporate Information Factory.

    http://www.inmoncif.com/home/

    I have a bad tendency of pushing just Kimball, when in fact there are other schools of thought out there. On this site, I tend to lean way closer to the Kimball side because many of the MS BI tools like Powerpivot and Analysis Services are more friendly to the "Kimballgroup way of doing things".

    I also lean more towards Kimball as well.

  • sneumersky (1/23/2014)


    I have a bad tendency of pushing just Kimball, when in fact there are other schools of thought out there. On this site, I tend to lean way closer to the Kimball side because many of the MS BI tools like Powerpivot and Analysis Services are more friendly to the "Kimballgroup way of doing things".

    Although the CIF is normalized in the 3rd normal form, the various data marts build upon this are dimensional. So you can still use SSAS and PowerPivot when using the Inmon strategy. Also, because you can remodel everything in PowerPivot and the DMV of SSAS, you can build models directly on the normalized CIF. If just takes more work 🙂

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I'm not sure I follow this completely, but my initial inclination would be to make this a single dimension with the initial heirachical attribute having as its data either Product or Service. If I follow this correctly, Product has more steps in the heirarchy, but you can use N/A or repeat the word Services. This allows an easy comparison between revenue for Products and Services and seems more business intuitive to me.

  • One final thought: Will you need to be doing a lot of cross-product analysis, or will they mostly be Line Of Business or Product Type specific?

    I'm not 100% sure what you mean by cross-product analysis.

    Reporting is varied. Some reports will show all Services grouped by category (i.e. there are multiple Service codes that fall under the "Tire Balances" category), another report will show Service categories as well as Product categories divided by the number of tires sold. ("Tires" would be a Product).

    Hope this clarifies. I find the unknowns of what reporting will be requested to be the most challenging part of designing.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply