Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Best practices with mutually exclusive Dimensions Expand / Collapse
Author
Message
Posted Monday, December 16, 2013 9:20 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: 2 days ago @ 10:43 AM
Points: 3, Visits: 63
I am designing a Data Warehouse for store retail sales (FactRetailSales). The granularity will be at the Invoice line item level.

On an Invoice there can be products and/or services. (i.e. One Invoice could have Brake pads [Product]; Tires [Product]; Brake installation [Service]; Tire balance [Service])

At the Invoice Line Item level, products and services are mutually exclusive. A line item can only be one or the other, never both.

Products and Services only share a [Description] attribute, but Products have [Manufacturer], [Brand], [Category], [Sub-Category] attributes.


Which is best practice?

A) To create DimProduct and DimService dimensions:

With this method, each Invoice Line Item would point to only one of the two dimensions. For a Product, there would be a link to the appropriate line in DimProduct and a "not applicable" (N/A) row in DimService. And vice versa for a Service Line Item.

This would allow for the DimProduct and DimService to remain pure to their attributes, but there would be millions of links to either the Product or Service "N/A" row (one for every Line Item)

or B) Create one DimProductService dimension:

In this case, while each line item would point to a valid dimension record, all Services in the dimension table would have an "N/A" equivalent for each of the Product specific attributes.


I understand that the answer to these types of questions is often "depends". I'd appreciate any insights into performance/ease of use trade-offs that might exist when either of these methods are put into practice. Or if there is an alternative I haven't thought of.

Thanks.
Post #1523296
Posted Tuesday, January 14, 2014 2:29 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 9:03 AM
Points: 1,891, Visits: 1,194
My instinct would be your option (A) because, from what you have said, Products are Services are different entities. What other attributes beside Description does a Service have and what do you want to slice your RetailSales Measure by? If you want to slice by Service distinctly from slicing by Products then that would also point to having separate dimensions.
Also, if you creating Hierarchies (and you should to improve performance and to increase the usefulness to business users), it will be easier to create meaninigful hierarchies with separate Product and Service dimensions.

Regards
Lempster
Post #1530575
Posted Tuesday, January 14, 2014 3:01 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 7:14 AM
Points: 12,228, Visits: 9,204
You're dealing with type-specific stars. (see chapter 13 of Star Schema - The complete reference).

There are 3 options:

* create one dimension to support both. (option B) This is a good choice of most attributes are the same. This is not a good choice of products differ too much from services, giving you way too much columns of which half of them is NULL each time.

* create a core star and custom stars. The core dimension will hold all attributes that are in common with products and services. (most likely Name, SKU, surrogate key). The custom dimension (DimProduct and DimService) will store the type specific attributes. Important is that an item in the DimProduct dimension will have the same surrogate key as in the core dimension. That way you can link them together. The fact table links only to the core dimension. You also create a core fact table (with facts relating to products and services, allowing you to do cross-analysis) and custom fact tables, with specific facts relating to either product or service. The advantage of this model is that you can do a type specific analysis pretty easily (for example using only the service dimenions and custom fact), or you can do cross-analysis on both types using the core tables. Cross-analysis on type-specific attributes will of course be difficult. This option also can get quite messy if there are slowly changing dimensions type 2.

* a generic approach, where you store the dimension info as value/type pairs into the dimension table. Very flexible, you can store everything you want but hard to query.

Option A doesn't seem directly to be a good idea, as this requires two surrogates keys in the fact table. However, only one of those two can be filled in at the time. This can lead to issues against the grain of the fact table.




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1530584
Posted Tuesday, January 14, 2014 3:55 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 9:03 AM
Points: 1,891, Visits: 1,194
Koen Verbeeck (1/14/2014)
Option A doesn't seem directly to be a good idea, as this requires two surrogates keys in the fact table. However, only one of those two can be filled in at the time. This can lead to issues against the grain of the fact table.


I take your point Koen and your knowledge in this area is better than mine (), but as long as there are surrogate keys that define 'Not Applicable' or 'Has Happened Yet' etc., then this isn't a problem is it? I agree that it would not be good to have NULL values.
For example, in the Kimball Data Warehouse Toolkit book, they cover the concept of a Promotions dimension in relation to Retail Sales; there is not always going to be a promotion in place and so the Sales fact table would include whatever surrogate key denotes 'no promotion in place', but it doesn't affect the grain.

Regards
Lempster
Post #1530601
Posted Tuesday, January 14, 2014 5:46 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 7:14 AM
Points: 12,228, Visits: 9,204
Lempster (1/14/2014)
Koen Verbeeck (1/14/2014)
Option A doesn't seem directly to be a good idea, as this requires two surrogates keys in the fact table. However, only one of those two can be filled in at the time. This can lead to issues against the grain of the fact table.


I take your point Koen and your knowledge in this area is better than mine (), but as long as there are surrogate keys that define 'Not Applicable' or 'Has Happened Yet' etc., then this isn't a problem is it? I agree that it would not be good to have NULL values.
For example, in the Kimball Data Warehouse Toolkit book, they cover the concept of a Promotions dimension in relation to Retail Sales; there is not always going to be a promotion in place and so the Sales fact table would include whatever surrogate key denotes 'no promotion in place', but it doesn't affect the grain.

Regards
Lempster


That is correct, but the example in the Data Warehouse Toolkit is an optional dimension and it doesn't take part in the grain. In your case, the product or service dimension is in fact part of the grain: "The fact table measures the order dollars for a customer on a specific day on an invoice for a product or a service."

It's the or in your grain declaration you should worry about.




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1530630
Posted Tuesday, January 21, 2014 11:11 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, March 27, 2014 2:39 PM
Points: 2,141, Visits: 486
I would go with whatever is easier for the customer to query and possibly for you to abstract for the customer--although it is always tempting to go with developer convenience. Overall, I am in agreement with Koen.

The most important thing is to know the pros and cons to each approach.

Kimball on Resisting The Generic Dimension Approach:
http://www.kimballgroup.com/2006/09/07/design-tip-83-abstract-generic-dimensions/

Lastest Kimball Thinking on Heterogeneous Schemas:
http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/supertype-subtype-heterogeneous/

Advice for the OLAP Layer:
http://www.cscjournals.com/docs/IJCSR10-01-01-009.pdf

Careful when referencing the toolkit as well....make sure you also subscribe to design tips as well as, from time to time, the group will "overrule" or "modernize" their thinking from earlier editions of their books. Putting a datetime column in a FACT table, when time of day can be an important fact--such as a call center, is an example.
Post #1533285
Posted Tuesday, January 21, 2014 11:49 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, March 27, 2014 2:39 PM
Points: 2,141, Visits: 486
One final thought: Will you need to be doing a lot of cross-product analysis, or will they mostly be Line Of Business or Product Type specific?
Post #1533301
Posted Thursday, January 23, 2014 6:39 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 9:03 AM
Points: 1,891, Visits: 1,194
sneumersky (1/21/2014)

Careful when referencing the toolkit as well....make sure you also subscribe to design tips as well as, from time to time, the group will "overrule" or "modernize" their thinking from earlier editions of their books. Putting a datetime column in a FACT table, when time of day can be an important fact--such as a call center, is an example.


Thanks for the tip about the tips - I have just set up my subscription.
Post #1534051
Posted Thursday, January 23, 2014 8:50 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, March 27, 2014 2:39 PM
Points: 2,141, Visits: 486
No problem.

There are also other schools of thought in data warehousing as well. You can check out Bill Inmon's site(s) as well with respect to the Corporate Information Factory.

http://www.inmoncif.com/home/

I have a bad tendency of pushing just Kimball, when in fact there are other schools of thought out there. On this site, I tend to lean way closer to the Kimball side because many of the MS BI tools like Powerpivot and Analysis Services are more friendly to the "Kimballgroup way of doing things".

I also lean more towards Kimball as well.
Post #1534147
Posted Friday, January 24, 2014 12:16 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 7:14 AM
Points: 12,228, Visits: 9,204
sneumersky (1/23/2014)

I have a bad tendency of pushing just Kimball, when in fact there are other schools of thought out there. On this site, I tend to lean way closer to the Kimball side because many of the MS BI tools like Powerpivot and Analysis Services are more friendly to the "Kimballgroup way of doing things".


Although the CIF is normalized in the 3rd normal form, the various data marts build upon this are dimensional. So you can still use SSAS and PowerPivot when using the Inmon strategy. Also, because you can remodel everything in PowerPivot and the DMV of SSAS, you can build models directly on the normalized CIF. If just takes more work




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1534359
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse