Dimension spanning multiple fact tables; best method of implementation?

Question

Dimension spanning multiple fact tables; best method of implementation?

Michael Lysons

SSCertifiable

Points: 6497
More actions
May 12, 2011 at 9:09 am

#250171

My BI system has two fact tables (and therefore cubes); an Inpatient Admissions fact and an Outpatients fact. One dimension that is common to both fact tables is a dimension - named OPCS - that specifies the procedure the patient was present for.
As some procedures are performed only on Inpatients and some only on Outpatients, not all values in OPCS are present in each Fact table. Because of this, and because there are thousands of OPCS Codes, the users want the dimension to only contain values that are present in the Fact table.
My question is what is the best way of achieving this? My thoughts are to either have 2 separate dimensions, or 1 dimension with flags (either on the dimension table itself or more likely in a child table) that specify which Fact table it is present in.
Is one method considered better than another? Or are there other methods that I am missing? I personally favour the idea of having 1 dimension table with flags as this cuts down on maintenance, but I'd be happy to hear people's input.

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Chris Umbaugh Ten Centuries Points: 1334 More actions · Answer 1

Michael Lysons (5/12/2011)
My BI system has two fact tables (and therefore cubes); an Inpatient Admissions fact and an Outpatients fact. One dimension that is common to both fact tables is a dimension - named OPCS - that specifies the procedure the patient was present for.
As some procedures are performed only on Inpatients and some only on Outpatients, not all values in OPCS are present in each Fact table. Because of this, and because there are thousands of OPCS Codes, the users want the dimension to only contain values that are present in the Fact table.
My question is what is the best way of achieving this? My thoughts are to either have 2 separate dimensions, or 1 dimension with flags (either on the dimension table itself or more likely in a child table) that specify which Fact table it is present in.
Is one method considered better than another? Or are there other methods that I am missing? I personally favour the idea of having 1 dimension table with flags as this cuts down on maintenance, but I'd be happy to hear people's input.

You could do as you suggest and add a flag to the dimension table. I would look at turning that into a new attribute and adding it to the hierarchy. So you would have something like this in your UDH:

.In/Outpatient (*sorry don't know the medical industry so I cannot be creative on the name*)

..Service

and would see something like:

.Inpatient

..Service 1

..Service 2

..

.Outpatient

..Service A

..Service B

..

This is of course assuming that a service is unique among the two groups. Service A can only be used as an outpatient service.

I am curious. Why do you make two cubes? Why would it not be an alternative for you to carry both facts in the same cube. They certainly share at least one dimension in common.

Chris Umbaugh
Data Warehouse / Business Intelligence Consultant
twitter @ToledoSQL

stevefromOZ SSC-Forever Points: 43646 More actions · Answer 2

I second the question re: why not use both facts (ie measure groups) in the one cube.

I also agree, the 'Used' or 'Active' flag on the dim table could simply be brought in as an Attribute. You could then leave all of the services in the dim and let the user filter to those that had been used.

I had assumed the other case is true - a service could potentially be in or outpatient. If this is the case, you can use a single flag for 'used' on the service, and then use the measure/s from each measure group to show things like, say Service is on rows, the count of operations performed as measures (one each for in and out patient) on columns, would let you see the breakdown for given services between their in and out patient delivery. just a thought 🙂

Steve.

Chris Umbaugh Ten Centuries Points: 1334 More actions · Answer 3

stevefromOZ (5/12/2011)
I second the question re: why not use both facts (ie measure groups) in the one cube.
I also agree, the 'Used' or 'Active' flag on the dim table could simply be brought in as an Attribute. You could then leave all of the services in the dim and let the user filter to those that had been used.
I had assumed the other case is true - a service could potentially be in or outpatient. If this is the case, you can use a single flag for 'used' on the service, and then use the measure/s from each measure group to show things like, say Service is on rows, the count of operations performed as measures (one each for in and out patient) on columns, would let you see the breakdown for given services between their in and out patient delivery. just a thought 🙂

Great point Steve, and a perfect illustration of the benefit of the [one cube] model.

Chris Umbaugh
Data Warehouse / Business Intelligence Consultant
twitter @ToledoSQL

Michael Lysons SSCertifiable Points: 6497 More actions · Answer 4

Thanks for the input, guys. I forgot to mention that my plan was to have flag(s) on the dimension table and then use named queries to create the actual dimension. So, for example, the inpatient version of the dimension would have a named query sourced something like:

Select *

From MyDimension

Where IsInpatientFlag = 'Y'

I had not considered the use of the flag as an attribute in the hierarchy because I am very new to this, so thanks for the suggestions.

As to the "two cubes" question. Inpatients and Outpatients are very different data sets albeit they do share some dimensions. It just seemed logical to me to separate them. As I said, I'm very new to this so my inexperience may be showing there.

Chris Umbaugh Ten Centuries Points: 1334 More actions · Answer 5

Michael,

You'll find some that hold the opinion that mixing divergent facts in the same cube produces data sparsely and that this is a bad thing. I however am not one of the people. I am a big fan of taking full advantage of the power of the dimensional BUS. Sure there are some facts where it may not be appropriate to keep in the same cube. However, you should resist the temptation to departmentalize your data marts. The business users may not see the value now because they likely do not think in multidimensional terms. You have to convince them that there is great untapped business knowledge to be had in building each data mart on the common dimensional BUS.

In my current organization the sales group were the main drivers of the initial BI project and they clamored for invoice metrics. Fast forward a couple of years and now manufacturing, customer service, and several other "departments" all share data marts out of the same cube. Each has some of its own unique dimensionality that the other business units are not interested in. However, those dimensions that do span the organization have driven some very interesting new questions that the business has never even considered. They have used the answers to those questions to drive management decisions placing BI into the mission critical role.

If you are new to data modeling, welcome! I suggest reading some Kimball and keep plugging away. If you can deliver a successful BI implementation that becomes mission critical to the business you are not only helping your company's bottom line but your own as well.;-)

Chris Umbaugh
Data Warehouse / Business Intelligence Consultant
twitter @ToledoSQL

Michael Lysons SSCertifiable Points: 6497 More actions · Answer 6

Thanks, Chris. I've been reading and implementing a fair bit, but as with everything I get so far in and then more questions appear in my head.

You've (both) given me some good ideas for things to try, so I'll take those away (thank you) and see how I get on with them.

Cheers, guys.