Splitting measures in a Fact Table

  • Im currently in the design stage of adding a new data source to our data warehouse. After analysing the data that's coming through I think ive settled on a fact table design but im not 100% sure its the 'right' design to choose.

    Heres an example of the data im working with and how I think I might have to tailor the fact table:

    Student A sits on Course 'English & History Studies' which covers the Subjects 'English' and 'History' and he has voted once (1 vote).

    In the fact table im thinking of breaking this vote into 2 entries:

    Student A 'English & History Studies' 'English' 0.5

    Student A 'English & History Studies' 'History' 0.5

    So im breaking the students vote from 1 to 0.5 x 2.

    Im going to further break it out again to take into account his/her modules:

    Student A 'Classical Eng Lit' 'English & History Studies' 'English' 0.25

    Student A 'Modern Languages' 'English & History Studies' 'English' 0.25

    Student A 'Modern American History' 'English & History Studies' 'History' 0.25

    Student A 'European History' 'English & History Studies' 'History' 0.25

    This way I can sum his vote by filtering the fact table by either student, module or course and it will sum as 1, and by subject to 0.5 which is what I need.

    Does this sound sensible? Ive not seen measures in a fact table split out like this before and I can't see examples in the literature.

    If anyone has any history of doing something like this I'd really appreciate hearing how you tackled the issue.

    Thanks.

  • Your scenario seems similar to a medical claims and patient diagnoses problem, where one medical claim could be related to more than one diagnosis and you would like to proportionally allocate costs by diagnosis.

    Not knowing your exact situation and requirements, if the numbers add up and to what you need...then yes it probably would be a feasible solution to include a factor to proportionally allocate student votes.

    Here are a few things you'd need to keep in mind before implementing a solution like this:

    • Assigning a factor in your fact table will require a lot of additional ETL overhead, because you'd have to update all the facts related to a student vote should anything change.
    • Is it possible to achieve the same result by using distinct counts? If so, it could be a better alternative if the ETL overhead is significant.
    • Are your end-users ok with the fact that the student votes measure will be proportioned like this, and that they may see decimal numbers as a result?

    Hope this helps.

  • Hello Martin,

    Thanks very much for your reply, its good to hear real world problems and solutions.

    It sounds like there isn't any one easy way around given that any solution will involve extra processing - either at the ETL stage to create decimalised votes or at the analysis/reporting end to use distinct count/sums/averages etc.

    I think we'll need to properly assess the overhead at the ETL stage. Its likely that the fact table will initially be roughly 150,000 rows in size and then increment by that amount each semester.

    Your third point about how the recipient would view decimalised data when they are expecting whole numbers is a good one. Fortunately they're accustomed to seeing data in this way so I think we can get away with it.

    You mention that this problem is similar to a medical claims and patient diagnoses scenario, is this one that you worked with? If so, what solution did you settle on?

    Thanks.

  • gibbettp (11/17/2015)


    Fortunately they're accustomed to seeing data in this way so I think we can get away with it.

    I think it is vitally important to know how your end-user population wants to see the data. Do they want to have the vote counted in each subject/module, or proportionally added? That should ultimately drive your modeling decision.

    gibbettp (11/17/2015)


    You mention that this problem is similar to a medical claims and patient diagnoses scenario, is this one that you worked with? If so, what solution did you settle on?

    It's been a while, but I think we settled on a helper table (positioned between the claims fact and diagnosis dimension) with a factor to proportionally assign cost. In that specific case though, it was a decision based on how the end users wanted to interpret the data.

    I'm not sure if this applies to your situation though...and it will be important to figure out exactly why it is necessary to proportion the votes. With a simple many-to-many relationship or a more complex scope assignment, it would be possible to assign a vote to each subject/module but still have it only counted once towards the grand total. It's a matter of interpretation and requirements in my opinion.

  • Thanks Martin,

    The style of the proposed fact table is being driven partly via the reporting requirements that we have been given. There is a need to sum these votes in a hierarchical fashion - from College to School - to Subject - and then Course. Things start to get tricky when the sum of subjects don't equal the sum at school level because a vote for one course can span over 2 subjects etc.

    This is why I thought about splitting a single vote fractionally as well as having a few columns in the fact table for the vote at different points in the hierarchy.

    I haven't heard of a helper table being used so i'll look into that. We're slightly limited on experience here and have been repeating what our previous colleagues put in place, whilst this has worked for the most part we are encountering difficulties when coming across new scenarios. This is why input from people like yourself on here is proving very useful.

    Thanks again,

    Paul

  • It appears from my understanding that this could very well be quite complex maintenance wise. In your design, could you not snowflake out the course # to include the subjects covered ? In a report I would be more inclined to present the data as a drill down with vote value being additive outside of the group, and boolean (semi additive?) within the group.

    ----------------------------------------------------

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply