August 17, 2009 at 10:33 am
We want to calculate a weighting value in our hierarchy based on what values do/don't exist at the lower level.
Background:
* Our heirarchy is Region --> Country --> Company
* Our fact table stores two measures at the company level - total_customers and avg_customer_revenue.
* For some companies, we may not have a measure value for avg_customer_revenue
* At the country level we want to calculate an aggregated weighting for avg_customer_revenue that accounts for those companies with a missing avg_customer_revenue value.
Example for France:
CompanyA avg_customer_revenue = no recorded value ($0.00)
CompanyB avg_customer_revenue = $37.17
CompanyC avg_customer_revenue = $32.25
CompanyA total_customers = 3,234,320
CompanyB total_customers = 2,944,000
CompanyC total_customers = 2,694,000
Our formula is:
(CompanyA avg_customer_revenue * CompanyA total_customers) + (CompanyB avg_customer_revenue * CompanyB total_customers)
+ (CompanyC avg_customer_revenue * CompanyC total_customers)
/ (CompanyB total_customers + CompanyC total_customers)
In the denominator we only aggregate total_customers for companies that have an avg_customer_revenue.
The Calculation is:
(0 * 3,234,320) + (37.17 * 2,944,000) + (32.25 * 2,694,000) / (2,944,000 + 2,694,000) = 34.82
So weighted avg_customer_revenue for France is $34.82
We know how to calculate the denominator in MDX, but how can we calculate the numerator? We have experimented with the Aggregate and Generate functions as well as using named sets but nothing has worked so far....
Thanks for your help!!
August 17, 2009 at 11:59 am
Jon Beer (8/17/2009)
Our formula is:(CompanyA avg_customer_revenue * CompanyA total_customers) + (CompanyB avg_customer_revenue * CompanyB total_customers)
+ (CompanyC avg_customer_revenue * CompanyC total_customers)
/ (CompanyB total_customers + CompanyC total_customers)
In the denominator we only aggregate total_customers for companies that have an avg_customer_revenue.
The Calculation is:
(0 * 3,234,320) + (37.17 * 2,944,000) + (32.25 * 2,694,000) / (2,944,000 + 2,694,000) = 34.82
You're making this much more complicated than it needs to be. Your numerator is simply total sales for whatever slice you are interested in. Let's look at the math:
(CompanyA avg_customer_revenue * CompanyA total_customers) + (CompanyB avg_customer_revenue * CompanyB total_customers) + (CompanyC avg_customer_revenue * CompanyC total_customers)
/ (CompanyB total_customers + CompanyC total_customers)
But since (CompanyA avg_customer_revenue * CompanyA total_customers) is just the revenue for CompanyA, and the same is true for CompanyB and CompanyC, you can simplify your formula to
( CompanyA_Revenue + CompanyB_Revenue + CompanyC_Revenue )/(CompanyB total_customers + CompanyC total_customers)
But CompanyA_Revenue + CompanyB_Revenue + CompanyC_Revenue is just your total revenue, so you can further simplify
Total_Revenue / (CompanyB total_customers + CompanyC total_customers)
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 18, 2009 at 5:19 am
Drew - thanks for simplifying the formula, it does make it clearer.
However we are still at a loss with what the MDX would be to calculate total_revenue i.e..
(CompanyA avg_customer_revenue * CompanyA total_customers) + (CompanyB avg_customer_revenue * CompanyB total_customers) + (CompanyC avg_customer_revenue * CompanyC total_customers)
Thanks, Jon
August 18, 2009 at 8:17 am
Revenue should be one of the facts in your fact table and this fact should become a measure when you create and process your cube. If revenue is not available as a measure, then you need to rethink the design of your cube.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 18, 2009 at 8:39 am
Hi Drew,
We do already have company total revenue as a measure. This isn't actually the measure in question.
The leaf measure we're working on is company average_revenue_per_customer, which may be empty for some companies. So our goal is to work out the average_revenue_per_customer at the country level. In our situation you can't just aggregate it up because it will be missing those companies that we have no recorded average revenue for.
We're basically trying to backfill missing average revenue values but only when looking at the aggregated country level.
At the moment our only solution is make this prediction during the data load into the fact table so that we never have any missing values.
August 18, 2009 at 9:29 am
Ps - to clarify things a bit further, unfortunately we can't just calculate average revenue per customer by dividing total revenue by total customers - this is because the avg_revenue_per_customer value is a leaf metric that gets reported into our system by our researchers. The value may well be different to total revenue / total customers.
It sounds a bit unconventional I know - but this is what we have to deal with unfortunately.
August 18, 2009 at 10:31 am
I think the confusion is coming because of the term "Total Customers". There are actually two separate measures which could be called "Total Customers" and you're mixing the two. Let me call them the Total Customer Base and Total Actual Customers. In your sample Company A, Company B, and Company C would all contribute to the Total Customer Base, but only Company B and Company C would contribute to Total Actual Customers.
You're right that the measure you're looking for is not equivalent to Total Revenue / Total Customer Base, but I think that you'll find that it is equivalent to Total Revenue / Total Actual Customers.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 19, 2009 at 11:55 am
Perhaps the larger issue is the value of numbers being generated based on a prediction, especially when combined with "real" numbers. I don't mean to come across as a purist, but the real value in an OLAP implementation comes from absolute consistency, not only in the data model, but in the presence of data as well. The moment you have a design where complete consistency just isn't there, much of the value of what OLAP and SSAS can do for you, is being dropped out the window, along with the reliability of many of the statistical measures that can be so valuable. One has to be very careful to ensure that one represents reality accurately with both the data and the data model. Otherwise, whatever statistics you generate might be no better than just some numbers you randomly throw up on the screen. As Mark Twain is oft quoted as saying: "There are lies, damned lies, and statistics!" ...
I'd be most concerned about the data model here, because it appears to be inadequate to the task... Drew's comments are right on, and should definitely help.
Steve
(aka smunson)
:-):-):-)
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
Viewing 8 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply