Sum & Count Functions for Measures in Cubes

Question

Sum & Count Functions for Measures in Cubes

JenniferS

SSC-Addicted

Points: 445
More actions
March 6, 2003 at 11:08 am

#105625

Hello to All,
I have a column in my fact table that contains nulls. When I bring this into my cube as a measure and apply the Count Function, it seems to recognize the null values and not count them, where it will count zero values. That's what I want.
But the Sum Function seems to convert the nulls to zero. This does not make a difference to accuracy of aggregated sums, but...
My objective is to have for each column a single measure using the Sum Function that delivers accurate info about its status as null or zero, and have that result be used by calculated members, where it will be critical to know what's zero and what's null.
Is there a way to force the Sum Function to use nulls?
Do you think that is possible without changing my source data?
Thanks in advance,
Jennifer

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

stevefromOZ SSC-Forever Points: 43646 More actions · Answer 1

Why do you want to Sum the NULLS? And more importantly, how would you *like* them summed?

If you have a critical need (in your calc members) to know whether the value is null, zero or non-zero, why not do an nested IIF statement in there, and include/exclude as necessary?

If you send thru some details on what you are trying to achieve as an end result it may make making suggestions a little easier.

HTH,

Steve.

Steve

Steve.

JenniferS SSC-Addicted Points: 445 More actions · Answer 2

Hey Steve!! Thanks!!

Sorry for the lack of info. This is what I'm trying to do.

I want one measure with the fact table data, and the only way I can see to do that is with a sum measure.

I want to add a "Count" calc member on a dimension different from measures. I wish I'd thought this up, but it should make a huge cube easier to deal with. This CountCalcMem will use the data in the SumMeasure and it will count 1 every time it sees a 0, if that 0 originates from another 0 or from a null.

So this is why I want that SumMeasure to tell me if it has a null. Then my CountCalcMem won't count nulls and will give me an accurate count of measures with non-null data.

Wacha think?

Jennifer

JenniferS SSC-Addicted Points: 445 More actions · Answer 3

Here is a short sample cube to construct to illustrate:

Make a new SqlServer database and build these 2 tables to keep track of purchases:

This is the dimension table:

PurchaseKeyPurchaseName

1Breakfast

2Chair

3Lunch

4Piano

This is the fact table:

FactKeyPurchaseKeyPurchaseAmount

110

22250

33

44947

Note that breakfast cost me $0.00 becasue my friend paid. But I don't know yet how much I'll spend on lunch, so it is null.

Make a new cube with this database as the data source. The single measure will be a Sum on PurchaseAmount. Accept default properties.

Now add another measure using PurchaseAmount again , but this time make it a Count. Accept default properties otherwise.

Process the cube. Look at the data and you'll see that the amount for breakfast is 0 and the count is 1, which is right: I had breakfasst but it did not cost me anything.

The amount for lunch is also 0 with no count. (Its count is really 0 but is formatted not to display 0's.)

The Count function was able to identify the null in the fact table and report a 0 for the count of lunch amount, and still identify the value of 0 for breakfast and give it a count of 1.

What I want is for the Sum function to be similarly intelligent and report to the cube a null sum if the amount available to it is null. As best as I can understand it, it does not do that. Instead, it reports that null amount as a sum of 0.

Do you see a way to force the Sum function to do what I want?

Thanks in advance,

Jennifer

stevefromOZ SSC-Forever Points: 43646 More actions · Answer 4

Hi Jen,

I think the following does what you want, but before putting it into production, I would ensure that you're happy that the count measure is working as expected in all cases (i.e. when I created my cube as per your demo, my count showed empty for lunch, but was really reporting zero).

I created a calc member in the measures dimension as follows ->

IIF([Measures].[Purchaseamount_cnt] = 0, NULL, [Measures].[Purchaseamount_sum])

Where Purchaseamount_cnt was your original count measure, and Purchaseamount_sum was your original sum measure.

You now have 1 too many sum measures, so hide (set the visible property to false) the original one, and you now have a count measure and sum measure that show 'empty' (when formatted correctly for the count!) for lunch.

Hope this has helped...

If this doesn't make sense, let me know and i'll shoot you thru my sample and you can restore it to take a look.

Cheers,

Steve.

JenniferS SSC-Addicted Points: 445 More actions · Answer 5

Steve,

Thanks for this. I can see how this will work. And I'd do this if my data were not so bountiful! My aim is a smaller cube, even at the expense of time for executing calc members.

I'm trying to make a huge cube more manageable by keeping only one measure in the measures dimension. Then I want another dimension that has nothing but calced members. This should produce a kind of spreadsheet kind of thingie (forgive me for being so technical) where the measure in the left column (which amounts to the fact table data) can be calced by the dictates of the calc members from the other dimension across the top.

Make another dimension table called tblCalcs with one row, CalcsKey=1, CalcsName="Sum". And add the key to the fact table.

So the calc member for count is not on the measures dimension and depends only on the sum measure for its definition. This is why it is important for the sum measure to expose its null origins.

Does that make better sense?

Jennifer

JenniferS SSC-Addicted Points: 445 More actions · Answer 6

Steve,

If I put the count calc on the measures dimension, then I need a calc member for each sum measure where I want a count, which is all of them.

With this other layout, I need only one count calc memeber as it will apply to all measures.

Jennifer

stevefromOZ SSC-Forever Points: 43646 More actions · Answer 7

:-S

Not sure I'm following the theory here, do you have one, or many, sum measures? But you are aiming to have only 1 sum member? Or you have many measures but are trying to keep the measure count at count(sum_measures) to keep cube size down? Therefore all other measure types other than sum are relegated to a special dimension?

Whichever way you're trying to cut it, it's not looking good. BOL states that where AS/MDX encounters an empty cell, it is treated in a variety of ways. Of particular note is that where it's participating in a +-/* type calculation (and you'd have to assume sum is '+'?), then it behaves as zero. 🙁 Interestingly, my quick tests with a calc member using count(with excludeempty) read the value of the summed null as zero, not empty, which goes along with the theory above re:emptys being coerced to zeros when summed etc. But interestingly, when using the Count aggregate as a true measure based on the $ value coming in, it counted correctly.

Sounds like a fun cube to be playing with.

Steve.

JenniferS SSC-Addicted Points: 445 More actions · Answer 8

Steve,

I've got about 500 facts in my fact table and I want each as a sum measure in the measures dimension. So that means about 500 sum measures.

Then I've got about 75,000 rows in my fact table.

You're right. The idea is to reduce the number of entries in the measures dimension and tranfer that work to another dimension.

And you are finding the same thing I am, that count will count a null fact table value as 0 and that sum reports a null in the fact table as 0. I assume that's because for the reasons you refer to in BOL that sum is summing and it converts nulls to 0.

I am hoping there is a way for sum to distinguish between the leaf-level load from the fact table, that really does not require the work of sum but more the work of copy, and not change the data.

And there is no way, as I understand it, to get data into the cube without using the aggregates supplied in the dropdown in the interface. And there is no copy there.

There also is not average, and that's probably because the sum function lies ;(.

Oh.. a thought. I wonder if I could load the cube, then with some ADO connect to the fact table and the cube and update the cube to nulls in the appropriate places. But I'll bet if that worked, the sum function would convert it on first access to a 0 again.

I'm beginning to think this is a loosing battle and I need another approach.

What do you think?

Jennifer

stevefromOZ SSC-Forever Points: 43646 More actions · Answer 9

Hi Jennifer,

Is it *really* a business requirement to have all 500 measures in the one cube? If you can, I would try to make individual cubes, with logical groupings of measures in each one (maybe us AS data mining to cluster them?) and then where required, make virtual cubes to show a few of the cubes rolled into one.

I havent been able to find an AS version of this document, but if you take a look at http://www.msdn.microsoft.com/library/default.asp?url=/library/en-us/dnolap/html/olapperf.asp it will show you an indicative AS record size for your cube. Note the bytes used for float type measures Vs integer types -> twice as much.

I don't think loading and then going back to manipulate the data will work, as it would really appear that sum will always convert empty/null to zero.

Do you need this count type measure for each and every sum? I know it was a smaller set, but with your sample, you could have added a field (say with a sql view) that could be a flag to show whether a transaction had actually occured or if it was zero balanced. But this would then give you an additional 500 dimensions, each essentially a bit flag (ie yes or no).

Sorry I'm not giving you a whole lot of options/ideas here.

Depending on your data, the following may work. Recently we had a client system that had many different 'types' of revenues, which they wanted all as measures within a cube, which we gave them. But, we were also able to pivot the measure columns into a 'type' column (ie where the col name may have been 'Rev_ABC', 'Rev_DEF' became two different rows, where the new type column showed 'Rev_ABC' for row 1 and 'Rev_DEF' for row 2). We could then create a hierachy of revenues within a separate dimension. So this left only 1 measures, that when used in conjunction with this new dimension, could split the revenue by type.

Hope this makes sense, and maybe gives you some hints to wards making your cube work.

Steve.