Problem with MDX calculated member

  • Hi

    I'm not that experienced in MDX, and having some trouble with a query I'm writing.

    Here it is:

    WITH

    MEMBER Measures.GIDCount

    AS

    COUNT(

    NONEMPTY(

    ([Dim Employees].[GlobalID].Members-

    [Dim Employees].[GlobalID].[All]),

    ([Measures].[Outliers - ACD Calls],

    [Dim Phone Logins].CurrentMember)

    )

    )

    SELECT

    {Measures.GIDCount,

    [Measures].[Outliers - ACD Calls]} ON 0,

    NONEMPTY

    (

    (

    [Dim Phone Logins].[Phone Logins].Members

    - [Dim Phone Logins].[Phone Logins].[All]

    --[Dim Phone Logins].[Phone Logins].[Phone Login].&[874007]

    )

    *

    ([Dim Employees].[GlobalID].Members

    - [Dim Employees].[GlobalID].[All])

    ,

    [Measures].[Outliers - ACD Calls]

    )

    ON 1

    FROM [TheCube]

    WHERE

    (

    [Dim Call Distributors].[Call Distributors].[Call Distributor].&[88],

    [Dim Call Routes].[Call Routes].&[335361],

    [Dim Local Time].[Local Time].[Days].&[20150501]

    )

    You can see that on the Rows (1) axis, I'm cross-joining the members of two dimensions (PhoneLogin and GlobalID), and getting rid of the combinations that don't have any value for the measure concerned.

    On the 0 axis, I show the measure value.

    The purpose of the query is to work out where one member from PhoneLogin has data (within the slice) for more than one GlobalID. This is a data problem which I'm trying to diagnose and debug.

    So I have this calculated member that counts the non-empty (on the basis of the measure) GlobalIDs for the current PhoneLogin (again, within the slice).

    Here's the problem.

    a) When I test this with one member from PHoneLogins (you can see this commented out), it comes back blindingly fast with the correct values.

    b) When I test it with the entire PhoneLogins dimension, but without the calculated member, again it works blindingly fast: out of c. 600,000 members in dimension PHoneLogin, and c. 300,000 members in dimension GlobalID, it returns just the 400 or so rows that matter.

    c) When I test it with the entire Phone Logins dimension, and with the calculated member, it just runs and never finishes.

    What am I doing wrong?

    thanks for any ideas!

  • First, let's correct some dimensionality issues. Instead of doing the following:

    [Dim Phone Logins].[Phone Logins].Members - [Dim Phone Logins].[Phone Logins].[All]

    Just step one level down:

    [Dim Phone Logins].[Phone Logins].[Phone Logins]

    The real issue with your query is the calculated member. Imagine it as a pseudo-cursor, executing for every combination of Phone ID and Employee Global ID. While you are attempting to make the calculation aware of where you currently are in the Phone Logins dimension (one level too high in the attribute hierarchy though), you are not doing the same for the Employee dimension.

    In my opinion, your calculated member should look something like this:

    count

    (

    nonempty

    (

    {[Dim Employees].[GlobalID].[GlobalID].currentmember}

    * {[Dim Phone Logins].[Phone Logins].[Phone Logins].currentmember}

    ,[Measures].[Outliers - ACD Calls]

    )

    )

    The above will only work for a query that returns the members of these 2 dimension attributes, otherwise there may not be a "currentmember". If you envision excluding one of these attributes from the query at some point, you may need to step down to the appropriate level by using the descendants() function.

    Not knowing exactly what your cube structure looks like, are you not trying to return a count of fact rows here? If so, you could simplify the calculated member as follows (which will work for all combinations of dimensional attributes):

    count

    (

    [Measures].[Outliers - ACD Calls]

    )

    Hope the above helps...and hope I didn't miss anything syntax-wise. Please post the solution once you have it working.

  • Hi

    Thanks for your reply.

    You've picked out the problems in the MDX I wrote, and I'm looking forward to trying out your suggestions on Tuesday when I'm back at work.

    The problem is that I'm deliberately trying to aggregate the set in 2 ways at once in one query:

    1. By PhoneLogin, Employee, for the measure total; and

    2. By PhoneLogin only, to count the distinct instances of Employee per PhoneLogin (more than one per PhoneLogin is bad, I want to highlight these cases, and perhaps develop the query to return only them).

    In SQL I'd do this with a CTE, going a GROUP BY PhoneLogin, Employee on it, and joining this query to a GROUP BY PhoneLogin (only) query, on PhoneLogin. So you're spot on with your "pseudo-cursor" comment: I'm asking the query to do the same thing twice.

    Anyway, I'll try out your suggestions on Tuesday and post the results (hopefully a solution!).

    regards

    seb

  • Hi Martin

    Thanks for your help with this!

    Martin Schoombee (5/22/2015)


    First, let's correct some dimensionality issues. Instead of doing the following:

    [Dim Phone Logins].[Phone Logins].Members - [Dim Phone Logins].[Phone Logins].[All]

    Just step one level down:

    [Dim Phone Logins].[Phone Logins].[Phone Logins]

    That's great advice. I didn't realise I could use the level expression [Dim Phone Logins].[Phone Logins].[Phone Login] (the last has no "s", for some reason) to return all the members of the level. This makes the query far simpler to read.

    The real issue with your query is the calculated member. Imagine it as a pseudo-cursor, executing for every combination of Phone ID and Employee Global ID. While you are attempting to make the calculation aware of where you currently are in the Phone Logins dimension (one level too high in the attribute hierarchy though), you are not doing the same for the Employee dimension.

    Here the problem is that I'm trying to count the number of of measure values using a different grouping from the main query. For the measure value, I'm effectively doing a GROUP BY PhoneLogin,GlobalID. What I want for the count is the count GROUP BY (only) PhoneLogin; but counting only within the set that the query returns. (Typically, the count of GlobalIDs per PhoneLogin should be 1, but is sometimes 2 or 3: which is the case I'm trying to detect).

    Your alternative calculated measure:

    count

    (

    nonempty

    (

    {[Dim Employees].[GlobalID].currentmember}

    * {[Dim Phone Logins].[Phone Logins].currentmember}

    ,[Measures].[Outliers - ACD Calls]

    )

    )

    (or your even simpler alternative COUNT(NONEMPTY([Measures].[Outliers - ACD Calls])) ) works fast and correctly returns the COUNT "GROUP BY PhoneLogin,GlobalID" - which is always 1.

    So what I'm trying to do here is:

    - Get results based on summing the measure value by PHoneLogin, GlobalID (in a particular slice of the cube).

    - From that set, also get a count of GlobalIDs GROUP BY PhoneLogin.

    I can't figure out a way to do this. I did experiment with putting the main query (without the COUNT) into a subquery; but doing that I run into this error

    A set has been encountered that cannot contain calculated members.

    Apparently I have to set the SubQueries connection property to 1 to avoid this (I'm guessing the calculated member it's objecting to is the cross-joined PhoneLogin/GlobalID on axis 1). I've tried this (I'm using SSMS) but it doesn't seem to make any difference.

    I could really use some more help if you have time to look into this again!

    thanks and regards

    Seb

  • Hi Seb,

    Afraid I'm a little lost with your ask here...could you please provide some examples with data, and show the current results plus what you're expecting or like to get?

  • Hi Martin

    Thanks for your further reply!

    I hope this example makes things clearer.

    Here's what I'm running at the moment, which uses a calculated member specifying .CurrentMember on both of the dimensions involved. (You can see my original, which specified CurrentMember only on the PhoneLogin dimension, commented out):

    WITH

    MEMBER Measures.SGIDCount

    AS

    COUNT(

    /*NONEMPTY(

    ([Dim Employees].[GlobalID].[SGID],

    [Dim Phone Logins].CurrentMember),

    [Measures].[Outliers - ACD Calls]

    )

    */

    NONEMPTY(([Dim Phone Logins].[Phone Logins].CurrentMember,[Dim Employees].[GlobalID].CurrentMember),

    [Measures].[Outliers - ACD Calls])

    )

    {Measures.SGIDCount,

    [Measures].[Outliers - ACD Calls]} ON 0,

    NONEMPTY

    (

    (

    --[Dim Phone Logins].[Phone Logins].[Phone Login].&[785522]

    [Dim Phone Logins].[Phone Logins].[Phone Login]

    )

    *

    ([Dim Employees].[GlobalID].[SGID])

    ,

    [Measures].[Outliers - ACD Calls]

    )

    ON 1

    FROM [Development - Do not use]

    WHERE

    (

    [Dim Call Distributors].[Call Distributors].[Call Distributor].&[88],

    [Dim Call Routes].[Call Routes].&[335361],

    [Dim Local Time].[Local Time].[Days].&[20150520]

    )

    The results I get from this are like this (selection shown):

    PL SGID SGIDCountACD Calls

    7801920004644641 3

    7802020004648841 16

    7802520004623051 38

    7802520004764271 38

    7802620004435391 40

    (sorry table layout is not as good as it should be - if there's a better way to show it, do let me know).

    You can see that there are two rows for PL 78025, with different SGIDs. Here, because your suggested calculated member specifies .CurrentMember on both dimensions PhoneLogin (PL here) and GlobalID (SGID here), it correctly shows 1 (as it does for all rows). What I'm trying to do is create a calculated member that would show a 2 in the rows for PL 78025, and 2 (or sometimes even 3) in similar rows with duplicates; because the PhoneLogin value has more than one measure value by SGID (GlobalID).

    Which means that you were exactly right to call it a "pseudo-cursor". What this measure would do (ideally) is wait for the entire query result to be complete, and then look through it and count the instances of PL values that appear more than one. In SQL, as opposed to MDX, this is not a trivial thing to do, but it's doable: I'd join two different GROUP BYs on the same CTE to each other (or GROUP BY both PHoneLogin and SGID in the CTE, and then join to the CTE to a Group By SGID on itself).

    So I realise that the calculated member is a clumsy way of doing w: somehow, the basic query itself (as shown above) manages to crossjoin and "NONEMPTY-ify" the two dimensions in seconds: but if I try to do the same row by row in the calc's member, using only PhoneLogin.CurrentMember crossjoined to all SGIDs, it takes long enough that I cancel it to avoid putting too much load on the server.

    There must be a way!

    thanks

    Seb

  • Makes better sense with the examples, thank you.

    I would change the calculated measure to the following, and I think this will achieve what you're looking for:

    count

    (

    nonempty

    (

    (

    [Dim Phone Logins].[Phone Logins].currentmember

    ,[Dim Employees].[GlobalID].[All]

    )

    ,

    [Measures].[Outliers - ACD Calls]

    )

    )

  • Hi Martin

    Well, this is embarrassing.

    One of the older versions now works. I used this as the calculated member:

    COUNT(

    NONEMPTY(

    ([Dim Employees].[GlobalID].[SGID],[Dim Phone Logins].[Phone Logins].CurrentMember),

    [Measures].[Outliers - ACD Calls]

    )

    )

    and it came back with the right result, in about 6 minutes. Previously, it had sat there for about 10 minutes without doing anything; or cancelled itself due to "locking conflicts".

    I think the problem is that this cube is a bit of a mess (I didn't do it, honest! I inherited it...), and is hosted on an underpowered server. So I imagined that my earlier attempts were going out and looking at the entire cube (i.e. not the slice I specified in the WHERE clause), when they were working correctly, just not reliably or fast. I've confirmed by changing the slicer that this version isn't actually doing this, but only looking for and counting duplicates in the specified slice.

    Lesson learned for me is that I'm going to test anything new like this on AdventureWorks, because the live cube is just too big and strange to be a good testbed. (Try it on a test version of the cube? Test version, what's that? We don't need no steenking test environment here... [facepalm]).

    I did find out a lot about the EXISTING keyword. If I wrap the NONEMPTY above in EXISTING(), the query's much faster, but I get the wrong results (1 in every row). The documentation about EXISTING, I've found, is very confusing, in that it doesn't precisely define the "current context" that EXISTING restricts the set to. Some sources suggest that this "current context" is the entire subcube defined by the slicer (WHERE); others that it's also the current tuples in the query axes. In fact, it seems to be both:

    a) if I use EXISTING but put only PhoneLogin on the Rows axis, I get the correct result (2 where there's more than one GlobalID)

    b) If I then add the GlobalID to the Rows axis (cross-joined with PHoneLogin), I get only 1s: which suggests that EXISTING is also consulting the current members on the axes to work out the current context.

    So, finally, thank you for helping me to sort this out - and apologies for leading you down a bit of a false trail.

    Seb

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply