SSAS - How do I handle dimensions with valid duplicates?

  • I'm new to SSAS and I got a problem I'm not able to resolve. Let's start with two tables, this one being a fact table with accounts and their values:

    Fact Table:

    Then there is the account hierarchy table that feeds a dimension. This dimension contains the accounts hierarchy. Accounts are grouped into Groups, and a given account can be placed as many times as wanted inside several different groups. The case of having the same account two or more times inside the same Group is not possible. For performance reasons, the hierarchy is naturalized (is this the right term?), and then handled using the HideMemberIf property. Groups have no values by themselves, they take their value from the agreggation of their children. The real hierarchy has 16 levels, but for simplicity here's an example with only 4 levels:

    Dim Groups-Accounts:

    The goal is to relate the last level of the naturalized hierarchy, Level 3 in this case, with the appropiate value in the fact table:

    The problem? When processing the dimension, SSAS complains:

    Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'CT_DimGroupsAccounts', Column: 'Level_3', Value: 'Account 2'. The attribute is 'Level_3'.

    This is true, as Account 2 and Account 3 are duplicated in the Level 3 attribute. If I tell SSAS to ignore Duplicate key errors then processing ends successfully. However, when navigating the resulting cube, the result is:

    The duplicated keys (Accounts 2 and 3 as children of Group 5) are not in the dimension, and the result is incorrect. How can I achieve the correct result given this scenario?

    Cheers!

  • Why don't you use the ID field of the groups-account dimension as a surrogate key in your fact table?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • You mean this?:

    And then join DimID with the dimension ID? I already considered doing this, but it's not possible. In the real fact table there are more foreign keys, the most typical being Date. As you can see Account 2 and Account 3 values are duplicated, so when filtering by date in the cube I get wrong values.

  • OK, then your problem looks a lot like a many-2-many issue.

    So you could try to model your dimension as a many2many dimension in SSAS.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • How could I do that with only one dimension?

  • You probably will need to change your design in the database.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • And then join DimID with the dimension ID? I already considered doing this, but it's not possible. In the real fact table there are more foreign keys, the most typical being Date. As you can see Account 2 and Account 3 values are duplicated, so when filtering by date in the cube I get wrong values.

    Sorry, but I'm not following this. If each of the dimension entries has its own ID, which it should, this shouldn't be an issue. The repetition of keys occurs with SCD 2 dimensions. This doesn't look like one of those, but you should be able to keep in separate, just like names for counties and cities are kept separate.

    I'm not ruling out a many-to-many issue as the previous poster has indicated, but I'm not convinced that that must be the issue. Please provide more details if you'd like more help.

  • As I said, the fact table has more foreign keys, I didn't expose them in my original post for simplicity. Indeed, if there were no other dimensions Koen's solution would be ideal as it becomes a one-to-one relationship. However, when we consider the real scenario:

    Then if we apply Koen's solution it becomes like this:

    In the second case, if we filter by the Groups-Accounts dimension only, everything would be correct. However, if we filter by date (20130901) then we'll have a Value of 110. This is incorrect, if you look at the original fact table the total for 20130901 should be 60.

    To my understanding this is a one-to-many relationship (a single account with a single value can appear many times in the Groups-Accounts dimension), but I'm not sure if: a) this can be handled as such in SSAS without issues and performance penalties, and b) it's possible to convert this to a regular relationship to keep a star schema, which would be the ideal solution.

  • To my understanding this is a one-to-many relationship (a single account with a single value can appear many times in the Groups-Accounts dimension), but I'm not sure if: a) this can be handled as such in SSAS without issues and performance penalties, and b) it's possible to convert this to a regular relationship to keep a star schema, which would be the ideal solution.

    The answer to these is yes. There can be a performance hit if the names can't be designated as unique, but it's normally not signficant. My location has a lot of repeated city and the county names which repeat in the various states. Despite having one entry per zip code, there is no human noticeable delay retrieving the data.

    I need to see all the relevant columns in the dimension table. You show them in a heirarchy. That means there should be at least three columns in the dimension: surrogate ID, group, account. The latter two could even have their own key, which I would find ideal.

    Thanks,

  • The dimension table is what I showed, there is only one flatted hierarchy, the Groups-Accounts hierarchy:

    The Parent-Child table from where this originates would be like this:

  • Given that you already have a parent-child table, did you try setting it up that way in SSAS? It actually does so more naturally than in an OLTP design. If there are tens of thousands of entries, there may be a performance issue, but it should be worth a try.

    If you didn't or for some reason are not allowed, let me know, and I'll look at the first table as if it's a standard dimension. Definitely looks like it should be possible, however. You could even consider setting it up as a ragged dimension if necessary. That might give you a more natural appearance. Read about that before your next post to see if that helps.

  • Yes, I did, and I face the same problem: in a parent-child dimension a given child can have only one parent. In my case, Account 2 and Account 3 have two parents: Group 3 and Group 5. This gives two duplicate key errors that don't let me process the dimension. If I ignore these errors and process anyway, then Account 2 and Account 3 only appear as children of the first parent, in this case Group 3, just like with the flattened hierarchy.

    I've read about ragged hierarchies, and if I'm not mistaken my flattened hierarchy is already ragged. As said in the first post I'm using the HideMemberIf property and it works just fine.

  • OK, I see the real issue better. Let me think on this a day and I'll get back with you.

  • It seems odd that the amount essentially gets counted twice. Assuming that makes sense, however, I would recommend something along the line of the previous suggestion. I wouldn't necessarily use the surrogate key, but I would create an artificial key if there is not a natural key to use that would be different for all the accounts, even those of the same name. The ETL process would take the single value for the repeating accounts and ensure that "both" accounts are populated with the proper single value. SSAS would see two accounts, but it would still function as you've described.

    What do you think?

  • That would work if this dimension was the only one present in the cube. Imagine I duplicate the values for Account 2 and 3 in the fact table, each with its own unique key to the dimension, and create a regular relationship between them. This is easy and works beautifully, but if I add the time dimension when filtering by date the sum would be more than the real sum because there are more fact records than before. I could tell the client something like "hey, you have this account 3 times in the dimension, so it's normal that when filtering by date you get values as if the account was triplicated", but I don't think this is a viable solution. Another one would be doing two cubes, one with only this dimension and one with the rest, but this is ugly as well.

    I can't believe something like this can't be properly or directly addressed by SSAS. Dimensions with duplicates are very common in all kinds of business. I understand the duplicate key issue, but I don't understand how SSAS doesn't do something internally to handle this and consider each duplicate as unique. I've been trying the many-to-many relationship and the damn thing doesn't support unary operators (another requirement) properly, so this solution is out too.

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply