Data Model design impact on aggregations

  • On my new project, I came across a very poorly designed database that is used to populate the cube. It is a star schema has 2 fact tables that join directly. each fact table has different measures associated with it. if you run a query on the main fact table to sum all the measure values, you get a different number than if you run the same query joined to the other fact table. the relationship is 1-Many from the main fact to the secondary fact. then they want to run analysis using dimensions associated with the secondary fact and measaures on the main fact (WTF??!!!!!)...

    unsurprisingly, the numbers showing up in SSAS at aggregate level seem to be wrong.. my theory is that SSAS is not able to reliably calcuate the aggregated values because the results vary based on the join to the secondary fact table. but i don't have a concrete way to prove this. Here is an example of 2 queries on the underlying data that would yield different results - to illustrate the issue:

    SELECT

    SUM(DOLLARS_GOT)

    FROM FACT_MAIN M

    SELECT

    SUM(DOLLARS_GOT)

    FROM FACT_MAIN M

    INNER JOIN FACT_SUB S

    ON M.F_ID = S.M_ID

    I am thinkign that you can't reliably build a cube on this unless these numbers are the same for both queries here, especially if the plan is to allow users to draw numbers from fact_main by itself in some cases, but filtering on fact-sub dimensions in others, using the fact_main - fact-sub relationship..

    any ideas or references i could use to explain this?

  • Nick,

    What sort of data is in these two tables? I would suspect that the issue that you are seeing with the data not aggregating up as you are expecting it to is due to a dimension in the cube that acts as a switch between the measure groups. Is this a cube that is in a production environment or is this something that someone else starting working on and never finished.

    Thanks,

    Frank

  • It may not be a design issue. It sounds as if you are looking directly as the table design and not through SSAS. If that is the case, the fact tables may be joined. In a proper OLAP design, each fact table represents a single level of granularity. It is possible that one fact table is the parent granularity to the other.

    If you try to sum these as you describe, you will get wrong results because if there are two records in whichever table is the child, the main row will repeat twice. You will need to create a derived table on the child so that it sums all the data on whatever is the FK field to the parent table. Then join the results, which should be now down to a single line, to the parent table.

    It would even be better to query an OLAP design through the OLAP tools. I know, however, that sometimes we have to do what we can with what we have.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply