Long versus wide fact tables

  • Hi,

    I'm working at an insurance company that has several dimensionally modelled warehouses, built using differing techniques.

    One of these uses a method for storing measures in the underlying table structure that I've not come across before, and I'm interested in feedback on whether it is a good design methodology.

    For reference, the system is using SQL Server 2008 R2, and the method in question uses a separate table to store measure names, while the fact table holds a MeasureID column identifying which measure each row's Value belongs to. This makes the fact table very long and thin, as opposed to the wide fact tables I've traditionally seen in other solutions.

    e.g. Fact table structure:

    FactID | DimensionID1 | DimensionID2 | MeasureID | Value

    Measure table:

    MeasureID | MeasureName
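
    In DDL terms it looks roughly like this (a minimal sketch; the table names, column names and types are my own illustration, not the actual warehouse):

    -- Narrow fact table: one row per dimension combination per measure.
    CREATE TABLE FactGeneric (
        FactID       BIGINT        IDENTITY(1,1) NOT NULL,
        DimensionID1 INT           NOT NULL, -- e.g. a date key
        DimensionID2 INT           NOT NULL, -- e.g. a policy key
        MeasureID    INT           NOT NULL, -- which measure this Value is
        Value        DECIMAL(19,4) NOT NULL
    );

    -- Lookup table naming each measure.
    CREATE TABLE DimMeasure (
        MeasureID   INT          NOT NULL PRIMARY KEY,
        MeasureName VARCHAR(100) NOT NULL
    );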

    My question is: does this cause any performance issues if reports use SQL queries rather than querying cubes built on top of the database?

    I can see the advantages of this method, such as making it easily extensible with new measures, but as the bulk of the SSRS reports use SQL rather than MDX, I'm worried that aggregating measures across such a table could cause problems.
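
    For instance (the measure names here are made up for illustration), every report query has to join to the measure table and filter on the right measure before it can safely aggregate, whereas against a wide table it would simply sum a column:

    -- Sketch: aggregating one hypothetical measure from the narrow table.
    SELECT f.DimensionID1,
           SUM(f.Value) AS WrittenPremium
    FROM FactGeneric f
    JOIN DimMeasure m ON m.MeasureID = f.MeasureID
    WHERE m.MeasureName = 'WrittenPremium'
    GROUP BY f.DimensionID1;

    -- Drop the WHERE clause by accident and SUM(Value) silently adds
    -- premiums, claim counts and every other measure together.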

    Any advice will be appreciated.

    Thanks,

    Rich

  • richardkeithmoss (11/27/2012)



    It would seem that this results in a cross-tab sort of setup, where you have multiple rows per dimension combination instead of one fact table row with multiple measures (see the sketch at the end of this post). Is there only one shared fact table for everything? It honestly seems like a very specialized solution to me, and not something I would use by default.

    You would also have a lot more reads to return the same data. Even if you only had one set of entries per customer per month, you would go from 12 rows a year to 12 * (number of measures) per year, and with a lot of measures that grows quite large. While narrow tables are better than wide ones, I think this design takes that idea a bit far. It looks like they built the fact table around specific reports instead of making it flexible enough to use in multiple ways.

    To sum things up, I would only consider a table design like this as a special reporting layer. I would still have my regular fact and dimension structures and then this separate structure. I would only use this structure if a report could not be implemented using the normal tables.
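
    Here is a rough sketch of the cross-tab I mean, reusing the hypothetical names from earlier in the thread; rebuilding one wide row per dimension combination takes a CASE expression per measure:

    -- Sketch: conditional aggregation to reconstitute a wide row from the
    -- narrow rows (CASE without ELSE yields NULL, which MAX ignores).
    SELECT f.DimensionID1,
           f.DimensionID2,
           MAX(CASE WHEN m.MeasureName = 'WrittenPremium' THEN f.Value END) AS WrittenPremium,
           MAX(CASE WHEN m.MeasureName = 'ClaimsIncurred' THEN f.Value END) AS ClaimsIncurred
    FROM FactGeneric f
    JOIN DimMeasure m ON m.MeasureID = f.MeasureID
    GROUP BY f.DimensionID1, f.DimensionID2;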

  • OK, thanks for that. Unfortunately it's not my call, and this design has been mandated as the way all new warehouses should be built.

    I'll have to see if I can battle this one going forward.

    Thanks for your help!

  • It's an interesting design but goes against everything I have seen in my BI/Data Warehousing studies. I am half tempted to design a similar schema just to see what happens. Too bad I don't have that sort of time! Good luck working with this. I would let the architects or management know that this design might prevent the use of SSAS in the event that OLAP is ever discussed.

  • Actually, in this instance SSAS is being used: a view pivots the data back into the traditional wide structure that the cube is built on (roughly like the sketch at the end of this post).

    The narrow table was originally chosen because QlikView is also in use and needs that structure, but SSAS cubes are used with PerformancePoint too, as well as being accessed directly from Excel.

    In this case, though, I'm not so concerned about the cubes; once they're processed, the data is there, although the time taken to perform that load could be a concern.
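
    For what it's worth, the pivoting view is along these lines; this is a simplified sketch with made-up measure names rather than our actual code:

    -- Sketch: pivoting the narrow table back to a wide shape for the cube.
    CREATE VIEW vFactWide
    AS
    SELECT DimensionID1,
           DimensionID2,
           [WrittenPremium],
           [ClaimsIncurred]
    FROM (SELECT f.DimensionID1, f.DimensionID2, m.MeasureName, f.Value
          FROM FactGeneric f
          JOIN DimMeasure m ON m.MeasureID = f.MeasureID) AS src
    PIVOT (SUM(Value) FOR MeasureName
           IN ([WrittenPremium], [ClaimsIncurred])) AS p;

    One side effect is that adding a new measure means altering the view's IN list as well, so the narrow table is only easily extended up to the point where something has to consume the new measure.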

  • Ah, I can see where the structure would work better for QlikView, since it has its own processing engine that builds a cube in memory (more or less). I did something similar when I used QlikView, to get around its limitation of not allowing multiple joins between objects in the structure (the whole circular-reference thing).

    Out of curiosity, why are you using both PerformancePoint (PP) and QlikView? QlikView is more flexible than PP, but I would think that maintaining both would be a pain. Are the QlikView loads filtered to a subset of your facts? I could see using it for departmental BI and PP for higher-level stuff. We gave up on PP because SharePoint was a pain to configure and we didn't feel PP was flexible enough for our needs. We considered QlikView, but didn't want to report from the relational layer after putting so much time into building out the OLAP layer. Unless anything has changed in the past two years, QlikView is not very OLAP-friendly, so we went with Dundas Dashboard instead. It's worked out well so far.

  • It's a historical leftover from the company buying other companies; as a result, there is a whole mishmash of different warehouses and reporting tools all over the place.

    QlikView is also not widely liked within the company. I've heard the latest version of QlikView will connect to SSAS, but you have to write the MDX yourself to achieve it. I've not tried it, though, so don't hold me to that.

  • That is what I saw as well: you have to write it as an OPENROWSET()-style query, I think, with the MDX inside it. I like QlikView for departmental or other specific uses, but I don't think it scales well; I would not try to build an enterprise system with it. Good luck with the mishmash!
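
    From the SQL Server side, the equivalent pattern is something like this, I believe; all the server, database, cube and member names below are invented:

    -- Sketch: running MDX through OPENROWSET and the Analysis Services
    -- OLE DB provider. 'Ad Hoc Distributed Queries' must be enabled.
    SELECT *
    FROM OPENROWSET('MSOLAP',
                    'DATASOURCE=MySsasServer; Initial Catalog=MyWarehouseCube;',
                    'SELECT [Measures].[Written Premium] ON COLUMNS,
                            [Dim Date].[Calendar Year].MEMBERS ON ROWS
                     FROM [Sales]');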
