Unsure about Fact and Dimension

  • Hi

    I have been working with SQL for years but i am new to data warehousing. i am trying to model some insurance data and am wondering if i can have a Policy Transaction fact who's granularity is an individual policy line from the policy table and also have a policy dimension from the same table.

    Also my fact table has about 30 measures is this to many?

    Kind Regards

    Paul

    Cheers, Paul

  • paul.ingham (7/2/2010)


    ...am wondering if i can have a Policy Transaction fact who's granularity is an individual policy line from the policy table and also have a policy dimension from the same table.

    Not sure what you mean.

    In a star schema you can expect your FACT_POLICIES table to be surrounded by a bunch of DIMensional tables.

    paul.ingham (7/2/2010)


    Also my fact table has about 30 measures is this to many?

    There is not such a thing as too many, fact table should have as many measures as needed.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Thanks PaulB

    I guess what i am trying to say in the first question is can the facts from one table be used in the fact table and have a dimension from the same table, ie factPolicy have a dimension of DimPolicy both getting their source data from the same table.

    The policy table has non measures in it that are facts that will need o be reported on but the table is mostly made up of measures.

    Thanks Paul

    Cheers, Paul

  • Hi Paul,

    I have a similiar design that I have been struggling with... and still are... I was wondering if you came up with a solution in the meantime 🙂

    Some detail:

    In essence the "Policy" entity can sometimes behave as a fact (i.e policytransactions fact table )

    and sometimes as a dimension... (i.e. an entity with many descriptive attributes which can be used as filters for other fact tables)

    I'm thus not sure exactly how to model this...

    Does one create a fact table AND dimension table and populate both via the ETL process OR

    Do you create the fact first via ETL and then generate a dimension using the fact as source?



    What's this "backup strategy" everyone is on about?

  • I would always choose to materialize my facts and dimensions separately in ETL as you suggest TVR. I am unclear on what business event you are attempting to measure (having never worked in the insurance industry). However, it sounds to me like you would need a Policy dimension with all its glorious attributes sourced from a dimension table that you materialize through some ETL process. Additionally you would need one or more fact tables to measure your business events which is built by a separate ETL process.

    You can do a lot of things in the UDM including using a fact table as the source for a dimension. However, my personal bent is to push all the hard work into the relational layer. It is much easier to support, tune, QA, and troubleshoot.

    Chris Umbaugh
    Data Warehouse / Business Intelligence Consultant
    twitter @ToledoSQL

  • Hi Chris,

    Your reasons for doing all the hard work in the relational layer makes alot of sense.

    Thanks for the input!



    What's this "backup strategy" everyone is on about?

  • I can heartily recommend The Microsoft Data Warehouse Toolkit by Joy Mundy for new starts (the second edition includes SQL Server 2008 R2.) And I agree that the RDBMS layer is the right place to be assembling your facts and dimensions; it's a well recognised solution and reduces a lot of data-lineage, conformity and quality issues.

    Yes, it's not too uncommon to have both a fact and a dimension derived from the same source. If you think of your dimension as your "(mostly) static reference data" and the fact as the "transactions" or "point-in-time summaries", then splitting an entity like "policy" into those two different kinds of things makes sense.

    As for 30 measures... that is a little steep. Are they all at the same grain (for example, while they all may be related to the same policy, some may be annual and some may be monthly)? If not, you may want to break the fact table into multiple facts. Another way to break up (vertically partition) fact tables is into subject-related facts. So some of those measures may be related to quite different things, in which case breaking them into smaller fact tables would make them easier to work with as well.

    To illustrate (excuse my simplistic ASCII), assume you have a fact table something like:

    D1 D2 D3 m1 m2 m3

    x x a a

    y y b b

    Even if they are at the same grain, it could make sense to break these into different (but related) subject facts. You should still be able to drill across between them because they share "conformed" dimensions.

    TVR (10/11/2011)


    Does one create a fact table AND dimension table and populate both via the ETL process OR

    Do you create the fact first via ETL and then generate a dimension using the fact as source?

    In your ETL, split them apart as part of your staging process (perhaps within an SSIS package or withing T-SQL transformation code). The dimension would be populated first, as you will need the surrogate key for subsequent loading of fact entries. I suspect you will find you want a type 2 (tracking history) "slowly changing" dimension, so you will absolutely need to generate a surrogate key.

    The book I mentioned has a great discussion over how (and why) to set up dimensions, facts and so forth.

    One final consideration: just because they are all lumped together in "policy" in the source system doesn't mean you must treat them that way in your data warehouse. There's a certain value to maintaining familiarity (the users are used to dealing with policies, so naturally expect to be able to report on them), but at the same time you can break them apart and infer other dimensions.

    For example, you may want a separate customer dimension, a separate address dimension and a mapping "fact" table between them that supports changes in address. None of that stuff needs to live in a "policy" dimension; but the fact table should refer to the customer dimension as well as the policy dimension.

    Actually, you're more than likely to have policies related to groups of customers, and those groups can change over time too. So some fun modelling there! 😀

  • Thanks Bruce!

    I needed to get that confirmation that the same entity/table in the OLTP environment can become seperate entitiies/dims/facts in the modeling world.

    I got my copy of the MSDW toolkit (SQL 2008 R2 edition) a few days ago - I'll get stuck into that a bit more.

    Oh BTW, good luck to the All Blacks for the weekend!!



    What's this "backup strategy" everyone is on about?

  • T_VR (10/11/2011)


    In essence the "Policy" entity can sometimes behave as a fact (i.e policytransactions fact table )

    and sometimes as a dimension... (i.e. an entity with many descriptive attributes which can be used as filters for other fact tables)

    I'm thus not sure exactly how to model this...

    Since the FACT table is supposed to store the facts a.k.a. measurements and the DIM table is supposed to store the attributes needed to slice and dice the facts it is not unusual to have both a FACT and a DIM table somehow describing the same OLTP entity - like a fact_policy and a dim_policy in this particular case.

    T_VR (10/11/2011)


    Does one create a fact table AND dimension table and populate both via the ETL process OR

    Do you create the fact first via ETL and then generate a dimension using the fact as source?

    My preference is to let the ETL process load both the FACT and DIM tables.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • PaulB-TheOneAndOnly (11/22/2011)


    T_VR (10/11/2011)


    Does one create a fact table AND dimension table and populate both via the ETL process OR

    Do you create the fact first via ETL and then generate a dimension using the fact as source?

    My preference is to let the ETL process load both the FACT and DIM tables.

    Especially since the fact usually ends up pointing to the dimension! In most cases you will generate the dimension before the fact.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply