One or two fact tables?

  • Hello,

    I am currently working on a dimensional model for insurance claims.

    When a claim is made money is paid out by the insurance company at a policy level (stored in a transactional format at source).

    Money can also be recovered against a claim. This is stored in the source system in a transaction level but at a lower level of detail than policy.

    Having one fact table for claims logically makes the most sense but the payments made in the insured does not break down below policy level.

    I wanted to throw this out there to get some ideas on how best to design this if poss?

    Thanks

  • Is a claim bound in any way to a policy? If yes and claim is just a smaller level of detail of a policy, than I would create just one fact table at the claim level.

    If you need data at the policy level, you just aggregate it.

    It's a best practice to create your fact tables at the lowest level of detail (aka the grain) possible.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (8/8/2013)


    Is a claim bound in any way to a policy? If yes and claim is just a smaller level of detail of a policy, than I would create just one fact table at the claim level.

    If you need data at the policy level, you just aggregate it.

    It's a best practice to create your fact tables at the lowest level of detail (aka the grain) possible.

    Yep +1 🙂

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • Hi Both,

    Thanks for your replies...I'll provide a little more detail based on your response.

    Source system has the following tables:

    Policy table (a record for each policy)

    Policy risk (a record for each policy/risk type/premium class per each policy renewal)

    Claim Table (a record for each policy/claim/risk type/premium class) - this table contains the total paid out and the total recovered.

    Claim Recovery Table (a record for each policy/claim/risk type/premium class/treaty/reinsurance) - this table holds the recovered.

    A policy can have 1 to many associated risk types

    A policy can have 1 to many associated premium types

    A policy can have 0 to many claims

    A claim can have 0 to many recoveries

    So I'll create a simple policy dimension containing a record for each policy.

    I have then toyed with the idea then of creating a mini dimension(s) to store the rapidly changing risk and premium types as each policy could end up with hundred of records. (each policy appears an average of 47 times on the policy risk table). What are your views on this?

    I want to have a single fact table at the lowest grain possible. However I am struggling with the total paid figure which is not available at the same grain as recovery.

  • Is it possible to post some sample data (this may be fake).

    It would be easier to wrap my head around. 🙂

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Ok thanks for your help:

    declare @policy table

    (policy_number int)

    insert into @policy

    select 100

    union

    select 101

    union

    select 102

    declare @policy_risk table

    (policy_number int,

    risk_type int,

    premium_class char(3))

    insert into @policy_risk

    select 100, 1, 'AA1'

    union

    select 100, 1, 'ACC'

    union

    select 100, 1, 'GHT'

    union

    select 100, 1, 'DES'

    union

    select 100, 1, 'FIR'

    union

    select 101, 1, 'FIR'

    union

    select 101, 1, 'SEC'

    union

    select 101, 1, 'PPL'

    union

    select 102, 2, 'STA'

    union

    select 102, 2, 'KLS'

    union

    select 102, 2, 'ZZA'

    union

    select 102, 3, 'HTD'

    union

    select 102, 3, 'DTR'

    declare @claim table

    (claim_number int,

    policy_number int,

    risk_type int,

    premium_class char(3),

    total_paid money,

    total_recovered money)

    insert into @claim

    select 1, 101, 1, 'FIR', 345, 0

    union

    select 2, 102, 2, 'STA', 7465.21, 100

    declare @claim_recoveries table

    (claim_recovery_id int identity(1,1),

    claim_number int,

    policy_number int,

    risk_type int,

    premium_class char(3),

    treaty char(3),

    reinsurance_id int,

    total_recovered money)

    insert into @claim_recoveries

    select 2, 102, 2, 'STA', 'QSH',50, 25

    union all

    select 2, 102, 2, 'STA', 'AAA',51, 25

    select * from @policy

    select * from @policy_risk

    select * from @claim

    select * from @claim_recoveries

  • The total_recovered value in claim (100) is not the same as the sum of the recoveries in claim_recoveries (2 x 25). Is this normal?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Apologies! My mistake sorry.

    It should total the 100. Updated code attached:

    declare @policy table

    (policy_number int)

    insert into @policy

    select 100

    union

    select 101

    union

    select 102

    declare @policy_risk table

    (policy_number int,

    risk_type int,

    premium_class char(3))

    insert into @policy_risk

    select 100, 1, 'AA1'

    union

    select 100, 1, 'ACC'

    union

    select 100, 1, 'GHT'

    union

    select 100, 1, 'DES'

    union

    select 100, 1, 'FIR'

    union

    select 101, 1, 'FIR'

    union

    select 101, 1, 'SEC'

    union

    select 101, 1, 'PPL'

    union

    select 102, 2, 'STA'

    union

    select 102, 2, 'KLS'

    union

    select 102, 2, 'ZZA'

    union

    select 102, 3, 'HTD'

    union

    select 102, 3, 'DTR'

    declare @claim table

    (claim_number int,

    policy_number int,

    risk_type int,

    premium_class char(3),

    total_paid money,

    total_recovered money)

    insert into @claim

    select 1, 101, 1, 'FIR', 345, 0

    union

    select 2, 102, 2, 'STA', 7465.21, 100

    declare @claim_recoveries table

    (claim_recovery_id int identity(1,1),

    claim_number int,

    policy_number int,

    risk_type int,

    premium_class char(3),

    treaty char(3),

    reinsurance_id int,

    total_recovered money)

    insert into @claim_recoveries

    select 2, 102, 2, 'STA', 'QSH',50, 50

    union all

    select 2, 102, 2, 'STA', 'AAA',51, 50

    select * from @policy

    select * from @policy_risk

    select * from @claim

    select * from @claim_recoveries

  • OK, now that I see the data, I think you should have 2 fact tables, because indeed they are at two different grains.

    Recovery is at the treaty level, while total paid is at the claim level.

    Also the fact that some claims don't have recoveries makes me think in the direction of two fact tables.

    But that's OK. When you analyse the data later on in Excel, you drag the total_paid measure next to the pivot and claims to the rows. When you add the recovery, it will automatically aggregate the data to the claims level.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Thanks for your time on this. The conclusion I kept coming to was two fact tables as well.

    What did you think about the mini dimension solution for those more frequently changing attributes?

  • Koen Verbeeck (8/8/2013)


    OK, now that I see the data, I think you should have 2 fact tables, because indeed they are at two different grains.

    Interesting topic. I've been struggling with a similar, two grains but related dimensions situation. Was coming to the conclusion that two fact tables were needed. Any suggested references for reading more about the use of two fact tables? My attempts to search for/find references were not fruitful.

  • aaa121 (8/8/2013)


    Thanks for your time on this. The conclusion I kept coming to was two fact tables as well.

    What did you think about the mini dimension solution for those more frequently changing attributes?

    If the risk and premium class have fixed number of combinations, the mini-dimension is probably a good idea.

    I do think you'd drop the policy risk dimension though, and add a surrogate key into the fact table(s), so that your policy dimension stays small. This way, the risk/premium class is directly tied to the facts.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • kl25 (8/8/2013)


    Koen Verbeeck (8/8/2013)


    OK, now that I see the data, I think you should have 2 fact tables, because indeed they are at two different grains.

    Interesting topic. I've been struggling with a similar, two grains but related dimensions situation. Was coming to the conclusion that two fact tables were needed. Any suggested references for reading more about the use of two fact tables? My attempts to search for/find references were not fruitful.

    The Data Warehouse Toolkit by Ralph Kimball would be a good start.

    Also check out his regular design tips. For example:

    http://www.kimballgroup.com/2013/08/05/design-tip-158-making-sense-of-the-semantic-layer/

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (8/9/2013)


    kl25 (8/8/2013)


    Koen Verbeeck (8/8/2013)


    OK, now that I see the data, I think you should have 2 fact tables, because indeed they are at two different grains.

    Interesting topic. I've been struggling with a similar, two grains but related dimensions situation. Was coming to the conclusion that two fact tables were needed. Any suggested references for reading more about the use of two fact tables? My attempts to search for/find references were not fruitful.

    The Data Warehouse Toolkit by Ralph Kimball would be a good start.

    Also check out his regular design tips. For example:

    http://www.kimballgroup.com/2013/08/05/design-tip-158-making-sense-of-the-semantic-layer/

    Been doing a lot of reading in the Toolkit. Clearly haven't gotten to the right section yet. 😀

    Have only run into the design tips a few times based on general searches. Looks like I need to do a more thorough look at that section of the kimballgroup site. Thanks so much for both suggestions!

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply