Changing Granularity of Fact Table

  • I've been asked to make a change to our Data Warehouse and I'm not sure how to go about it.

    We have a fact table at a granularity of, essentially, person. One person has one row. I've been asked to add something to this row to indicate whether a person has an raised risk. I've also been asked to add a risk dimension (which sounds like something out of Dr Who) to show the reason for the raised risk. Each person can have multiple reasons why the risk is raised and this where I'm stumped. I need to find a way to link the fact table and the dimension. If I add a risk SID to each risk reason and put that on the fact table I'll add a row for each risk, thus changing the granularity of the fact table. My thinking at the moment is to put a flag on the person row to indicate the risk has been raised. From that, I can easily link the person to the reason using the person's ID number.

    Is this a good way to go about it? I've not got a great deal of data warehousing experience and at the moment I haven't got any better ideas.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • How do you load the data warehouse?

    Can a person have risk at one load and then not have a risk the next load?

    How will implement the risk reason dimension? Do you need to keep a history of risk reasons?

    Can you use a view that does an outer join to the risk reason dimension to show persons with a risk? Like:

    Select P.*, CASE WHEN RR2.RiskReasons IS NOT NULL THEN 1 ELSE 0 END As IsRisk from dbo.Person AS P LEFT JOIN (SELECT RR.PersonID, COUNT(*) RiskReasons FROM dbo.RiskReason as RR) as RR2 on P.PersonID = RR2.PersonID

  • Jack Corbett (2/5/2016)


    How do you load the data warehouse?

    Can a person have risk at one load and then not have a risk the next load?

    How will implement the risk reason dimension? Do you need to keep a history of risk reasons?

    Can you use a view that does an outer join to the risk reason dimension to show persons with a risk? Like:

    Select P.*, CASE WHEN RR2.RiskReasons IS NOT NULL THEN 1 ELSE 0 END As IsRisk from dbo.Person AS P LEFT JOIN (SELECT RR.PersonID, COUNT(*) RiskReasons FROM dbo.RiskReason as RR) as RR2 on P.PersonID = RR2.PersonID

    It's done through stored procs called by SSIS. We first load a staging table then the fact table.

    Yes they can and vice versa.

    Not sure yet and yes. We provide the data warehouse for a third party and they want to analyse the risk data.

    In theory yes we could. However, we're really tied by politics, procedures and a desire not to give the third party anything more than they asked for.

    It has been suggested that putting four columns on the fact table, one for each of the categories of risk, is the best way to do it. It does what we were asked to do and completely complies with the spec but is very simple to implement and doesn't provide any more data than we need to.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • BWFC (2/5/2016)


    It's done through stored procs called by SSIS. We first load a staging table then the fact table.

    Okay so that makes the process of maintaining that flag easier since you can put the logic in the procedure.

    Yes they can and vice versa.

    Is there a requirement to have this be some type of SCD so you know when and how long a person was a risk?

    Not sure yet and yes. We provide the data warehouse for a third party and they want to analyse the risk data.

    In theory yes we could. However, we're really tied by politics, procedures and a desire not to give the third party anything more than they asked for.

    So you could got he view route and have a specific view for the third party that controls what they can see.

    It has been suggested that putting four columns on the fact table, one for each of the categories of risk, is the best way to do it. It does what we were asked to do and completely complies with the spec but is very simple to implement and doesn't provide any more data than we need to.

    I wouldn't recommend this approach. What happens when you add another category of risk? Yeah, I know, that'll never happen. It will seem like the simplest implementation until you have to change something.

    I don't think adding a Risk Dimension as you are planning on implementing it is pure star or snowflake schema. To do a true star schema you would want to change the granularity in your fact table to the risk level where there is a row for each risk a person has. This might mean having a "Not a Risk" Risk in the risk dimension table or allowing NULL In the RiskID column in the person table.

  • Jack Corbett (2/5/2016)


    BWFC (2/5/2016)


    It's done through stored procs called by SSIS. We first load a staging table then the fact table.

    Okay so that makes the process of maintaining that flag easier since you can put the logic in the procedure.

    Yes they can and vice versa.

    Is there a requirement to have this be some type of SCD so you know when and how long a person was a risk?

    Not sure yet and yes. We provide the data warehouse for a third party and they want to analyse the risk data.

    In theory yes we could. However, we're really tied by politics, procedures and a desire not to give the third party anything more than they asked for.

    So you could got he view route and have a specific view for the third party that controls what they can see.

    It has been suggested that putting four columns on the fact table, one for each of the categories of risk, is the best way to do it. It does what we were asked to do and completely complies with the spec but is very simple to implement and doesn't provide any more data than we need to.

    I wouldn't recommend this approach. What happens when you add another category of risk? Yeah, I know, that'll never happen. It will seem like the simplest implementation until you have to change something.

    I don't think adding a Risk Dimension as you are planning on implementing it is pure star or snowflake schema. To do a true star schema you would want to change the granularity in your fact table to the risk level where there is a row for each risk a person has. This might mean having a "Not a Risk" Risk in the risk dimension table or allowing NULL In the RiskID column in the person table.

    It's definitely easier to maintain using the stored procs. I'm planning to move towards SSIS eventually but that's very much a long term, spare time project.

    I know what you're saying about the problems of adding another column further down the line, however as it stands it might be the least worst option. The categories are derived from a view that we maintain. If another category were to be added, we'd have to change the view to generate it first and on the back of that, adding another column to the fact table would be relatively trivial. I know it seems like I'm ignoring the advice I've asked for but after stepping back a little it seems that I have a bit more flexibility than I thought on how to do things.

    We don't actually have to provide the reason dimension at all. The spec is a bit vague about what is actually needed and in how much detail. If I go down the flags route, I can meet the spec without exposing too much to the third party.

    I may well have under-stated the fact table a little too. It is essentially the fact table on which all of the third party's invoices are based. Changing the granularity of this will be a major PITA.

    Thanks for your advice and you can certainly reserve the right to say 'I told you so' when this comes back to haunt me later.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply