Design Question - Late arriving part of a fact

  • Hi,

    My query is based around designing a method of handling the following problem, and wondered if anybody had any similar experiences.

    I have an Invoiced Sales fact table populating each night, the granularity is per invoice line item and everything is going great :-).  I have been asked to add Cost of Goods to this table in order to calculate the contribution, this isn't a problem as the granularity is correct for this need.

    During my interviews, however,  it has come to light that the business uplifts the costs of goods sold by using a percentage based calculation - this may differ each month.  This uplift is not available at the time the invoiced sales data is populated, so the structure of the fact table I have in mind is to extend the invoiced sales fact table with the CostOfGoods and the UpliftFactor (as per below) and then populate the uplift as and when it becomes available (Speaking to the business, this will probably have to be done manually via some kind of gui). 

    Access to the 'Fact' tables would be controlled via two SQL views for 'Invoiced Sales' and 'Contribution', where contribution data is only available where the uplift factor is not 0 or I could expose all the data and show where the contribution uplift had not yet been applied.  These items are so closely related it makes logical sense to me to approach it in this manner, I just wondered if anybody had suggestions for alternative approaches?

    Dim1,
    Dim2,
    Dim3,
    SalesAmt,
    CostOfGoods, - additional field
    UpliftFactor (default 0) - additional field

  • henryKrinkle - Tuesday, January 24, 2017 6:20 AM

    Hi,

    My query is based around designing a method of handling the following problem, and wondered if anybody had any similar experiences.

    I have an Invoiced Sales fact table populating each night, the granularity is per invoice line item and everything is going great :-).  I have been asked to add Cost of Goods to this table in order to calculate the contribution, this isn't a problem as the granularity is correct for this need.

    During my interviews, however,  it has come to light that the business uplifts the costs of goods sold by using a percentage based calculation - this may differ each month.  This uplift is not available at the time the invoiced sales data is populated, so the structure of the fact table I have in mind is to extend the invoiced sales fact table with the CostOfGoods and the UpliftFactor (as per below) and then populate the uplift as and when it becomes available (Speaking to the business, this will probably have to be done manually via some kind of gui). 

    Access to the 'Fact' tables would be controlled via two SQL views for 'Invoiced Sales' and 'Contribution', where contribution data is only available where the uplift factor is not 0 or I could expose all the data and show where the contribution uplift had not yet been applied.  These items are so closely related it makes logical sense to me to approach it in this manner, I just wondered if anybody had suggestions for alternative approaches?

    Dim1,
    Dim2,
    Dim3,
    SalesAmt,
    CostOfGoods, - additional field
    UpliftFactor (default 0) - additional field

    I have not ran into similar, but I think the logic is sound. I would even go as far as to store the fields you need to another table like a lookup if you have primary keys on the data. That way you can keep your metrics needed to calculate the late metric later and just delete those records from the lookup when that data finally arrives. Then obviously you can reapply it back to the main fact.

    This is assuming you are saying you will have all the dims with the sales amount and the cost of goods, then you need other information to calculate the uplift later on down the road. If the uplift factor is not being used, you can just toss that into a new table with the primary key and cost of goods to calculate the uplift data comes available, then update the existing fact and delete the record from the secondary table.

  • Hi, 
    I guess your parameters (cost of goods and uplift factor) are related to a product or articles, which I can call dim1
    I would try to handle these attributes as slowly changing attributes in the dim1. Then you can include them in the calculations with the SalesAmt if they are already available.
    When busisness deliver the values you will need to update the dimension table and not the fact table ( I always avoid updates in a transactional fact table).
    The only heavy reason to keep the values in the fact table (CostOfGoods, UpliftFactor) is, if you want to run aggregation over their values, let's suppose average of cost of goods and so on, but I guess this is not the case and business most probably want to calculate the margins and revenues. 

    I hope this helps,

    Kind regards,
    Paul

    Paul Hernández
  • Hi,

    Thank you both for your feedback.  It has been very useful.  I did consider an SCD in the product dimension but after digging deeper, it isn't data that is suited to a product. 

    The business have stated they wanted the facts delivering daily, to match with invoiced sales, so through demonstrating to them how the current process will mean latency in the delivery of the facts, they have agreed that I should calculate this on a weighted average basis, meaning I can now include it in my invoiced sales fact without too much bother.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply