Proper DW Fact and Dimension design questions

  • Sorry, I'm not great at Data Warehouse design. =(  I'm modeling a bakery for a friend of mine. He purchases ingredients and the prices for them may change over time. (Flour prices are crazy).  The design for that part of the problem is kinda like this:

    BakeryProduct --- (1,m)---Recipe(BaleryProductKey, IngredientKey, Weight)--(M,1)--Purchase (IngredientKey, DateKey, VendorKey, Units, UnitPrice)---(M,1)--Ingredient(IngredientKey,  UnitWeight, IngredientName....)

    Given that he bakes every week, and may purchase ingredients every week, how do I deal with the price changes? Do I expire an IngredientKey (validFromDate, validToDate)?  The part I don't totally understand is how I would relate the Recipe(BakeryProductKey, IngredientKey, Weight)  All the ingredients are just weighed... much simpler that way!

    How do I handle ingredient price changes? Do I have to generate a new ProductKey with the same alternate key to indicate a new "recipe" (well, the prices of the ingredients are different). Gotta read Data Warehouse Toolkit chapter, I'm sure, but is there a good tutorial on this part of the problem somewhere? His sale prices are pretty constant unless ingredient prices go completely haywire, but the price of flour (for example) has increased by 50% or more in the past 18 months... and given that bread is primarily flour, that's a significant cost increase. Articles and/or Data Warehouse Toolkit chapters or Star Schema: Complete Reference chapters are all fine.

    Don't mean to be lazy, I just don't totally get how to model the primary keys etc and how to expire old prices without having to update the recipe fact for each Bakery Product.

    Thanks!

    Pieter

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • I think your Ingredient table containing the price should be set up as a slowly changing dimension (even if it changes fast). That way the Recipe fact doesn't have to change, but you obtain the price using a combination of IngredientKey and date.

  • Okay, I think I get it. Thanks, Chris!

    So I would have an Ingredient table (IngredientKey, IngredientSK, IngredientName, Category, UnitPrice, ActiveDate) and then use FILTER() and/or TOPN() to get the most recent active Ingredient value relative to "Manufacture" date of Product.

    I know how to get the latest price for a product using T-SQL ( CROSS APPLY and TOP), but how do I do it in DAX? (Wait, I'm pretty sure I've read/seen a video/article where Alberto Ferrari shows how to do it)... I'd just do that and use FILTER(Purchases) instead of just Purchases to return the latest value before a given manufacture date. Right?

  • pietlinden wrote:

    So I would have an Ingredient table (IngredientKey, IngredientSK, IngredientName, Category, UnitPrice, ActiveDate) and then use FILTER() and/or TOPN() to get the most recent active Ingredient value relative to "Manufacture" date of Product.

    Yes, that's the gist of what you need to do. I'm not familiar with DAX syntax, so I can't be more specific.

  • No problem. Just looking for general directions. I think I can figure the rest out myself. (Well, I hope I can!)

  • Hi Chris, bit late to the party on this, but price is an attribute of the ingredient.

    DimIngredient would have a surrogate key as well as a general business key to uniquely identify it at a given point in time. Include the price and a start and end date for that price.

    FactOrders can then be loaded by multiplying your quantity by the price at the time of the fact and tagging the appropriate surrogate key to the fact.

    FactSales stays as it is.

    This also allows the bakery to account for stock on whatever accounting basis they want if they want to use the DW for drawing up FIFO and LIFO inventory valuations for accountancy purposes.

    For price increases, you would need to factor in the timing - how much flour in stock at the old price is still being used to make current products versus when the new, more expensive ingredients are being used. Either that or simply pass the price increases on as soon as the price goes up and reductions when the price goes down, assuming a fairly constant level of stock. That's a business decision, but with your DW, you'd be able to help improve that decision making process. 🙂

  • pietlinden wrote:

    Sorry, I'm not great at Data Warehouse design. =(  I'm modeling a bakery for a friend of mine. He purchases ingredients and the prices for them may change over time. (Flour prices are crazy).  The design for that part of the problem is kinda like this:

    BakeryProduct --- (1,m)---Recipe(BaleryProductKey, IngredientKey, Weight)--(M,1)--Purchase (IngredientKey, DateKey, VendorKey, Units, UnitPrice)---(M,1)--Ingredient(IngredientKey,  UnitWeight, IngredientName....)

    Given that he bakes every week, and may purchase ingredients every week, how do I deal with the price changes? Do I expire an IngredientKey (validFromDate, validToDate)?  The part I don't totally understand is how I would relate the Recipe(BakeryProductKey, IngredientKey, Weight)  All the ingredients are just weighed... much simpler that way!

    How do I handle ingredient price changes? Do I have to generate a new ProductKey with the same alternate key to indicate a new "recipe" (well, the prices of the ingredients are different). Gotta read Data Warehouse Toolkit chapter, I'm sure, but is there a good tutorial on this part of the problem somewhere? His sale prices are pretty constant unless ingredient prices go completely haywire, but the price of flour (for example) has increased by 50% or more in the past 18 months... and given that bread is primarily flour, that's a significant cost increase. Articles and/or Data Warehouse Toolkit chapters or Star Schema: Complete Reference chapters are all fine.

    Don't mean to be lazy, I just don't totally get how to model the primary keys etc and how to expire old prices without having to update the recipe fact for each Bakery Product.

    Thanks! Pieter

    None of this is going to work unless the "baker" makes an entry for every "withdrawal" of ingredients that identifies which "lot" each ingredient came from and what the exact quantity was.  That also means marking the inventory with lot numbers, etc, if you want any precision at all.

    I'm not sure that's worth the effort.  Might be better to just keep track of the purchases over time and generate a rolling average for each month, or similar.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I think this may be destined to fail. Convincing him to do simple things like separating his personal and business purchases into separate receipts from Costco is a challenge. Hate to say it, but I'm not sure about this whole thing. At this point, I'm pretty sure it's little more than an intellectual exercise. =(

  • Sorry, been doing other things for a while... =)

    That's kinda what I was thinking. Using FILTER() to return a table of Ingredient Price records that are "active" for the given context. Something like

    ActiveIngredientPriceList = FILTER ( Ingredient,

    [EffectiveStartDate] <= MAX('BakeList'[ManufactureDate]) &&

    [EffectiveEndDate] >MAX('BakeList'[ManufactureDate])

    )

    Then I'd join to that table instead of the whole Ingredient table when evaluating the ingredient costs of an item as of a given date.

    Right?

    thanks for reading!

    Pieter

  • Say I have a slowly changing dimension for my Ingredients

    (IngredientKey, IngredientAK, Name, UnitPrice, WeightInKg, Weight, StartDate, EndDate)

    The part that's messing me up is that I can't reallly join RecipeIngredients(ProductKey, IngredientKey, Weight) to the Ingredient SCD. So do I create a calculated table for that part? (because at any given point in time, each Ingredient only has one price). I guess I use something like

    FILTER('scDimIngredient', [StartDate]< MAX('Sales'[SaleDate] && [EndDate]> MAX('Sales'[SaleDate] ))

    (All of this worked fine until I started along the path of Ingredient is a Slowly Changing Dimension! -- up to there it works fine... with each "product" (loaf) having a recipe and a yield and all that.)  Can I join to something like a CALCULATETABLE() expression?

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply