• RonKyle (8/25/2016)


    Well, reporting only on what is used is the same as using independent dimensions again.

    No, it's not. Chances are independent dimensions when the items would be an attribute of product would be a design error. This is an ETL choice. But as you point out that it's important to know what hasn't moved, then why are you asking if you should load the unsized products even if it will double the size? It sounds like you know you have to load them. If in the dimension I described I had to report on the 75% unused combinations, there would have been no choice but to load them.

    I don't see how including the unsized doubles the data in any case. I would see it something like

    Shirt XS

    Shirt S

    Shirt M

    Shirt L

    Shirt XL

    Shirt US

    In this case, the unsized adds one more row to the 5 already there. I'm sure you're real life situation is not a simplistic, but the idea likely pertains.

    Sorry, I misunderstood. You were referring to adding unsized to only products that have unsized via the ETL and not just adding it to all products. I thought you were saying that you could just build the dimension based on what's used in the fact, i.e.: only include attributes that have sold in the dimension.

    That would be the smart bet, but if you do have product colors, it wouldn't just be 1 new record for the product, if you have 5 colors, it would be 5 new records at max for the product. That's when you want to ensure you are only adding what has changed per the ETL (as Ron mentioned). Thus, if you have 10,000 products with 10 colors, then you're talking at max, 100,000 new records added to the DimProduct table over time (the amount that unsized could increase if all products and colors have it over time). Not exactly double per his comments, but a lot more records if unsized happens for all 10,000 products and 10 color combinations.