• Chriz L. (12/1/2012)


    I found the solution and I'm posting it in case someone else have the same problem.

    The answer is to create relational tables as following, instead of having one table with all the data in it.

    Table Sales (SalesID, SellerTypeID, ShopLocationID, ItemCategoryID,ItemConditionID, Sold, Price, ShippingCost)

    Table SellerType (SellerTypeID,SellerTypeDescription)

    Table ShopLocation (ShopLocationID,ShopLocationDescription)

    Table ItemCondition (ItemConditionID, ItemConditionDescription)

    Table ItemCategory (ItemCategoryID, ItemCategoryDescription)

    Regards 🙂

    I have to say that "It Depends". What you have above will work fine until a shop changes location (and that does happen) and then you'll have "lost" historical data. Descriptions can also change at the drop of a hat. The only way you could preserve the historical nature of the information is if you instantiate something like "Type 2 Slowly Changing Dimensions" in the "finger tables" of this "Star Schema".

    You're also missing some key information in your "hub" table (even without the "finger" tables). You're missing the date of the sale which is why the other people asked you to post the CREATE TABLE statement for your table if you had one. They wanted to check on all the stuff you might be missing.

    To wit, this hasn't answered your question of "is it possible to have data mining on a single table with a large amount of data", at all, because you still have a large, single table as a hub and you'll need to join it to several other tables and, to keep the historical information intact, you'll also need to instantiate the Type 2 SCD's I previously spoke of.

    Sales tables, like this one, are historical archives. There's very few reasonable explanations as to why any normalization, like you're done above, should even be attempted. All it does is severely complicate things and, without the Type 2 SCD's (one of the few reasons to try such normalization on such a static reporting table), it won't allow you to mine for things like "did sales go up or down when we changed the name of an item".

    To answer your question, "It Depends". Do you want to be able to mine for things like if description changes have affected sales? If so, this is going to get a whole lot more complicated. If not, then go back to your flat historical table and save yourself the trouble. Don't forget to add a date. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)