DataMart Design for Newbie

  • Hi All.

    I am designing first datamart/cube for our company and I would like to get some suggestions...

    Our first mart/cube is very simple: tracking sales and returns.

    My Fact table will have following:

    Sales fact table:

    - Transaction date/time

    - Sales Qty

    - Sales Amount

    Return fact table:

    - Transaction date/time

    - Return Qty

    - Return Amount

    My Dimensions are

    - Customer

    - Product

    - Employee

    - Shipment

    - Promotion

    We have multiple branch sales offices, and our management wants to track sales agent and branch performance. I was going to create branch dimension and create relationship to employee. However, I really don't have any branch attributes other than Branch Name. In this case, would it be okay to store branch information with Employee dimension or store it to Fact table?

  • Why have two different fact tables? Any specific reason?

    Also, I do not see any reference of Dimension Key in your fact tables. It is like the fact tables are isolated.

    Raunak J

  • Thanks Raunak Jhawar. I didn't list any reference between fact table and dimension. As for the two different fact table, I think you are right :). So my revision for DM is follow:

    Transaction Fact table

    -Transaction Date

    - Transaction Amount

    - Transaction Qty

    - Customer Key

    - Product Key

    - Employee Key

    - Shipment Key

    - Promotion Key

    Dimension:

    - Customer

    - Product

    - Employee

    - Shipment

    - Promotion

  • sdhan79 (3/3/2011)


    Thanks Raunak Jhawar. I didn't list any reference between fact table and dimension. As for the two different fact table, I think you are right :). So my revision for DM is follow:

    Transaction Fact table

    -Transaction Date

    - Transaction Amount

    - Transaction Qty

    - Customer Key

    - Product Key

    - Employee Key

    - Shipment Key

    - Promotion Key

    Dimension:

    - Customer

    - Product

    - Employee

    - Shipment

    - Promotion

    You almost got it!!

    You have a date dimension missing, Customer, product, employee, shipment should be 4 different dimensions. Promotion cannot be a dimension. It should be a junk dimension.

    I suggest you read Kimball DW reference book for better understanding of the subject.

    Raunak J

  • Thank you for your input sir!

  • sdhan79 (3/3/2011)


    Thank you for your input sir!

    Just to add a level of complexity - real world is a complex environment and systems should reflect such a complexity; think about how are you going to handle changes on your dimensions.

    A fact of life is that dimensions change, let's use "DIM_Customers" to visualize it. Over time particular customers will evolve, change like moving to a new address, etc. This doesn't happen every day so we call them Slowly Changing Dimensions.

    Problem is, you have to decide how to track such changes.

    Option #1 is: you do not track changes then each particular DIM_Customers row will show the latest valid information.

    Option #2 is: you do track changes so you know how a particular customer looked like at any point in time.

    Why you want to track changes?

    Because sooner or later a business intelligence report will show market share decrease in Columbus, OH and market share increase in Mobile, AL just becasue a customer happened to move from Columbus to Mobile. Good reporting should show activity in Columbus when customer was there and activity for the same customer in Mobile after the move. I do like to treat these cases as SCD of the type III, plenty of documentation on the net.

    Hope this helps.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • sdhan79 (3/2/2011)


    We have multiple branch sales offices, and our management wants to track sales agent and branch performance. I was going to create branch dimension and create relationship to employee. However, I really don't have any branch attributes other than Branch Name. In this case, would it be okay to store branch information with Employee dimension or store it to Fact table?

    I'd still break it out into it's own dimension...won't add a lot of overhead and will keep your design flexible. Address, city, state, region, etc etc.

    definitely read Kimball. In fact, its probably time I give it another read through just to brush up. Really awesome book.

  • Raunak Jhawar (3/3/2011)


    sdhan79 (3/3/2011)


    Promotion cannot be a dimension. It should be a junk dimension.

    But what if they have different types of promotions and would like to analyze the effectiveness? Would that be a case where it was justified to break out promotion into its own dimension?

  • My two cents: I completely agree with @wta306. I would also add that there might not actually be a real relationship between a branch office and an employee. I assume here that the employees that you are interested are salespeople. A salesperson might have a Branch Office Attribute but what happens if that salesperson one day splits sales credit with a salesperson from another branch office. You will have a sales fact (probably two, one for each salesperson receiving sales credit) keyed to branch office A even though you also have a salesperson key where the salesperson "belongs" to branch office B. Where would you report the sale? Branch office A or Branch office B? The answer of course depends on how the branch office is keyed in the fact and what the business would like to do with it.

    Scenario 1: Business says, "Split the sales credit and report 50% of the sale in Branch Office A and 50% in Branch Office B." In this scenario, if Branch Office were an attribute of customer and not its own dimension, you should be fine. You would be equally fine having two separate dimensions.

    Scenario 2: Business says, "100% of the sales credit should be reported in Branch Office A because that is where the customer is." In this scenario, Branch Office must be its own dimension.

    Additional Complexity: What happens if the business chooses "Scenario 1" and then later changes its mind?

    Repeat after me: "The business does not know what it wants." Users WILL see the information produced in the cube and realize that they now have new questions that they did not think about before. It is your job as an dimensional modeler to anticipate future needs/requests.

    If you think that I am adding undue complexity to the matter and that your client/employeer will never do it this way, I would reply: 'Repeat after me: "The business does not know what it wants."'

    Again, just my two cents.

    Chris Umbaugh

    Chris Umbaugh
    Data Warehouse / Business Intelligence Consultant
    twitter @ToledoSQL

  • For a REALLY good primer on data warehouse design for absolute beginners, check out David Stein's Data Warehousing week (Spring 2011) at SQL University[/url] . He does a great job covering everything you need.

    =============================================================
    /* Backups are worthless, Restores are priceless */

    Get your learn on at SQL University!
    Follow me on Twitter | Connect on LinkedIn
    My blog: http://sqlchicken.com
    My book: Pro Server 2008 Policy-Based Management

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply