Approach to dimension requirements and design

  • We're building up a new data warehouse, driven by the requirements our BA has collected from our customers. I've built up a prototype star schema, and I want to make sure the dimensions are as rich and useful as possible. What are some approaches for having useful discussions with BA's and customers regarding dimension attributes and possibilities for adding value? I'm thinking of things such as:

    • Adding attributes that may be useful

    • Determining which attributes need history tracked (SCD Type)

    • Date dimension attribute formats

    • Cleanup rules for "messy" source data

    • Replacement mappings or other data quality improvements

    The customers and BA are not experienced with data warehouses, so they won't ask for any of these things unless we offer them. Do you typically put an initial data model out there and wait for enhancement requests to come in through usage? Or do you take a more proactive approach to designing rich dimensions at the start? If so, what are some tips and tricks for leading discussions at that level of detail?

  • This is just my two cents.

    I would start off by asking the end users what questions they would like to ask of the data. This will give you a good idea of how to answer those questions with the model. It's really that simple.

    The issue you may run into is trying to help them realize those questions. I typically coach them with some example questions I would ask of the data. Things like, "What is my total revenue by city and state?"

    That question means I need to at least have a fact table that has revenue data by two attributes: city and state.

    With that, put away all the semantics of what you need to make that happen. Don't get hung up on the cleaning rules, the refresh rates, visualization and whatnot. Just get to the root of what answers you need to provide with the data. Get that on paper somewhere first, then go out and see what you can and can't answer. If you at least do that, regardless if you can or can't answer everything, you at least have an idea of what you need to do to make a gameplan.

    Without it, you're shooting blindly in the night. You'll end up with a data warehouse that has little to no value.

  • Hi,

    I would try to define first the most important thing, that is, the granularity and type of fact tables. Then the same for the dimensions, the grain and basic attributes. Then you will need to check that the business key you identify are strong enough and test the quality of the source data. Sometimes you need to go back and forward and redefine some tables.
    Of course as xsevensinzx said, the business questions are the boss end should drive the design. Usually is the case that business people don't know exactly what they want and then ask for everything, creating a hugh overhead in the project.
    I will try to read some literature about your specific domain, i.e., supply chain management, sales, finance, retail, healthcare, etc. Then I will organize some sessions with business users to know how they work and how the data is collected in the source systems (get their point of view).
    I would try to make a kind of agile development and try to fast prototype some solutions, including reports or tabular cubes to show business why they can obtain from the data.
    I would even try to create some query/reports/tabular cubes over the source data in the pre staging or staging area to get a flavour before deciding complicated aspects like SCD types or clean-up mechanism. 

    Best regards,
    Paul

    Paul Hernández

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply