Dimensional Modeling Case Study Part 2 - Days Dimension

  • @lisa,

    My whole point is that the use of a function for such temporal range labeling is actually easier for the users to use because they don't need to us a join to a temporal range table to use it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @jeff,

    Wait ... what?  I apologize, in that case, because all this time I thought you were arguing on the side of perf and resource usage, and all I was saying was that they weren't top priority in self-service BI scenarios.  As long as perf and resource use weren't outrageous; ease of use was most important to the success of the project.

    If you are actually saying that your recommendations promote ease of use... can you elaborate on your experiences, what types of users, what types of reporting environments, this is true for?

    My own experience is somewhat broad and deep in this area, and it suggests otherwise. But I'm eager to learn from you as always.

    I will remind you that I posited a  suite of possible metrics, which would not be simple or quick to gather, to get some harder data around relative ease of use and user preferences.  Absent actually running such a study, we have to resort to anecdotal evidence. I have lots more I can share, but am looking forward to yours.

    >L<

  • Just want to clarify the point on ease of use. Based on Kimball’s statements regarding the goals of dimensional modeling, ease of use refers to the data model’s usage from the report writer’s perspective, not necessarily the end user or self-service perspective. Let’s say we need to present a data layer to web or data visualization developers to visualize data for end users. Is it easier for report developers to consume a dimensional star schema or a presentation layer made up of stored procedures and functions?

    In my view, it’s all about balancing between the effort on ETL and the front end. Dimensional modeling will shift the balance heavily towards ETL. In my current challenge of a stored procedure-based front end, there is a lot of heavy lifting at run-time due to a lack of proper modeling, which also discourages data engineers from developing set-based queries.

  • Hi Lisa,

    The journey to my Age group dimension in the article has been quite a process. I used to rely on a hard-coded view similar to your script. My early model evolved from segmenting different groupings by group key to pivoting them into columns. Structures from various stages are still in production in previous companies.

    Only in the last couple of years have I realized that all the early versions had significant limitations, necessitating the addition of dimension keys in the fact table or involving an untidy range join in the query. With over 10 years of enriching the date dimension with various temporal features, I've come to understand that we can simply add more columns to the date dimension, enhancing its power. Why can't we apply the same principle to other value banding dimensions as long as we start from the lowest grain? This aligns precisely with Kimball's teachings on enriching dimensions by extending them horizontally, not vertically.

    Even for experienced DW practitioners, breaking out of the box and considering why construct a table with 130 rows when a table with only 5 rows could achieve the same task can be challenging. Well, a 130-row table offers unlimited extendibility without any change to the billions-row fact table. The performance overhead of a 130-row, or even a 13,000-row dimension in a join, compared to a 5-row table, is just a few milliseconds. However, compared to querying a fact table with billions of rows in a few seconds, joining 13,000-row tables takes almost the same time as a 5-row table.

    Taking a small step of breakthrough over the age group dimension is a significant leap in understanding the essence of dimensional modeling. I believe the code used to load dimensions is crucial for ETL or data engineering, and the critical aspect of the case study lies in the thought process of determining the right grain of dimension.

  • Hi Hang Liu,

    Everything you're saying makes sense to me.  Thank you.  I am 100% sure I would "graduate" from the FKs in the fact table to additional dimensional work if I was working at higher scale, in the example that I gave.

    >L<

     

  • Oops, Hang Liu, let me clarify something in turn:

    ease of use refers to the data model’s usage from the report writer’s perspective, not necessarily the end user or self-service perspective. Let’s say we need to present a data layer to web or data visualization developers to visualize data for end users. Is it easier for report developers to consume a dimensional star schema or a presentation layer made up of stored procedures and functions?

    If I have muddied the waters by using self-service as a term differently than you are using it here, I'm sorry about that.  From my POV, the report writer *is* a self service user, and when I say the end user/self-service user definitely prefers to consume a dimensional star schema, I mean that my end users are accustomed to addressing their needs by ad-hoc exploration f the data on their own.

    Whether they are using Report Builder or Power BI or Excel to do it, these are the people I am most concerned with, and I consider them to be "report developers", even though you might not, because they don't have formal training.  Nevertheless they are highly intelligent, and have an almost endless supply of questions to ask, as I said earlier <s>.

    Depending on the department or business domain, the questions might result in internal forecasts, government compliance reporting, or marketing/public relations statistics.  Usually a couple of individuals in the dept get comfortable with the schema and work on their own.  If they have a need to "promote" their results to a published verson on some schedule, or a more formal presentation,  then my self-service framework gets involved and provides the necessary exposure opportunities, with appropriate security.

    Additionally the framework provides a super-version (a "wizard") that allows questions to be asked dynamically without any interaction with the model at all, and with a very generic type of output.  At this level, the underlying code *does* use stored procedures and functions, not a dimensional model, because quite frankly it's easier for me to generate the necessary (dynamic) queries that way. The "wizard" UI provides for storage, naming, and retrieval of the user's selections, not only for the user's later convenience but also, again, for the eventual examination of these queries for promotions to more generally-available reports, where usage statistics indicate that they would be useful to the larger community.

    We agree on the main point here, and I think we're just working in different environments with different user personae.

    Again, excuse me for using a familiar term ("self-service") in an unfamiliar way.  There is a history behind my selection of this term for the framework, which is probably completely irrelevant to you, but I'll share if you are interested !

    Thanks again for all your interesting and clearly-expressed thoughts.

    >L<

     

  • Sorry Lisa, for my loose reference to ‘end user’ and self-service. It’s true that an end user of a data product can be a report writer or report reader, depending how we look at the consuming process. I am working on a data product used by hospital doctors and directors, providing self-service requires data visualisation team to leverage BI tools to provide drag-drop interface for end user. A few years ago, on top of  dimensional modeling, I also developed SSAS cube, tabular model using MDX and DAX and the dimensional model user is myself.

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply