Questions about Fact Table structure

  • @ronkyle...

    Apologies for the off-topic question but...

    Since you're a Guru that's been working with DWs and actually teach it, I'm just curious as someone who doesn't work with DW's and am, in no way, aiming at anyone to have a fight about it one way or the other.  I've seen arguments from all three sides but most don't have the DW experience you have.

    Here's the question burning in my mind...

    What does your "DateDim" table look like (if you have one)?  More specifically, I'm interested in the PK for the table... Is it an INT ID that has no relation to a date, is it an INT representation of a date in YYYYMMDD format, or is it a temporal data type such as DATE?  I'd also be interested in any reasoning for your choice.

    And, yeah... if you'd rather, I can move this question to a private PM.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @jeff Moden.  It's all  but impossible to imagine a data warehouse without a date dimension.  Not impossible though.

    As you point out, the PK for a date dimension is not uncontroversial.  Kimball recommended an auto integer field in his early work.  In his later works (maybe even his second edition book but certainly in his third edition) he is okay with an integer that is in a YYYYMMDD format.  Now that there is a date (only) data type that is only 3 bytes many opt for that.

    That said, I personally use an auto incrementing small integer.  I try to make Day 1 fall in a year on which Jan 1 is a Sunday.  I shave every byte I can from the row size.  Especially in my accumulating snapshot table where there are many dates (with the date dimension serving as a role playing dimension) every bit helps.  This has the downside that unless I start in the negative range that there are only some 90 years of dates available.  So for some warehouses this might not be an option.  But if on the off chance someone is still using these decades after I am gone, they just have to change the data type to an integer if they don't want to purge the old data.  I'm assuming there's every chance that things would be so different by then that the work done now won't be relevant.  But if someone needs a historical load going well into the past they would want to use an integer.

    A small int or int value helps with determining the time between dates.  It is simply a subtraction issue that can easily be done in MDX.  Math on YYYYMMDD does not work when passing boundaries.  As for using a date, fact tables should be human-meaningless integers that point to dimensions, metrics, and maybe a few text fields that are needed to support a drill through action (eg purchase order number) and facilitate the identification of the row.  Dates are out of place.  It's not hard to make the join.

    Again, this is not uncontroversial.  But this solution works very well for me.

  • Well done and very much appreciated, Ron.  Thank you for taking the time to "splain the gazintas" for me.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 3 posts - 16 through 17 (of 17 total)

You must be logged in to reply to this topic. Login to reply