Alternatiives to Junk Dimensions

  • Hello SSC,

    I am not sure if I posted this in the right spot, but I have a data warehouse question...

    I have a star schema with a "junk" dimension. I have never encountered junk dimensions before and from what I understand, they are simply dimensions that have no correlation to the fact table, but they are used for reporting. I have been tasked to come up with an alternative to this "junk dimension."

    My first instinct is to change this to a "factless fact table" and only have the surrogate keys of the values and fetch the data for reporting joining to the dimension tables. My second instinct would be to just add it to the fact table, but that could widen the table. I am considering going with my first instinct, but if SSC has any guidance on this, it would be greatly appreciated!

    Thanks,

    Dave

    The are no problems, only solutions. --John Lennon

  • It would really depend on the specific use-case. Can you provide some examples of the contents of your junk dimension?

  • Full disclosure, I am going on a face to face interview and this is a question that was asked in the phone screen. I would say that a factless fact table with just surogate keys back to the dimension  tables would work, but I don't think that is the answer they are looking for.

    I am not sure of the data, so I am trying to use logic here, but it's challenging without context.

    • This reply was modified 4 days, 6 hours ago by  Lord Slaagh.
    • This reply was modified 4 days, 6 hours ago by  Lord Slaagh.

    The are no problems, only solutions. --John Lennon

  • A factless fact table would probably not be a feasible solution, because the purpose of the junk dimension is to take degenerate dimension values with low cardinality and place it in a dimension.

    I guess you could have a junk dimension with key-value pairs that replace what would have been different individual dimensions, but adding another fact table wouldn't necessarily be better than just adding the degenerate values into an existing fact.

    To your point, I guess you can also split the fact table and end up with a factless fact...but it would be very dependent on the situation.

    I find myself in situations periodically where I have a dimension and fact table at the same grain, for example an insurance claim dimension where all the descriptive attributes are stored, and a similar fact table with all of the measures. Not necessarily great from a cardinality perspective, but I guess you can call that a junk dimension if we're going to get nitpicky about it 🙂

  • Are you sure they said "junk" and not "jump"?  There is good use for tables that only contain IDs from the Fact Tables and the Dimension tables.  In relational databases, some refer to the as "bridge" or "bridging" tables but a common name for them is "Jump" tables.

    For example... a single customer may have 5 addresses.  It would not be uncommon for someone to have no rows containing ANY information about addresses in the customer table itself.  Instead, the relationship would appear as 5 rows in a table named something like Customer_Address and would contain the same ID of the customer in 5 rows and each row would have the unique ID for one of the 5 addresses of the customer.

    The reason they're sometimes referred to as "Jump" tables is because they help you "jump" to other tables without the data being listed in either of the two "outer" tables.  Same goes for the name of "bridge" table... helps you form a "bridge" from one table to a related table.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It's "junk dimension" 🙂

    Bridge/Helper tables are also commonly used in dimensional modeling to address M:M relationships, but not the same as a "junk dimension".

  • Never mind about me thinking they were talking about "Jump" tables.  I'm not a DW kind of person.

    I asked Perplexity.AI the following...

    What is a "junk" dimension in a data warehouse? What and how is it used?

    It provided a rather interesting summary answer.  In fact, I've never heard of a "junk" dimension and it's explanation of what, how to use, and how to create them (100,000 ft level... needs extra study for details) is down-right fascinating.  It's a really good summary and the tool also provides links for where it gets the information.

    As a bit of a sidebar, I'm VERY wary of AI in general but it makes a really, really good first search for articles to read as well as some pretty nicely worded sentences.  The early days had me laughing with how stupid it could actually be.

     

     

     

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Right on 🙂

    I don't particularly like the term "junk dimension", but that's how Kimball coined it back in the day. Here's the official reference(s) for anyone stumbling across this thread:

    https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/junk-dimension/

    https://www.kimballgroup.com/2009/06/design-tip-113-creating-using-and-maintaining-junk-dimensions/

     

  • Thanks for the links, Martin.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply