James is a big data and data warehousing technology specialist at Microsoft. He is a thought leader in the use and application of Big Data technologies, including MPP solutions involving hybrid technologies of relational data, Hadoop, and private and public cloud. Previously he was an independent consultant working as a Data Warehouse/Business Intelligence architect and developer. He is a prior SQL Server MVP with over 30 years of IT experience. James is a popular blogger (JamesSerra.com) and speaker, having presented at dozens of PASS events including the PASS Business Analytics conference and the PASS Summit. He is the author of the book “Reporting with Microsoft SQL Server 2012”. He received a Bachelor of Science degree in Computer Engineering from the University of Nevada-Las Vegas.
Junk dimensions are dimensions that contain miscellaneous data such as flags and indicators. When designing a data warehouse, you might come across a source system that has a bunch of yes/no indicator fields. If those fields need to be tracked in a fact table, the result could be many small dimension tables (each with just a few rows), plus an extra key column in the fact table for each one. That widens the fact table and can cause performance issues.
Instead, use a junk dimension that combines all the unique combinations of those indicator fields into a single dimension and assigns each combination a unique key. This key is what is stored in the fact table. So you will have only one additional dimension table and will reduce the number of fields in the fact table. A key consideration when forming junk dimensions is how many combinations exist. If the number of combinations is too high, the junk dimension's size may be unmanageable, in which case you might want to have more than one junk dimension.
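To make the idea concrete, here is a minimal Python sketch of building a junk dimension and looking up its key while loading a fact row. The flag names (`is_gift`, `is_rush_order`, `is_online`), the `junk_key` column, and the sample order are all hypothetical, and the sketch assumes you pre-populate every possible combination rather than only the ones that actually occur in the source data.

```python
from itertools import product

# Hypothetical yes/no indicator fields from a source system.
INDICATORS = ["is_gift", "is_rush_order", "is_online"]


def build_junk_dimension(indicators):
    """Return one row per combination of flag values, each with a surrogate key."""
    rows = []
    combos = product([False, True], repeat=len(indicators))
    for key, combo in enumerate(combos, start=1):
        row = {"junk_key": key}
        row.update(zip(indicators, combo))
        rows.append(row)
    return rows


def build_lookup(junk_rows, indicators):
    """Map each tuple of flag values to its surrogate key."""
    return {
        tuple(row[name] for name in indicators): row["junk_key"]
        for row in junk_rows
    }


junk_dim = build_junk_dimension(INDICATORS)
lookup = build_lookup(junk_dim, INDICATORS)

# A hypothetical source row carrying the three flags.
source_row = {
    "order_id": 1001,
    "amount": 59.90,
    "is_gift": True,
    "is_rush_order": False,
    "is_online": True,
}

# The fact row keeps only the measures and a single junk key,
# not one column (or one foreign key) per flag.
fact_row = {
    "order_id": source_row["order_id"],
    "amount": source_row["amount"],
    "junk_key": lookup[tuple(source_row[name] for name in INDICATORS)],
}

print(len(junk_dim))         # 8 rows: 2 ** 3 combinations
print(fact_row["junk_key"])  # 6
```

With n yes/no flags, a fully populated junk dimension has 2 ** n rows: 3 flags give 8 rows, while 20 flags give 1,048,576. That growth is why a large set of indicators may be better split across more than one junk dimension, or loaded with only the combinations that actually appear in the source.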