I sometimes get asked from customers if they should use a star schema or should they use one large single table/flatfile for their dataset in Power BI. The answer is almost always star schema, and Patrick LeBlanc does a great job explaining what a star schema is and why star schemas are so important for Power BI in his video Why Power BI loves a Star Schema. I wanted to list out his four reasons:
- Usability: If you have just one large table with all the data, when creating a report it is difficult to find the fields you are looking for in the “Fields” list. With a star schema, the Fields list is much easier to navigate. Star schemas make your data model much cleaner and easier to use
- Simpler DAX: The code for DAX will be much simpler against a star schema compared to a wide flat table. It will be shorter, easier to write, easier to read, and easier to maintain
- Performance: Star schema’s can handle huge amounts of data without impacting performance. The test Patrick ran was against 7 million rows and a query took 29ms against the one large table compared to 7ms against the equivalent star schema. You can imagine the performance difference against 10’s of millions of rows
- Faster refresh: In Power BI Desktop, Patrick used Power Query against 7.8 million records from a SQL Server Database and choose Close and Apply. For the large table, this took 34 minutes (2069 rows/second). The star schema took only 4 minutes (28k rows/second)
Patrick also has a video showing how to turn a flat file into a star schema: Power BI Tutorial | From Flat File To Data Model.
And a report using a flattened table could return inaccurate numbers, as pointed out by sqlbi: The importance of star schemas in Power BI.
For more information on star schema, check out the “bible” by Ralph Kimball: The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling.