Well, the PASS Summit has come and gone and it was *awesome*. What a difference from my first Summit experience, when I went to every session I could and then went back to the hotel. This time around there was something going on every evening and I made the most of it. I think the fact that I slept for 14 hours when I got home on Saturday speaks volumes.
Besides sessions and networking, I also managed to pass 2 of my certification exams last week. That leaves the much-dreaded 70-463 exam. Dun-Dun-Duuunnnn! So it’s time to start preparing for that exam, and as I go along I’ll post here so you can study along with me. Ready?
Exam 70-463 focuses on implementing a data warehouse in SQL Server. Obviously this is a pretty big topic that goes way beyond even what the exam covers, but let’s set the foundation with some fundamentals. Most of us are very familiar with OLTP database design. It’s highly normalized and great for line of business (LOB) applications that are primarily using lots of small transactions of very short duration. The problem with a normalized design is that it’s not so great for analytical reporting. First of all, normalization means joining many tables which leads to poorer query performance and also makes for much more complex queries. The other problem is that all of your LOB data reside in different databases, so comparing or aggregating data between different systems is difficult. To get around this, we build a data warehouse where all of our disparate OLTP data can be merged for analysis and historical reporting. What makes the data warehouse so great for this type of reporting is its fundamental design, and that’s what we’re going to cover in this post.
There are two main types of tables in the data warehouse environment: fact tables and dimension tables.
- Fact – Fact tables store quantitative information for analysis or reporting. An example of this would be a sales table like the FactInternetSales table in the AdventureWorksDW2012 database. Every row in the FactInternetSales table records a sale that happened. Customer A purchased Product B on date C, and so on. However, your typical sales fact table wouldn’t contain the actual customer information. Take a look at FactInternetSales and you won’t see customer data. Rather it stores a foreign key reference to the customer data. And that’s where dimension tables come in.
- Dimension – A dimension table contains descriptive attributes that can be used to filter, group, and label facts in the fact table. They provide context. For example, the FactInternetSales table mentioned earlier would likely be joined to a Customers dimension table, as well as a Products dimension. The more dimensions a fact table is associated with, the more ways you can slice the data. This is referred to as dimensionality or granularity.
It’s the logical design or schema of the data warehouse, or how those fact and dimension tables are laid out and connected to each other, that makes the data warehouse so great for reporting. And your data warehouse will typically use one of two schemas: the Star schema and the Snowflake schema.
- Star schema – in a Star schema, a central fact table is surrounded by a single layer of dimension tables. One Star schema will cover one business area in the DW, but you can (and probably will) have multiple Star schemas in your DW. These schemas should be connected by sharing dimension tables (shared dimensions). If the various Star schemas in your DW don’t share dimension tables, you lose the connection between fact tables, making it difficult, if not impossible, to compare data between different fact tables.
- Snowflake schema – a Snowflake schema will start out as a Star schema, but then some of the dimensions are normalized, creating the snowflake-like pattern. An advantage of the Snowflake schema is that is can make hierarchies more apparent. For example, a Product dimension might be normalized to join to a ProductSubCategory dimension, and that might in turn be joined to a ProductCategory dimension. In general, Snowflake schemas are discouraged for performance (more joins) and maintainability reasons.
Today we covered the two primary types of table found in the data warehouse, the fact table that holds your quantitative data and the dimension table that holds your qualitative, contextual data. We also reviewed the two most common schema designs, the Star schema and the Snowflake schema. Next time we’ll talk more about dimension tables, their columns, and how to handle updates.