I used a newer version (v3.4) of the model tool that was released with the book called: The Microsoft Data Warehouse Toolkit (MDWT) that was released in 2006. Recently, they updated the book and associated tools to support SQL Server 2008 R2. The MDWT model tool is based on Microsoft Excel and is specifically designed to support Kimball dimensional data warehouse modeling on the SQL Server platform.
I have used a few different data modeling tools over the years and the one I have the most experience with is Embarcadero's ER/Studio which has modes that support both classic entity relationship modeling and dimensional modeling (although it does not line up perfectly with Kimball's dimensional modeling methodology). There were no data modeling tools available in the organization outside of what comes with Visual Studio Database Edition (which is not a modeling tool in my opinion) so I decided to proceed with the MDWT model tool when the Data Warehouse project was approved.
Because the business used SAP BW and Business Objects in addition to SQL Server 2008, I had to create custom solutions around the metadata and front office integration so the Kimball MDWT model tool would still be effective. The metadata generation was done in the model tool as intended by the Kimball authors, but the metadata report portion had to support the Business Objects and SSRS environments.
I added code to the build that created a special report database which was used for creating Business Objects universes and also the complete data warehouse data model diagram.
I also made some minor modifications to the model tool code (it uses Excel VBA) in order to support putting the indexes and fact tables on their own filegroup. This change was not mandatory as it is possible to move the objects around after database creation, but it is easier to create them in the desired filegroup in the generated DDL file. My main point here is that the code inside the MDWT model tool is accessible if you need to perform modifications to suit your needs.
I used partial SQL code Snippets edited directly in the Excel cell of the model tool to overcome a couple of issues.
Creating calculated columns is not directly supported by the model tool. The work a round I came up with is to put the calculated column definition in the datatype column and end the statement with a single line comment (--) so the following NULL statement auto generated in the DDL will be ignored during processing.
Calculated column example using the MDWT model tool "datatype" column:
as datediff(dd, CalendarDate, getdate()) --
The model tool does not support empty string ('') as a column default value because Excel treats the first single quote character in a cell as defining the cell to be a TEXT value. The work a round for this problem was to add the DEFAULT statement to the model tools "size" column.
Empty string column default example for a varchar(255) attribute using the MDWT model tool "size" column:
255) DEFAULT (''
Using the tool
Using the MDWT model tool is simple if you have experience with designing Kimball dimensional models on SQL Server. As the core design process in data warehousing, the most difficult part of the modeling process is getting adequate requirements and understanding of the business process, data sources and data processing to enable a decision on how the dimensional model should be comprised. To capture these requirements, I typically work with the analysts to devise an excel based draft model file called a model map ("model map file" is a term I use and not a standard Kimball method name). The model map has the detailed source to target information and also the dimensional target table information such as column name, description, SCD type and so on. The model map file main worksheet has the table information in a form that is very similar to the MDWT model tool so it can be copied to the Model tool and then easily edited to fit the model needs.
The main advantage of the MDWT model tool is to allow for quick model generation that includes the critical metadata into a build so the database can be validated by the analysts, ETL, Report and DBA teams. I was able to perform multiple builds each day if needed and would typically produce a version release to the team every 1-7 days.
Typical MDWT model tool workflow
- Perform data profiling, requirements analysis, source system SME /customer interviews etc.
- Attend model map meetings, finalize tables for current release and refine tables for next release
- Integrate completed model map tables into the MDWT model tool
- Integrate changes from bug fixes and change requests (e.g. data type, spelling, new columns etc.)
- Perform a model build to validate the updated model (PKs, FKs, metadata format etc.)
- Perform an integration build to validate the model against the ETL and FrontOffice systems.
- Fix any issues until a clean integration test is returned.
- Generate release artifacts (version description, release notes, model diagrams etc.)
- Perform a formal release of the updated model.
- Until the final release is completed, Go to step 1
I have found the key to making any model tool workflow efficient, is to have an automated build and integrated test system. This allows you to get the basic changes into the model and then kick off a test run that validates the database creation, schema, special data loads (historical data, static tables etc.) and performs as much of the initial data load and ETL processing as is practical for the project conditions. I have been running builds and integration tests using SQLCMD since 2007 in conjunction with other model tools and have found the system worked well with the MDWT model tool.
User view creation
The system design called for end users to have access to an end user schema called Rpt and access the data schema (dbo) through views. There were two types of views, core views which are simple views over the base data warehouse tables, and dimensional role views which are views over the core views with names changed to accommodate the role as appropriate.
For example, the date dimension table was dbo.DimDate that had a column named CalendarDate.
The end users accessed this table through the core view Rpt.DimDate that had a column named calendarDate.
The FactSales table had several foreign keys to date dimension table which were accessed by end users through dimensional role views as shown below:
Rpt.DimOrderDate that had a column named OrderCalendarDate.
Rpt.DimInvoiceDate that had a column named InvoiceCalendarDate.
Although dimensional roles are core to Kimball dimensional modeling, I have not found a tool that handles generating these role views gracefully and I have used various methods to create them over the years. For this project I wrote a script in the build that automatically generated them based on the FK names which only is an option if you have a consistent naming convention. Since The MDWT model tool generates consistent foreign key names this worked well.
One of the toughest challenges to creating a really good data warehouse is having comprehensive metadata that allows users, administrators and maintainers to find useful information in the data warehouse and understand the different aspects of the information.
The MDWT model tool is designed to output metadata that integrates with a set of Kimball developed Reporting Services reports. The reports provide a way to navigate through the metadata generated in the model tool. The MDWT metadata is extensible by simply adding additional cells to the appropriate area in the model tool you can add your own metadata information. For this particular project I had multiple data sources for several of the tables and was able to alter the model tool to record and output metadata source to target information for three data sources.
The MDWT model tool uses SQL Server extended properties to propagate metadata into the data warehouse. This works fine as long as users have direct access to the same schema as the extended properties were created in which was not the case for this project. I tried to use the various SQL Server features to allow users with different privileges to access the extended properties, only to find that you must be owner of the schema the extended properties were created in to access them from a different schema. As a work around, I simply created a metadata table and loaded in the metadata information from the extended properties during the build which solved the problem.
Supporting Business Objects
Putting the complete dimensional data warehouse model in a Business Objects universe with all of the tables, including the dimensional roles with foreign keys to the fact tables is a labor intensive manual process. Since there is no easy way to script the process, we decided to automate as much as possible.
Relationships between tables in the universe can be auto-mapped based on names, but this does not work as it creates extra invalid relationships between fact tables. The manual effort to clean these extra relationships up negated much of the value of the solution.
The way we overcame this issue is to create a special report build database that converted the dimensional roles to tables and had all the dimensional role foreign keys so it was easy to load the correct relationships into the universe. This required extra effort because the model tool only outputs the physical schema in the form of SQL DDL code. Since SQL Server does not allow foreign keys to views, there is no direct way to create a link between a dimensional role view and the fact table key. To overcome this issue, I used the schema views in the core database created by the build to generate the special report database. A second stage of the build would kick off after the initial database build completed and would generate the required tables and foreign keys in the special report database. This option was only made possible by consistently following a standard naming convention for the database objects.
This allows the Business Object developer to quickly build universes by connecting to the special report build database and using the foreign keys to create the object links in the universe. A special metadata extract file was created during the build that allowed quick uploading of table and column definitions to the Business Objects universe.
The model tool does not support drawing a complete data model diagram for the entire warehouse, but does create simple star schema diagram for each fact table.
Two of the fact tables had a large number of dimensions which caused the model tool to skew the dimension lines off center on this particular project. I am sure this could have been fixed by modifying the model tools Excel VBA code, but that was not needed in this case because each model version release included a complete data warehouse model diagram in Visio which makes it possible to create a a diagram for any combination of tables with little effort.
To create the main model diagram, I used Visio to reverse engineer the special database that was created for providing Business Objects with all tables mapped by the appropriate foreign keys which solved the issue (The marts database would not have worked because the role dimensions were implemented as views which means the foreign keys were mapped to the core dimensions).
One of the common issues I have had with modeling tools is the difficulty in managing database features across different environments (e.g. filegroup mappings to the various LUNs / Disk drives).
Another common difficulty when using a model tool is trying to implement partitioning, filtered indexes and the other newer database features. These types of features seem to be difficult for designers to integrate into IDE or Modeling tools which explains why data modelers always add custom process steps to accommodate them.
The MDWT system did as well as other model tools I have used in this regard.
The Kimball MDWT model tool performed better than I expected, especially considering the changes required to accommodate the Business Objects system. Since the Kimball MDWT model tool only generates a DDL file, the upstream and downstream model processing system design decisions are easy because there are no exotic tool features to consider. This tool simplicity means the modeler needs to know the database system well in order to be productive. I personally think this is an important attribute for a modeler to have regardless of the tools being used anyway. Since SQL Server has a native language and supports configurable variables that easily perform many the advanced implementation steps, the lack of advanced modeling tool features would only matter in edge cases where there is a standard process that involves a specific tool or special artifacts that must be created.
I believe the Kimball MDWT model tool is adequate for most SQL Server data warehouse projects, especially where there is not a data model tool standard already in place. I like the Kimball MDWT model tool and would definitely consider using it for a future SQL Server data warehouse implementation.