The Common Data Model (CDM) is a shared data model that is a place to keep all common data to be shared between applications and data sources. Another way to think of it is is a way to organize data from many sources that are in different formats into a standard structure.
The Common Data Model includes over 340 standardized, extensible data schemas that Microsoft and its partners have published. This collection of predefined schemas includes entities, attributes, semantic metadata, and relationships. The schemas represent commonly used concepts and activities, such as Account and Campaign, to simplify the creation, aggregation, and analysis of data. Having the metadata stored along with the data is very helpful when using a data lake which normally does not have metadata due to the schema-on-read nature. This graphic shows some elements of the standard entities (for more information see Common Data Model repository on GitHub):
Use the Visual Entity Navigator for interactively exploring entities, entity extensions/inheritance, attributes, and relationships.
Industry Accelerators are foundational components within the Microsoft Power platform and Dynamics 365 that enable ISVs and other solution providers to quickly build industry vertical solutions. The accelerators extend the Common Data Model to include new entities to support a data schema for concepts within specific industries. Think of the time savings if you are in one of these industries and need to create a data model. Instead of spending weeks or months creating one from scratch, you have one already built for you. Microsoft is focused on delivering accelerators for these industries and others:
- Education including Higher Ed and K-12
- Automotive (Private Preview)
- Financial Services (Private Preview)
With Power BI Dataflows, the common data model stores the data into Azure Data Lake Storage (ADLS) Gen2, either internal storage provided by Power BI or stored in your organization’s ADLS Gen2 account (see Dataflows and Azure Data Lake integration (Preview)). Dataflow can map the output of a query to an entity in the common data model. This feature is handled with the “Map to Standard” option in the Dataflow Power Query Editor (see CDM and Azure Data Services Integration). Note that healthcare is currently the only industry accelerator supported for Power BI Dataflows.
The following diagram shows a sample CDM folder in ADLS Gen2, created by a Power BI Dataflow, that contains three entities:
Still to be determined is how exactly CDM fits into a modern data warehouse solution as there are still a number of products that do not have integration points into it yet (i.e. Azure Data Factory is not yet able to write directly to CDM). However, there are manual steps you can do to integrate Azure Data Factory, Databricks, Azure Machine Learning, and SQL Data Warehouse with CDM until the products gain built-in integration (see CDM and Azure Data Services Integration).
If we think about just Power BI Dataflows, the current state is limiting as Power BI requires its own filesystem inside of ADLS Gen2, which means that sadly we can’t integrate any of it with the rest of our data lake. In turn that means it’s difficult to frame the enterprise data lake plus Power BI Dataflows, which may be CDM entities or custom entities, plus CDM from other products like Dynamics.
Also, keep in mind that a modern data warehouse also has a relational database in the solution that is fed from the data lake, so the CDM in the data lake would need to be replicated in the relational database. However, using Power BI Dataflows a business user can do the work of copying and transforming data from multiple source systems into the CDM in the data lake, instead of waiting for IT to do it (who may have a backlog and who may not understand the data).
One more thing to point out: the Open Data Initiative (ODI) is a collaboration between Microsoft, Adobe, and SAP to use the same common data models in their products.