One of the biggest announcements at Microsoft Ignite that seemed to be overlooked by a lot of people was Azure Synapse Analytics database templates, now in public preview. I wanted to dive into it a bit in this blog because I feel this is an exciting new feature that will be used by a lot of companies.
Basically, database templates are a set of industry-specific data models that are integrated into Synapse Studio at no additional cost. They are actually common data models (see my blog Common Data Model), and an earlier version of this feature was called Synapse CDM. They were also part of a preview product called Industry Data Workbench that was merged into Synapse. The idea is that instead of creating a data model from scratch, which can take weeks if not months, you can use a pre-built data model (if you are in an industry that is currently supported). In addition to the time savings, you get a model that is well thought-out and tested, so you won’t have to worry that it is deficient the way you might with a model you created from scratch. This greatly helps solve the challenge of bringing data from various similar sources into a standardized format so it can be analyzed more easily.
Within Synapse there is a new database designer that lets you create and modify a database model using a database template. You also have the option to create a new database model from scratch or add tables from an existing data lake.
The model is stored in a lake database in Azure Synapse Analytics. The lake database brings together the database design, meta information about the data that is stored, and a way to describe how and where the data should be stored. Lake databases use a data lake on an Azure Storage account to store the data of the database. The data can be stored in Parquet or CSV format, and different settings can be used to optimize the storage.
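To make the idea of a lake database table a bit more concrete, here is a minimal sketch in plain Python of the three things it brings together: the design (columns), the meta information (descriptions), and the storage settings (format and location). This is only an illustration of the concept, not how Synapse represents tables internally. The class names, the field names, and the storage path are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    """One column in the table design, with a human-readable description."""
    name: str
    data_type: str
    description: str = ""


@dataclass
class LakeTable:
    """Hypothetical stand-in for a lake database table definition."""
    name: str
    columns: List[Column]
    storage_format: str = "parquet"  # the two formats mentioned above
    location: str = ""               # folder in the data lake (placeholder)

    def __post_init__(self):
        # Only the two formats named in this post are accepted in this sketch.
        if self.storage_format not in ("parquet", "csv"):
            raise ValueError(f"unsupported format: {self.storage_format}")


# Illustrative table; the names and path are made up for this example.
customer = LakeTable(
    name="Customer",
    columns=[
        Column("CustomerId", "string", "Unique identifier of the customer"),
        Column("CustomerName", "string", "Display name of the customer"),
    ],
    storage_format="parquet",
    location="abfss://<container>@<account>.dfs.core.windows.net/retail/Customer",
)
```

The point is simply that the table definition carries its own description and storage details along with the column list, which is what lets the designer show and change all three in one place.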
The database templates started with six industries, and Microsoft has already added five more (see New Azure Synapse database templates in public preview).
I expect many more database templates to be added in the near future, as Microsoft already has 75 industry vertical schemas that it acquired when it purchased ADRM Software (press release).
To create a data model, go to the Data tab in Synapse and click “+”. Then choose “Lake database (preview)”. You have now created a lake database and can proceed to add data models to it. You do this by selecting the “Table” drop-down menu and choosing “Custom” to create a brand new model, or “From template” to create a data model using one of the industry templates. You can then select a table from the designer pane to modify it.
To map fields from the source data to the Synapse lake database tables, use the Map Data tool in Synapse as described here (it uses the mapping data flow in ADF). Hopefully in the future Microsoft will add default mapping templates for popular sources such as Salesforce and SAP. I also hope to see the ability to create the data models in a Synapse dedicated pool, along with the ADF code to transfer the data from the lake database to the dedicated pool. I can see the need to query the data in a relational database instead of a data lake (see Data Lakehouse & Synapse for the reasons querying from a relational database may be better than a data lake).
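Conceptually, a field mapping is just a rename from source field names to the column names of the target table. The sketch below shows that idea in plain Python. It is not the Map Data tool or an ADF data flow, and the source fields, target columns, and sample rows are all made up for illustration.

```python
# Hypothetical mapping from source field names to target table columns.
CUSTOMER_MAPPING = {
    "AccountId": "CustomerId",
    "Name": "CustomerName",
    "BillingCountry": "CountryName",
}


def map_record(source_row, mapping):
    """Rename mapped source fields to target columns and drop unmapped ones.

    A source field missing from the row becomes None in the target.
    """
    return {target: source_row.get(source) for source, target in mapping.items()}


# Illustrative source rows, not real data.
source_rows = [
    {"AccountId": "001", "Name": "Contoso", "BillingCountry": "US", "Fax": "n/a"},
    {"AccountId": "002", "Name": "Fabrikam"},
]

mapped_rows = [map_record(row, CUSTOMER_MAPPING) for row in source_rows]
```

A default mapping template for a popular source would essentially ship a ready-made version of a table like CUSTOMER_MAPPING, so you would only need to adjust the fields that differ in your environment.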
Using database templates means we will know the shape of the data, which provides another benefit: we can use pre-built ML and AI models on that data. Microsoft has already provided one in the gallery under “Database templates – AI Solutions” called “Retail – Product recommendations”, which creates a Jupyter notebook with Python code for you, and I expect to see many more pre-built ML models in the near future.
For more info about Azure Synapse Analytics database templates check out the documentation.