The HUGE announcement at Microsoft Build yesterday was Microsoft Fabric (see Introducing Microsoft Fabric: Data analytics for the era of AI), now available in public preview. I have been at Microsoft for nearly nine years, and this is easily the biggest data-related announcement since I have been here. Satya Nadella, Microsoft’s CEO, even said Microsoft Fabric is “The biggest launch of a Microsoft data product since the launch of SQL Server”. I will introduce Microsoft Fabric in this blog post, and then follow up with other blog posts that go into more detail on specific features.
The best way to understand Microsoft Fabric is to think of it as an enhancement to Power BI that adds SaaS versions of many Microsoft analytical products to the Power BI workspace, now called a Fabric workspace. Those products include Azure Synapse Analytics, Azure Data Factory, and Azure Data Explorer, alongside Power BI itself. Do not think of it as a new version of Azure Synapse Analytics. SaaS versions of all these products are now available to all Power BI users via the Fabric workspace, making it much easier for business end-users to get insights into their data without having to wait for IT to build a solution for them. Synapse workspaces (old) are how you manage your services; Fabric workspaces (new) are how you manage your content.
But Fabric is not just for departmental use. IT will also use it to build enterprise solutions, providing one place for everyone to build solutions. This means you won’t have to decide between using Synapse or Fabric. Fabric is going to run your entire data estate: departmental projects as well as the largest data warehouse, data lakehouse and data science projects.
When using Microsoft Fabric, you won’t even realize you are using Azure. There are no more subscriptions to set up, storage to create, or up-front time spent filling out various configuration properties to get a resource created. You want a data lakehouse? Simply enter the lakehouse name and in a few seconds you will have it; no other info needs to be specified. There are minimal knobs, and Fabric is auto-optimized and auto-integrated, with centralized administration for everything.
Power BI capacities become Fabric capacities, and all the compute you require is pulled from the Fabric capacities as needed in a serverless fashion. That means no more serverless pools, dedicated pools, DWUs, or Spark clusters. Everything has become simplified.
All data that is stored within Fabric is in Delta Lake format. Since Delta Lake is open source, anything you create in Fabric can be used outside of Fabric by any product that can read from a Delta Lake (which is nearly all products). For example, you can use Databricks to access data created in Fabric. Use whatever compute is easiest and/or cheapest. This deep commitment to a common open data format means that customers need to load the data into the lake only once, and all the workloads can operate on the same data without having to ingest it separately.
Even data for a warehouse is stored in delta format. There is no more relational storage. Fabric has fully embraced the data lakehouse concept.
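To make the "any product can read it" point a bit more concrete, here is a minimal sketch of why the format is so portable. A Delta table is just a folder of Parquet data files plus a `_delta_log` folder of JSON commit files, where each line records an action such as `add` or `remove` for a data file. The function below is my own illustration (the name `active_data_files` is hypothetical, not part of Fabric or any library): it replays those commits using only the Python standard library to work out which data files make up the current version of the table. It deliberately ignores checkpoint files and everything else a real Delta reader handles, so treat it as a sketch of the idea rather than a way to read production tables.

```python
import json
from pathlib import Path


def active_data_files(table_dir):
    """Return the data files in the current version of a Delta table.

    Simplified sketch: replays the JSON commit files in _delta_log in
    order and applies their 'add' and 'remove' actions. Checkpoints,
    schema, and partition metadata are ignored.
    """
    log_dir = Path(table_dir) / "_delta_log"
    active = set()
    # Commit files are named with zero-padded version numbers, so sorting
    # by file name replays them in commit order.
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())
    for commit in commits:
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return sorted(active)
```

Any engine that can list a folder, parse JSON, and read Parquet can do the same thing, which is the reason data written by one tool can be picked up by another without a separate ingestion step.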
On the home screen of Fabric, you will be asked to choose a persona:
From then on, the Fabric workspace will be customized to the persona chosen. For example, if Synapse Data Engineering is chosen, the main screen will contain the options for creating a lakehouse, notebook, or Spark job definition.
The various items you can create in Fabric are listed below:
There is also a very impressive new feature called OneLake, which is a single SaaS lake for the whole organization. There is no need for you to create this data lake, as it is provisioned automatically with your tenant. When you create a workspace, a folder is created in OneLake storage (ADLS Gen2 behind the scenes) on your customer tenant. All workloads automatically store their data in OneLake workspace folders in delta format. Think of it as a OneDrive for data. Even better, you can create shortcuts within OneLake that point to other data locations, such as ADLS Gen2 or even AWS S3 and Google Storage (coming soon). A shortcut is nothing more than a symbolic link that points from one data location to another, just like the shortcuts you create in Windows. The data will appear in the shortcut location as if it were physically there. Your OneLake becomes a logical container that can point to many physical containers, so you can think of it as an abstraction layer or a virtualization layer. This means you can use your existing data lakes within Fabric.
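If the symbolic-link idea feels abstract, the toy model below may help. It is purely illustrative and is not how OneLake is implemented or how you create shortcuts in Fabric; the class name `ToyLake`, the `lake://` placeholder scheme, and the example bucket and storage account names are all hypothetical. It only shows the concept: callers always use one logical path, and the lake decides whether that path lives in its own storage or points at an external location.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLake:
    """Toy model of a lake whose folders can be shortcuts to other stores."""

    # Maps a logical folder prefix to the physical location it points at.
    shortcuts: dict = field(default_factory=dict)
    # Placeholder root for data stored natively in the lake (hypothetical).
    native_root: str = "lake://"

    def add_shortcut(self, logical_prefix, physical_target):
        """Register a logical folder as a pointer to an external location."""
        self.shortcuts[logical_prefix.strip("/")] = physical_target.rstrip("/")

    def resolve(self, logical_path):
        """Translate a logical path into the location that holds the data."""
        path = logical_path.strip("/")
        # Check longer prefixes first so a nested shortcut wins over its parent.
        for prefix in sorted(self.shortcuts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                return self.shortcuts[prefix] + path[len(prefix):]
        return self.native_root + path


lake = ToyLake()
lake.add_shortcut("sales/external", "s3://example-bucket/sales")
lake.add_shortcut("finance/archive", "abfss://data@exampleaccount.dfs.core.windows.net/fin")

# Same logical namespace, different physical homes.
print(lake.resolve("sales/external/2023/orders.parquet"))
print(lake.resolve("sales/internal/2023/orders.parquet"))
```

The first path resolves to the S3 location and the second stays in the lake's own storage, yet the consumer addresses both the same way. That is the sense in which OneLake acts as an abstraction layer over many physical containers.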
By adopting OneLake as the store and delta as the common format for all workloads, Microsoft offers customers a data stack that’s unified at the most fundamental level. Customers do not need to maintain different copies of data for databases, data lakes, data warehousing, business intelligence, or real-time analytics. Instead, a single copy of the data in OneLake can directly power all the workloads.
If you already have Power BI, you can try Microsoft Fabric today by having your Power BI admin turn it on via the admin portal in your Power BI tenant. There is a free trial period that lasts until Fabric is GA’d and is then extended another 60 days:
If you do not have Power BI, you can sign up for the Microsoft Fabric free trial.
I know this announcement will lead to a lot of questions, and I will be posting blogs over the next few months that will hopefully answer most of those questions. In the meantime, please post your most pressing questions in the comment section below and I will answer them directly or with a blog post.
And one more thing: coming soon is Copilot in Fabric. You can use conversational language to create dataflows and data pipelines, generate code and entire functions, build machine learning models, or visualize results. Check it out: video!