Data Fabric defined


Another buzzword that you may have been hearing a lot about lately is Data Fabric. In short, a data fabric is a single environment consisting of a unified architecture, with services and technologies running on that architecture, that helps a company manage its data. It enables accessing, ingesting, integrating, and sharing data in an environment where the data can be batched or streamed and can be in the cloud or on-prem. The ultimate goal of a data fabric is to use all your data to gain better insights into your company and make better business decisions. If you are thinking this sounds a lot like a modern data warehouse, which I recently posted a video on at Modern Data Warehouse explained, well, I would argue it basically is the same thing, except a data fabric expands on that architecture. A data fabric includes building blocks such as data pipeline, data access, data lake, data store, data policy, ingestion framework, and data visualization. These building blocks would be used to build platforms or “products” such as a client data integration platform, data hub, governance framework, and a global semantic layer, giving you centralized governance and standardization. Ideally the building blocks could also be used by other solutions outside of the data fabric. At EY, my new place of employment, we are building a data fabric that will be the subject of a future blog post.

You may now be wondering how a data fabric compares to a data mesh. (If you are not familiar with a data mesh, check out my blog Data Mesh defined.) A data fabric and a data mesh both provide an architecture to access data across multiple technologies and platforms, but a data fabric is technology-centric, while a data mesh focuses on organizational change. Another difference is that a data mesh is decentralized (or “distributed”): each set of data is a domain (treated like a product) that is kept within one of the various organizations within a company, whereas in a data fabric all the data is brought into a centralized location. I need to point out here that this is my interpretation of a data fabric compared to a data mesh. You will find many people whose views vary from mine, some of them considerably. In fact, two companies can have very different technology solutions for a data fabric or a data mesh and both can be correct, because the correct solution is whichever one best fits your company’s data (size, speed, and type), security policies, skillset, performance requirements, and monetary constraints.

Fundamentally, the data fabric is about collecting data and making it available via purpose-built APIs (and optionally via direct connection to the data stores, for those tools that don’t support APIs). The data mesh involves building data products by copying data into specific datasets for specific use cases, with each product built by the department or domain that keeps and owns the data.

As an example, say I want a dashboard that measures sales vs inventory. In the data fabric world I would ingest the data from the sales system as well as the data from the inventory system into a central location, then I would build an API that joins them together and expose that to the dashboard. Data fabrics are more about technical data integration and don’t really dictate who does it or who owns the data. In the data mesh world I would get the sales team to copy data from the sales system to a sales product dataset, get the inventory management team to copy data from the inventory system to an inventory dataset, and get the dashboard owner to build a joined table that the dashboard uses.
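To make the sales vs inventory example a bit more concrete, here is a minimal Python sketch of the two approaches as I described them. Everything in it is an assumption made for illustration: the sample rows, the `sku` join key, the `CentralStore` class, and the function names are all hypothetical and do not come from any real product. Treat it as a toy model of who does which step, not as a reference implementation of either architecture.

```python
# Hypothetical extracts from the two source systems (made-up rows).
SALES_SYSTEM = [
    {"sku": "A1", "units_sold": 30},
    {"sku": "B2", "units_sold": 5},
]
INVENTORY_SYSTEM = [
    {"sku": "A1", "units_on_hand": 70},
    {"sku": "B2", "units_on_hand": 45},
]


def join_on_sku(sales, inventory):
    """Join sales rows to inventory rows on the (assumed) shared 'sku' key."""
    on_hand = {row["sku"]: row["units_on_hand"] for row in inventory}
    return [
        {
            "sku": row["sku"],
            "units_sold": row["units_sold"],
            "units_on_hand": on_hand[row["sku"]],
        }
        for row in sales
        if row["sku"] in on_hand
    ]


# --- Data fabric style: ingest centrally, then expose a joined API. ---
class CentralStore:
    """Hypothetical central location that holds copies of all source data."""

    def __init__(self):
        self.tables = {}

    def ingest(self, name, rows):
        # Copy the rows so the central store is independent of the source.
        self.tables[name] = [dict(row) for row in rows]

    def sales_vs_inventory_api(self):
        # The purpose-built API: the join happens centrally.
        return join_on_sku(self.tables["sales"], self.tables["inventory"])


def fabric_dashboard_data():
    store = CentralStore()
    store.ingest("sales", SALES_SYSTEM)
    store.ingest("inventory", INVENTORY_SYSTEM)
    return store.sales_vs_inventory_api()


# --- Data mesh style: each domain publishes its own data product. ---
def sales_team_publish():
    """Sales domain copies its data into a sales product dataset."""
    return [dict(row) for row in SALES_SYSTEM]


def inventory_team_publish():
    """Inventory domain copies its data into an inventory dataset."""
    return [dict(row) for row in INVENTORY_SYSTEM]


def mesh_dashboard_data():
    # The dashboard owner builds the joined table from the two products.
    return join_on_sku(sales_team_publish(), inventory_team_publish())


if __name__ == "__main__":
    print(fabric_dashboard_data())
    print(mesh_dashboard_data())
```

In this sketch both paths hand the dashboard the same joined rows. What differs is where the data lives and who is responsible for each step, which is the distinction I am trying to draw.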

In summary, a data mesh is more about people and process than architecture, while a data fabric is an architectural approach that tackles the complexity of data and metadata by bringing them together so they work well as a whole.

In the future the technology used to build a data mesh could look very different from the technology used to build a data fabric. For now, though, some of the technology needed to build a true data mesh does not exist, so a data mesh built today may look more like a data fabric. If you still find this all confusing, you are not alone! Please share your thoughts by entering a comment below.

More info:

Data Virtualization in the Context of the Data Mesh

Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything

The Role of the Data Fabric In Your Target State Architecture

Catalog & Cocktails #32: Is Your Data Fabric a Mesh?

Catalog and Cocktails #44: Why it’s time to mesh with your data architecture

What is a Data Fabric? | Talend

The post Data Fabric defined first appeared on James Serra's Blog.
