SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

Master Data Management (MDM) Hub Architecture

The Master Data Management (MDM) hub is a database with the software to manage the master data that is stored in the database and keep it synchronized with the transactional systems that use the master data.  There are three basic styles of architecture used for Master Data Management hubs: the registry, the repository, and the hybrid approach.

Repository (also called Enterprise or Centralized or Transactional) – The complete collection of master data for an enterprise is stored in a single database, including all the attributes required by all the applications that use the master data.  The applications that consume, create, or maintain master data are all modified to use the master data in the hub, instead of the master data previously maintained in the application database, making the master data hub the system of entry (SOE) as well as the system of record (SOR).  So if you have a CRM application, it would be modified to use the customer table in the master data hub instead of its own customer table (either by accessing the data directly in the hub or by the data in the hub being transferred to the source).  Some of the benefits:

  • There are no issues with keeping multiple versions of the same customer record in multiple applications synchronized, because all the applications use the same record
  • There is less chance of duplicate records because there is only one set of data, so duplicates are relatively easy to detect

But there are some issues to consider:

  • It’s not always easy or even possible to change your existing applications to use the new master data (i.e. you are using an off-the-shelf product that does not have a feature to use or import data from another source)
  • Coming up with a data model that includes all the necessary data, without it being so large that it’s impossible to use (i.e. you have multiple applications that require different address formats)
  • What to do with data elements that are not used by all applications (i.e. a customer added by an order-entry application would likely have significantly fewer attributes than a customer added by the CRM application)
  • It can be extremely expensive and take a long time to implement, because it requires changes to the applications that maintain and consume the master data
  • You will need to transform and load all the current databases into the hub, removing duplicates in the process.
  • You will need to figure out how to handle history since you are changing your databases to use a new key for all you master data so you have to deal with many years of history that was created using different keys for the master data.

Registry (also called Federated) – The opposite of the repository approach, as each source system remains in control of its own data and remains the system of entry, so none of the master data records are stored in the MDM hub.  All source system data records are mapped in the master data registry, making the master data registry the system of record (a virtual master data system).  Data maps show the relationship between the keys of the different source systems (i.e. one row in a table for each master data entity and columns for the keys of the application systems).  For example, if there are records for a particular customer in the CRM, Order Entry, and Customer Service databases, the MDM hub would contain a mapping of the keys for these three records to a common key.  Benefit: Because each application maintains its own data, the changes to application code to implement this model are usually minimal, and current application users generally do not need to be aware of the MDM system.  Downside: Every query against MDM data is a distributed query across all the entries for the desired data in all the application databases. Plus, adding an application to the MDM hub means adding columns to the key-matching table, which is not a big issue, but it may also mean changing queries to include the new source of information.  Finally, while it helps you find duplicates, it does not help you in cleaning them up (i.e. if a person has many records with different phone numbers, there is not a way to determine the one to use)

Hybrid -  Includes features of both the repository and registry models: Leaves the master data records in the application databases and maintains keys in the MDM hub, as the registry model does. But it also replicates the most important attributes for each master entity in the MDM hub, so that a significant number of MDM queries can be satisfied directly from the hub database, and only queries that reference less-common attributes have to reference the application database.  Issue:  Must deal with synchronization, update conflicts and replication-latency issues.

There are other variations to the above three basic styles: A “Data aggregation implementation” that involves the creation of a new system that is neither the system of entry nor the system of record but a downstream system used to aggregate the attributes of multiple systems and pass them to lower level subscribing systems; a “System-of-record-only implementation” or “Hub based implementation” where the master data hub is the system of record, but the system of entry remains the sources systems and new records are transferred to the master data hub and any discrepancies in the data defer to the master data hub.  Data flow is bidirectional as new records in the master data hub are pushed into the source systems.  And optionally new records could be added into the master data hub, making it very similar to the Repository method (with the only difference being the master data hub is not the only system of entry).

More info:

Master Data Management (MDM) Hub Architecture

Choosing MDM Hub styles

Master Data Management: Bringing Order to Chaos

Master data registry implementation

An introduction to the Master Data Management Reference Architecture

MDM Data Hub Styles Part 4: Which one is the best to do MDM?

Understanding Master Data Management (MDM)

Sage Consulting

Master Data Management Implementation Styles

Kimball University: Pick the Right Approach to MDM

James Serra's Blog

James is a big data and data warehousing technology specialist at Microsoft. He is a thought leader in the use and application of Big Data technologies, including MPP solutions involving hybrid technologies of relational data, Hadoop, and private and public cloud. Previously he was an independent consultant working as a Data Warehouse/Business Intelligence architect and developer. He is a prior SQL Server MVP with over 30 years of IT experience. James is a popular blogger (JamesSerra.com) and speaker, having presented at dozens of PASS events including the PASS Business Analytics conference and the PASS Summit. He is the author of the book “Reporting with Microsoft SQL Server 2012”. He received a Bachelor of Science degree in Computer Engineering from the University of Nevada-Las Vegas.


Leave a comment on the original post [www.jamesserra.com, opens in a new window]

Loading comments...