SOA & WOA: Article

Master Data Management

Integrated information is not complete information

Businesses and government agencies execute better when they make decisions based on complete and accurate information. However, amassing flawless information becomes more challenging as entities evolve. As organizations expand into multiple geographies, their information assets become fragmented across database instances managed by a variety of service providers.

Decision makers rely on such information assets to produce actionable insights, which require integrated information about, for example, citizens for government, patients for healthcare providers, and customers for retail organizations. However, providing these decision makers with integrated information doesn't automatically mean offering them complete, relevant information. The information shared across the integration channels invariably references a set of key business objects, including customers, products, assets, and facilities, that are important to all of the participants in the integration. If these key business objects, called master data, are incorrectly identified or interpreted, then the integration of this information will be unsuccessful.

To solve the challenge of accurately identifying and interpreting master data, organizations need secure platforms on which to develop, deploy, and manage single-view Master Data Management (MDM) composite applications using a service-oriented architecture (SOA) approach. Such platforms preserve investments in existing applications while enhancing, aggregating, and leveraging the associated data to provide a single "best" entity view.

Problems of Master Data
When a company has ineffective and disjointed views of master data, it loses potential revenue and ends up spending time on activities that hurt its growth. When customer data is stored in disparate applications without a single view of the customer, the company cannot take full advantage of customer opportunities. Data management, data quality, and access policies driven by regulatory and corporate requirements become increasingly difficult to enforce and monitor. Without consistent and complete policy enforcement and monitoring, the company is exposed to extensive liability and its credibility is undermined.

The result of ineffective data management can have far-reaching consequences. A company's reputation for customer service becomes tarnished if records are lost or distorted. Customers could get the wrong service or product, reducing their trust in the company. Productivity would decrease because employees spend too much time looking for information or simply end up with the wrong information. Marketing and promotional expenditure could target the wrong customer segments and thus not produce the expected results.

Benefits of Master Data Management
Increased visibility of master data, greater data quality, and improved policy enforcement result in better business decisions and execution. The ability to share views of master data can be extended across the firewall to partners and other trusted collaborators, improving the overall customer experience and the ability to cross-sell partner products at various points of customer interaction. Superior multi-channel service will improve customer satisfaction levels and strengthen customer loyalty.

Benefits from MDM include improved visibility into customer behavior, leading to better decisions about product promotions and pricing. Innovative business models and go-to-market models can be developed to increase revenue and optimize returns from corporate expenditure. Improvements in supply chain management can be realized as a result of improved insight into supplier relationships. MDM also supports the success of SOA projects enterprise-wide and helps deliver the expected business benefits. Harrods department store recently implemented MDM successfully and says that "the single customer view database has become the backbone of our CRM analytics because the quality of the data and information is much, much better."

Master Data Management & the Single View
Definitions
Master data or reference data are key core entities vital to businesses or organizations. Examples of master data are customers, citizens, employees, accounts, locations, and products. Master data are often non-transactional in nature. For example, in a banking scenario, master data may contain customer personal information such as name, address, and phone numbers but not his or her credit card transactions.

Master Data Management involves a set of technologies, tools, workflows, and processes that maintain and present a consistent, unified view of master data built from the (often inconsistent) data fragments held in various applications in the organization.

MDM Activities
MDM activities generally fall into the six areas shown in Figure 1.

Profiling
The process of cleansing and resolving conflicting data starts with analyzing the incoming information, using statistical analysis tools, to evaluate the degree of cleanliness and uncover the peculiarities of the information. The cleanliness and peculiarities are identified by source systems and fields, allowing the users to trace quality problems to their source and resolve them either by correcting application behavior at the source or by implementing new logic to translate or amend as necessary (or by the activity of normalization). In master data projects, data profiling is typically performed at the time of the creation and loading of the index and then again periodically to help enforce data quality governance.

Normalization
The term normalization refers to the cleanup and conflict resolution of different attributes of information so that the attributes conform to governance standards. These standards could cover many different aspects of the attributes, from low-level aspects such as geo-coding, capitalization, and phone number encoding, to higher-level aspects such as enforcing industry standards for measures, industry standard coding schemes, or business rules about the inter-relationships of fields. Normalization procedures identify violations of these standards and either correct them or reject the transaction containing the violations and execute escalation procedures.
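As an illustration of the low-level end of this range, the following is a minimal sketch of normalizing capitalization and phone number encoding, and of reporting violations that would trigger escalation. The field names (`name`, `phone`), the 10-digit national number rule, and the default country code are assumptions made for the example, not part of any particular MDM product.

```python
import re


def normalize_phone(raw, default_country="1"):
    """Return the number as '+<country><10 digits>', or None if it cannot be fixed.

    Assumes a 10-digit national number and a one-digit country code.
    """
    digits = re.sub(r"\D", "", raw)          # strip everything except digits
    if len(digits) == 10:                     # national number without country code
        digits = default_country + digits
    if len(digits) != 11 or not digits.startswith(default_country):
        return None
    return "+" + digits


def normalize_record(record):
    """Return (clean_record, violations) for a dict with 'name' and 'phone'."""
    clean = dict(record)
    violations = []

    # Capitalization and whitespace: collapse runs of spaces, title-case the name.
    clean["name"] = " ".join(record.get("name", "").split()).title()

    # Phone encoding: correct it when possible, otherwise report a violation.
    phone = normalize_phone(record.get("phone", ""))
    if phone is None:
        violations.append("phone")
    else:
        clean["phone"] = phone

    return clean, violations
```

A caller would load the clean record when the violation list is empty and hand the record to an escalation procedure otherwise.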

Data Stewardship
A key component of an MDM solution is a facility for data stewards to model, review, and maintain the master data. The data steward designs the index model, deciding what entities, relationships, and attributes should be stored in the model and which external entities should be pointed to by the index. This model then becomes the "single version of the truth" to which all master data is mapped, and the backplane against which the mashup facility can deliver relevant views (really cross-sections of the total index) to the participating applications. Thus, the index delivers the "single source" aspect of MDM and, when combined with the "matching and de-duplication" features described below, the "uniqueness" aspect of MDM.

Because of the uniqueness aspect of the index, potential duplicate management is one of the primary functions of data stewardship. A steward needs a quick way to review and compare the potentially duplicate records reported by the system and merge those that belong to the same entity. The steward must also be able to undo merges done manually as well as those created by an automatic match. Another key function is the ability to review changes to the master data. This includes the ability to find the master record of interest using various search criteria, and the ability to review the changes to the master data over time. An auditing facility is also required to track who accessed what record at what time. All of these features need to be combined with a strong composite application layer, so that they can be used alongside the related activities that are part of the data steward's job responsibilities.

Matching
Matching refers to establishing links between records that aren't connected by a common identifier but in reality represent the same physical entity. Typically, matching is done statistically. Using an optimization approach, a probabilistic algorithm estimates two conditional probabilities for each attribute relevant to the match (commonly, the probability that the attribute agrees given that the two records truly match, and the probability that it agrees given that they do not). It then derives a match weight for each field from the ratio of those two probabilities, typically on a logarithmic scale so that weights can be added. Finally, it calculates a composite weight by summing all the individual field weights, on the assumption that the fields of the different records are statistically independent.
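The weight calculation can be sketched as follows, assuming the common log-ratio formulation of probabilistic record linkage. The field names and the probability values are hypothetical and hard-coded here; in practice they would be estimated from the data, and field comparison would usually be fuzzier than exact equality.

```python
import math

# Hypothetical per-field probabilities (m, u):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "zip": (0.90, 0.05),
}


def field_weight(m, u, agrees):
    """Log-ratio weight for one field: positive on agreement, negative otherwise."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def composite_weight(rec_a, rec_b, probs=FIELD_PROBS):
    """Sum the field weights, treating the fields as statistically independent."""
    total = 0.0
    for field, (m, u) in probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        agrees = a is not None and a == b   # a missing value never counts as agreement
        total += field_weight(m, u, agrees)
    return total
```

Two records that agree on every field receive a large positive composite weight, records that disagree everywhere receive a negative one, and partial agreement lands in between.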

Configuration of the matching process requires establishment of potential duplicate thresholds. In simple terms, the distribution of weights generated by the cross-comparison of two sets can be looked at as two separate groups (the true matches and the true non-matches) with a third group of hard-to-resolve weights that fit in a fuzzy area between the two groups and that we call potential duplicates. The third group is simply the result of our lack of certain knowledge of the true matches and non-matches. These weights generally need manual intervention, an activity of the data steward.
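A minimal sketch of the two-threshold classification described above follows. The threshold values and the function names are assumptions chosen for illustration; real thresholds are tuned against the observed distribution of weights.

```python
def classify(weight, lower=0.0, upper=12.0):
    """Assign a composite match weight to one of the three groups."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "potential-duplicate"   # the fuzzy area between the two thresholds


def review_queue(scored_pairs, lower=0.0, upper=12.0):
    """Return the record pairs a data steward would need to review manually.

    scored_pairs is an iterable of (pair_id, weight) tuples.
    """
    return [
        pair_id
        for pair_id, weight in scored_pairs
        if classify(weight, lower, upper) == "potential-duplicate"
    ]
```

Narrowing the gap between the two thresholds reduces the steward's workload at the cost of more automatic decisions being wrong, so the choice is a governance decision as much as a statistical one.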

With the matching capability, the MDM can deliver to the index and its users certainty that the information that it offers about any master data (customer, part, etc.) represents the fullest and best available; there are no stores of more information about the entity that are unrepresented in this view.

De-duplication
This refers to correcting the fragmentation that occurs when two records are accidentally created for the same real entity. Once the matching activity described above completes, the MDM layer can inform the participating applications that the entities should be merged in their stores as well. The MDM layer will communicate this merge in electronic transactions for those applications that can receive such a transaction or by executing a business process or workflow for those that cannot.
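The merge and its communication to the participating applications might look like the sketch below. The survivorship rule (keep the surviving record's values and fill only its gaps), the notification fields, and the `accepts_merge` flag are all hypothetical. They stand in for whatever rules and interfaces a real deployment defines.

```python
def merge_records(survivor, duplicate):
    """Merge two records for the same entity, keeping the survivor's values.

    Fields that are empty in the survivor are filled from the duplicate.
    """
    merged = dict(survivor)
    for field, value in duplicate.items():
        if field == "id":
            continue                      # the survivor keeps its own identifier
        if merged.get(field) in (None, ""):
            merged[field] = value
    return merged


def merge_notification(survivor, duplicate):
    """Describe the merge so that participating applications can mirror it."""
    return {
        "event": "merge",
        "survivor_id": survivor["id"],
        "retired_id": duplicate["id"],
    }


def route_notification(applications):
    """Split applications into those that accept a merge transaction
    electronically and those that need a manual workflow instead."""
    electronic = [app["name"] for app in applications if app["accepts_merge"]]
    workflow = [app["name"] for app in applications if not app["accepts_merge"]]
    return electronic, workflow
```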

Enterprise Data Mashup
Enterprise data mashup can be described as a virtual aggregation of heterogeneous sources, inside and outside the enterprise firewall, into one integrated space offered as a service through standard interfaces (WSDL, REST, SOAP, etc.). It relies on exposed APIs (public interfaces), Web Services, enterprise search engines, and other sources to bring together different types of data with diverse formats, enrich them by carrying out configurable modifications, and finally expose them under one integrated virtual view.

In MDM, the data mashup engine serves to create a CRUD (create, read, update, delete) layer over the index and the participating applications. This layer can then be sliced and diced into various views, each representing a logical business activity relevant to a particular job role. For example, an MDM implementation of a single view of a customer could include a data mashup that delivers a "CRM" view of the customer, a "marketing" view of the customer, a "finance" view of the customer, and so on. In each case, the attributes included would be relevant to the job profile both in terms of job responsibility and security level.
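The idea of role-specific views can be sketched as a projection of the master record onto the attributes each role is allowed to see. The view names follow the examples above; the attribute names and the `build_view` function are assumptions made for illustration.

```python
# Hypothetical mapping from view name to the attributes that view exposes.
VIEWS = {
    "crm": {"name", "email", "phone", "open_cases"},
    "marketing": {"name", "email", "segment"},
    "finance": {"name", "credit_limit", "balance"},
}


def build_view(master_record, view_name, views=VIEWS):
    """Return only the attributes of the master record that the view exposes."""
    try:
        allowed = views[view_name]
    except KeyError:
        raise ValueError("unknown view: %s" % view_name)
    return {k: v for k, v in master_record.items() if k in allowed}
```

Keeping the attribute lists in one table makes the security level of each view easy to audit, since a field that is not listed for a view cannot appear in it.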

Implementing MDM
Setup
Creating a Master Data Management solution starts with the object model definition. Early on, the implementers will decide how comprehensive the data model should be. At one extreme, the master data would be a superset of all the fields in all participating applications. At the other extreme, the master data could be the minimal set of common fields necessary to uniquely identify the master data entity. An organization may choose an implementation model anywhere between these two extremes depending on its requirements and participating applications.

The next step is to extract the relevant data from the source applications and map them into the master data model. Since the data volume could be quite large, a bulk extract, transform, and load (ETL) capability is often needed.
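The mapping step can be sketched as a per-source table that renames source fields to master model fields. The source names, field names, and the `to_master` function are hypothetical, and a bulk ETL tool would apply the same idea at much larger volume.

```python
# Hypothetical field maps: source field name -> master model field name.
FIELD_MAPS = {
    "crm": {"cust_name": "name", "tel": "phone"},
    "billing": {"FullName": "name", "PhoneNo": "phone", "AcctNo": "account"},
}


def to_master(source, row, field_maps=FIELD_MAPS):
    """Map one extracted row into the master data model.

    The source system and the source record id are kept so that quality
    problems can later be traced back to where they came from.
    """
    mapping = field_maps[source]
    record = {master: row[src] for src, master in mapping.items() if src in row}
    record["source"] = source
    record["source_id"] = row.get("id")
    return record
```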

Next, data profiling tools can help identify discrepancies in the extracted data by reporting the possible values, patterns, and frequencies for any field in the data model. The data profile can then be used by a data cleansing tool to map and transform the source data elements into the standard format expected by the master data model.
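A minimal sketch of such a profile for one field is shown below: it counts missing and distinct values and summarizes the character patterns that occur. The pattern notation (`9` for a digit, `A` for a letter) and the function names are assumptions made for the example.

```python
from collections import Counter


def pattern(value):
    """Reduce a value to its shape: digits become '9', letters become 'A'."""
    shape = []
    for ch in value:
        if ch.isdigit():
            shape.append("9")
        elif ch.isalpha():
            shape.append("A")
        else:
            shape.append(ch)              # keep punctuation and spaces as-is
    return "".join(shape)


def profile(records, field):
    """Summarize the values, patterns, and frequencies of one field."""
    values = [r.get(field) for r in records]
    present = [str(v) for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "values": Counter(present),
        "patterns": Counter(pattern(v) for v in present),
    }
```

A rare pattern in the result, such as a postal code containing a letter where the others are all digits, points to records that the cleansing step has to handle.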

After the standardization phase, the records from the various source applications need to be matched and de-duplicated as described above. Finally, the pre-matched image is loaded into the MDM repository and the system is ready to use.

Operations
Once the master data repository is established, the next phase would be to leverage this information for the benefit of master data consumers as well as master data providers in the organization. Consumers will need access to services that can provide the comprehensive single view of master data entities. The originating source applications can benefit from the cleansed, de-duplicated, and most complete information through synchronization with the master data repository. A SOA platform is ideal for exposing the MDM services to data consumers. A robust integration infrastructure is necessary to enable synchronization between participating applications. Business process capability is useful to define workflows and implement a governance framework. All these components are key ingredients in a complete MDM solution.

Example MDM Scenarios
Virtually all businesses and organizations can benefit by applying MDM practices, if not MDM technology. Any company that has a presence on the Web for people to enter information must deal with the risk that the information is entered incorrectly or incompletely; and if it wants to learn about the user (and it's likely it will if it has a Web site to begin with), it will need to be able to address the errors or incompleteness to tie all interactions together. Here are some particular vertical industries and their specialized MDM requirements:

  • Healthcare: Healthcare applications of MDM typically involve aggregating and indexing information about patients and providers that exists in the diverse and complicated landscape of applications that hospitals, insurance companies, and physicians depend on. Users rarely have the luxury of needing information from only one of these systems, so Master Data Management helps link the information about individual patients or providers (or other entities) together in a common framework and organizes this information into views that can be secured, personalized, and easily presented to the consumers (doctors, nurses, partners, etc.) in a manageable fashion.
  • Telecommunications: MDM again provides the layer that can link information across operational and business branches and the "views" that can be personalized, aggregated, and secured so that new applications can benefit from information across the landscape.
    With comprehensive MDM views, telecom service providers can create better self-service Web sites that give subscribers more control of their services; they can create new mobile applications that integrate advanced communications features with over-the-top Web Services like calendar and weather. And of course, with better visibility into their customers' information, they can make better business decisions and offer better service.
  • Government: MDM is a part of many "Single War Fighter View" projects where all the benefits and services provided by the military to a war fighter could be provided to the soldier or sailor depending on rank, years of service, and base location. These benefits could include medical benefits, insurance claims, housing benefits/allowance, financial/retirement benefits, e-mail/calendar, and so on.

MDM - Crossing the Firewall
We've talked about how the single view concept is so important in allowing an organization to present a complete and accurate view of key information to the right users. What if the organization could execute even more effectively with a single view that comprises both its own complete record and that of its trusted partners? A travel agency, for example, might want to create an application that allows administrative assistants at a customer company to track over the Web not just the itinerary of an executive's trip, but also whether the traveler has checked into the hotel. This application would require not just a view of what the travel agency knows (the traveler's itinerary), but also a view of what the travel agency's partner (the hotel) knows. Another example could be an online retailer that would like to show more accurate information on the status of a customer's shipments by extending the view of the customer's orders to include a view into the logistics providers' applications. These examples illustrate business models that are more effective if partners are engaged in automation in a close, but trusted and secure, manner.

Adding Trusted Service Providers (Partners) to the Single View
The key to extending the MDM single views over the Internet is to implement a trusted exchange with partners using open standards that protect the exchange from intrusion and preserve identity, authentication, and authorization across the firewall. The best way to do this is to make these exchanges in accordance with the SAML standard.

SAML (Security Assertion Markup Language), an OASIS standard, is an XML-based standard for exchanging authentication and authorization data between security systems. It specifies the conversations between three parties:

  • User Agent: A user application
  • Service Provider: A provider of an arbitrary business service
  • Identity Provider: An authority on identities and privileges

With MDM, the user agent maps to the enterprise data mashup. The mashup engine is the actor that requires a service (some information about a master entity) from a service provider (presumably a business partner of the company hosting the MDM). The identity provider is an authoritative source that "asserts" the privileges of the user agent to the service provider, and is typically a federated access management product that allows companies to implement SAML easily.

Federated Data Management Use Case Scenario
Consider the travel agency example described above. Put in the context of an MDM application with federated access management capabilities, the scenario involves the following parties:

  • User Agent: The mashup engine, actually acting as multiple user agents corresponding to the different views offered: one might be the traveler's view, another might be the hotel's view, and a third could be the view of the traveler's administrative assistant.
  • Service Provider: The reservation system of one of the travel agency's partners (hotel, car rental, airline, etc.)
  • Identity Provider: The travel agency's store of identities and authorization, presumably equipped with federated access management

To walk through an example, let's consider the administrative assistant (AA) in the travel agency scenario. In many companies, the AA makes the arrangements for the traveler if the traveler is senior enough. So the AA has certain privileges in the travel agency's landscape of records pertaining to the traveler, and it would make sense for the travel agency to build a particular view of the traveler's records using the index and the data mashup capabilities. Furthermore, it would make sense for all parties for the AA to have limited visibility into the hotel and airline systems, because the AA would be able to track the traveler's movements in real time and because the travel companies (agencies and providers) would want to provide the most up-to-date information to the traveler's designated contacts.

The mechanics of this scenario are shown in Figure 2. The "AA" view of the traveler is provided by the mashup engine, which works with the index to map out where the information needed to fulfill the view is. Most of the information will be found in its internal systems, but some will come from partners. Thus, the mashup engine will interact with the access management layer to provide the identity information about the particular AA for each session, and then the AM layer will populate the appropriate SAML assertions in the requests to the partner applications.
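To make the last step more concrete, the sketch below builds a simplified, unsigned assertion using the element names and namespace of SAML 2.0 assertions. It is an illustration only: the issuer URL, subject, and attribute names are hypothetical, and a real assertion would also carry conditions and a digital signature, which an access management product would normally produce.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"


def build_assertion(issuer, subject, attributes):
    """Return a simplified, unsigned SAML-style assertion as an XML string."""
    ET.register_namespace("saml", SAML_NS)

    def q(tag):
        return "{%s}%s" % (SAML_NS, tag)   # namespace-qualified tag name

    assertion = ET.Element(q("Assertion"), {
        "ID": "_" + uuid.uuid4().hex,
        "Version": "2.0",
        "IssueInstant": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    })
    # Who is making the statement (the identity provider).
    ET.SubElement(assertion, q("Issuer")).text = issuer
    # Whom the statement is about (here, the administrative assistant).
    subj = ET.SubElement(assertion, q("Subject"))
    ET.SubElement(subj, q("NameID")).text = subject
    # The privileges being asserted to the service provider.
    stmt = ET.SubElement(assertion, q("AttributeStatement"))
    for name, value in attributes.items():
        attr = ET.SubElement(stmt, q("Attribute"), {"Name": name})
        ET.SubElement(attr, q("AttributeValue")).text = value
    return ET.tostring(assertion, encoding="unicode")


xml_text = build_assertion(
    issuer="https://idp.travel-agency.example",      # hypothetical identity provider
    subject="aa@customer.example",                    # hypothetical AA identity
    attributes={"role": "administrative-assistant", "traveler": "T-1001"},
)
```

The partner's reservation system would check the issuer against the identity providers it trusts and then use the asserted attributes to decide which of the traveler's records the AA may see.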

Conclusion
Master Data Management is but one of many components of a successful and adaptable IT infrastructure. Like its many antecedents (EII, Data Quality, Data Federation, etc.), it's a technology that must be matched with good practices and cooperation of multiple parties to fully deliver on the promise of better execution. However, the maturity of the MDM products, the capabilities of today's advanced SOA technology, and the availability of proven security standards like SAML have shown that MDM is an investment with a high and reliable return on investment.

Sun's MDM offering, the Sun MDM Suite, is a leading-edge MDM solution backed by the first and largest MDM open source community, Mural. For more information about the product or the open source community, see Sun MDM and Mural Community.

More Stories By Sofiane Ouaguenouni

Sofiane Ouaguenouni is an IT Architect at Sun Microsystems. He is an expert in Master Data Management technologies, predominantly data quality tools and algorithms, such as matching and standardization. He holds a PhD in computational physics, and was previously a research scientist at Caltech.

More Stories By David K. Codelli

David Codelli is Group Manager of Segment Marketing at Sun Microsystems. He holds a bachelor's degree in Information and Computer Science from Georgia Tech. A 15-year veteran of business integration, David integrated systems and applications in health care, financial services, and manufacturing before becoming product manager of the flagship eGate Integrator product from SeeBeyond Technologies Corporation, which was later acquired by Sun Microsystems. He now manages the marketing of Sun's Master Data Management products.
