SOA Data Strategy

Vital to a successful SOA transformation

The adoption of Service Oriented Architecture (SOA) promises to further decouple monolithic applications by decomposing business functions and processes into discrete services. While this makes enterprise computing assets more accessible and reusable, SOA implementation patterns are primarily an iteration over previous application development models. Like most application development evolutions, SOA approaches inject more layers and flexibility into the application tier, but have often neglected the most fundamental building block of all applications: the underlying data.

Current Data Environment of Most IT Organizations
The condition of a typical organization's data environment is usually not where it needs to be before the organization can begin an SOA transformation - from an enterprise perspective, there's often a lack of authoritative sources and a wide array of technologies used for storing and processing data. Generally, no single system offers a complete view of the organization's core business objects, since most large IT organizations have their core enterprise data spread out and replicated across multiple stove-piped systems. Each system often maintains data within its own specific context rather than the context of the enterprise. Data quality and interoperability issues abound, especially when data-consuming systems access a variety of data-producing systems, each of which maintains an isolated view of enterprise data. These differences lead to inconsistent and inaccurate views of the business processes. Figure 1 illustrates these data access and management challenges impacting SOA transition initiatives.

An SOA transformation amplifies and exacerbates an organization's existing data problems. Because of the integrated nature of SOA-based applications, an organization will be building on top of a very weak foundation unless it first addresses the issues with its current data environment. This is, in many ways, analogous to constructing a high-rise building on top of a landfill.

Consider the lack of authoritative enterprise sources as an illustrative example. Suppose an organization's supply chain portfolio includes five systems that each hold supplier information internally. Each can be considered a legitimate source of supplier data within its owning department. When building a service to share supplier data, where should the supplier data come from?

  • One of the five current systems that have their own copy of the supplier data? If so, which one?
  • A new database that's created for this specific purpose? How does this data source relate to the existing sources?
  • Should the data come concurrently from all five databases?
Each of these solutions has its pros and cons; there's no right or wrong approach. The point is that these data issues must be resolved before an implementation team can proceed. By the time the implementation team takes over and begins building the requisite services and infrastructure, these questions should already have been answered by the organization at the business level. Unless they are, these data issues will persist and undermine the benefits of creating services that share data. In other words, a service may end up sharing an incomplete set of data or, worse, exhibit incorrect behavior because it's not working with the "right" data.
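
As a rough illustration of the third option - a single service that pulls from all five sources and reconciles the results - consider the Python sketch below. The names (SupplierDataService, find_supplier) and the merge policy are invented for illustration; a real implementation would encode whatever precedence rules the business agrees on.

from dataclasses import dataclass

# Hypothetical sketch: one data service fronting the five existing
# supplier sources. All names here are invented for illustration.

@dataclass
class SupplierRecord:
    supplier_id: str    # enterprise-wide identifier
    name: str
    source_system: str  # which legacy system supplied the record

class SupplierDataService:
    """Single logical access point for supplier data.

    Consumers see one authoritative view; the service, not its
    callers, applies whatever reconciliation policy the business
    has agreed on.
    """

    def __init__(self, sources):
        # 'sources' are adapters over the existing databases, each
        # exposing find_supplier(supplier_id) -> SupplierRecord | None.
        self.sources = sources

    def get_supplier(self, supplier_id: str) -> SupplierRecord:
        candidates = []
        for source in self.sources:
            record = source.find_supplier(supplier_id)
            if record is not None:
                candidates.append(record)
        if not candidates:
            raise LookupError(f"no supplier found for id {supplier_id}")
        # Placeholder merge policy: take the first match. A real
        # deployment would apply the precedence rules established
        # through data governance (discussed below).
        return candidates[0]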

Target Vision of the Data Environment in an SOA
The way an organization thinks about applications and data must evolve - it must stop thinking about data as a second-class citizen that only supports specific applications and begin to recognize data as a standalone asset that has both value and utility. Organizations should establish their data environments with "hubs of specific data families" that expose data services that comply with industry standards and service contracts. The goal is to create a set of services that becomes the authoritative way to access enterprise data. In this target service-oriented environment, applications and data work together as peers. Thus, both an organization's business functionality and data can be leveraged as enterprise assets that are reusable across multiple departments and lines of business. This target vision, illustrated in Figure 2, enables the following desired characteristics of the enterprise's data environment:

  • Single logical sources from which to get a complete view of the enterprise data objects
  • Increased awareness of the profile and characteristics of the data in the enterprise
  • Improved data quality across the enterprise
  • Enforced data standards by using a data services layer (see the sketch after this list)
  • Data that's clearly visible and readily accessible
  • Reduced reliance on custom interfaces and proprietary formats
  • Clearly identified authoritative data sources that are effectively used throughout the enterprise
  • Security that's "baked into" the solution, and not an afterthought
  • Data that's easily discoverable by potential consumers across the organization
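
To make the "enforced data standards" item concrete, here is a minimal Python sketch of a contract check in the data services layer. The required fields and the rule itself are invented for illustration, not a prescribed standard.

# Hypothetical sketch of contract enforcement in a data services
# layer. Field names and the rule are invented.

REQUIRED_SUPPLIER_FIELDS = {"supplier_id", "name", "country"}

def enforce_supplier_contract(record: dict) -> dict:
    """Reject records that violate the enterprise supplier standard.

    Placing this check in the service layer, rather than in each
    consumer, is what makes the standard enforceable.
    """
    missing = REQUIRED_SUPPLIER_FIELDS - record.keys()
    if missing:
        raise ValueError(f"record violates supplier contract; "
                         f"missing fields: {sorted(missing)}")
    return record
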
SOA Data Strategy
A comprehensive strategy that defines how the enterprise's data should be managed in an SOA environment is needed to achieve the target vision. This strategy addresses issues such as data governance, data modeling from an enterprise SOA perspective, data quality, security, and technology solutions such as data services.

Data Governance
Governance is often cited as an important part of SOA. However, this generally refers to the governance of services, not of the data shared through those services. Just as proper governance of services is critical to an SOA, proper governance of the data is equally, if not more, important. Many of the problems with an organization's data environment can't be solved through technology alone; decisions and policies must be made at the organizational level and then implemented through technology. For example, the absence of an enterprise data ownership concept is a classic data governance issue. Different divisions in an organization control the data within their own system boundaries. They can change that data as they see fit, and those changes can ripple across other divisions and ultimately impact the interoperability of the enterprise as a whole.

Without a definition of enterprise ownership and stewardship of the data, controlling such changes is difficult. An SOA data strategy should therefore include establishing an enterprise data management function as the data governance mechanism. A centralized management function is needed to treat data as an enterprise asset instead of as the asset of individual departments. The group responsible for this function addresses data issues and establishes policies and processes that cut across multiple departments. Its responsibilities should include:

  • Defining the roles and responsibilities of data producers and data consumers across the enterprise
  • Deciding the issue of data stewardship - "who's in charge of managing this particular data family?"
  • Vetting and institutionalizing logical and physical models
  • Establishing policies and compliance guidelines for adhering to data standards chosen by the enterprise
  • Mandating the use of specific schemas as the format for exchanging core enterprise data
  • Establishing processes for exceptions, changes to standards, version control of models, and change control procedures
  • Mandating the use of specific services as authoritative sources for the data objects/families that they serve.
Enterprise Data Models
To realize the target data environment, some agreement is needed on the core data elements and structural business rules that the data services will represent. While it's possible to implement services on top of the current data sources by leveraging the existing data models in those systems, this is not optimal. Such an approach would continue to proliferate non-authoritative data sources, each with its own model designed to support specific needs without enterprise-level consistency. When creating the enterprise data models, an organization must shift away from modeling the data from a systems-only perspective. In other words, the organization must look at the data families themselves and focus less on the details of the specific applications that use them.

How does an organization make this shift? First, it must decide on its "core" data families, which are sometimes also referred to as "master" data. Core data is relatively easy to deduce, given a general understanding of the key business processes. For example, the "supplier" data family in a supply chain business could be considered core data. While it may be tempting to model every core data family in full detail, it may be wiser to identify "a good first set" and begin with that. A good approach is to simply tackle the obvious core data families first, learn from the experience, and then apply those lessons learned to model the rest of the data.

Next, the organization must decide which data elements are strategic for operational, reporting, and accountability purposes, and which are relevant only to one or a few subsets of the business. Common strategic data should be thought of as the subset of fields that any application using this data family can fully support. All other data attributes should be considered "optional," even if they're critical to certain applications. In the supply chain example, "supplier performance" isn't part of the core data, but it may be critical to one or two systems in the organization. Since the enterprise data model applies to the entire organization, the standard should always expect the core data to be provided, while giving each system the flexibility to extend the model with additional data relevant to its own purpose.
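
As a sketch of this core-plus-extensions idea, the hypothetical Python model below makes the enterprise-mandated supplier fields explicit and relegates system-specific attributes such as supplier performance to an optional extensions map; all names are invented for illustration.

from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical sketch of a canonical "supplier" model. Core fields
# are those every participating system must support; anything else
# (e.g., supplier performance) rides in an optional extensions map.

@dataclass
class Supplier:
    # Core attributes: always expected by the enterprise standard.
    supplier_id: str
    legal_name: str
    country: str
    # System-specific attributes live here, so individual systems
    # can extend the model without changing the enterprise standard.
    extensions: Dict[str, Any] = field(default_factory=dict)

# A procurement-analytics system can carry its own data without
# touching the core model:
s = Supplier("S-001", "Acme Industrial", "US",
             extensions={"supplier_performance": 0.92})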

Technical Considerations
SOA implementations usually exhibit decentralized, federated topologies, so the ability to merge data properly and to enable authorized enterprise-wide access is necessary to ensure that information can be leveraged to support enterprise objectives. Enabling these capabilities presents numerous challenges in the areas of data quality, security, and data services architecture. The next sections describe some of these challenges and provide recommendations for addressing them.

Data Quality
SOA initiatives often focus on the implications of connecting disparate systems. A fundamental concern of implementing such connections is how to ensure that the data exchanged is accurate, meaningful, and understandable by all participating parties. Users, consuming services, and data sources all operate as peers in an SOA. These peers will often use data in new and unanticipated ways. So it becomes increasingly difficult to serve meaningful information without normalizing existing data assets. This includes not just schematic normalization, but instance-level de-confliction as well.
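
The following Python sketch illustrates instance-level de-confliction; the normalization rules are invented, but they show how two differently formatted records can be recognized as the same real-world supplier.

import re

# Hypothetical sketch of instance-level de-confliction for supplier
# names. The normalization rules below are invented for illustration.

def normalize_name(name: str) -> str:
    """Canonicalize a supplier name for comparison."""
    name = name.lower().strip()
    name = re.sub(r"[.,]", "", name)                    # drop punctuation
    name = re.sub(r"\b(inc|corp|ltd|llc)\b", "", name)  # drop legal suffixes
    return re.sub(r"\s+", " ", name).strip()            # collapse whitespace

def same_supplier(a: str, b: str) -> bool:
    # Two records that differ only in formatting denote one supplier.
    return normalize_name(a) == normalize_name(b)

assert same_supplier("Acme Corp.", "ACME corp")
assert not same_supplier("Acme Corp.", "Apex Corp.")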

To this end, data quality studies are paramount in ensuring the success of an SOA implementation. Typically this includes understanding what data is available, where it's located, and what state it's in (a simple profiling sketch follows this list):

  • What types of data are used in the enterprise and for what purpose?
  • What underlying quality issues can be identified based on design metadata and current business rules?
  • What kinds of data are core to the business and what are ancillary or only necessary for augmented records?
  • For core data, how many non-SOA systems currently store this data and in what format?
  • For core data, what are the instance-level values for data records and to what extent are they different across systems?
  • What are the intended semantics, or business meaning, encoded in the data structures and values?
  • Is any of this data stale or outdated? Is any of it incorrect? Has any of it been improperly imported?
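
Several of these questions can be approached with straightforward data profiling. The Python sketch below, with invented field names and sample rows, computes per-field null rates and distinct-value counts over a sample from a single source - a simple starting point for a quality study.

from collections import Counter

# Hypothetical profiling sketch: given rows sampled from one legacy
# source, measure completeness and value spread per field. Field
# names and sample data are invented for illustration.

def profile(rows):
    """Per-field null rate and distinct-value count over a sample."""
    report = {}
    fields = {key for row in rows for key in row}
    for f in sorted(fields):
        values = [row.get(f) for row in rows]
        non_null = [v for v in values if v not in (None, "")]
        report[f] = {
            "null_rate": 1 - len(non_null) / len(rows),
            "distinct_values": len(Counter(non_null)),
        }
    return report

sample = [
    {"supplier_id": "S-001", "name": "Acme Corp.", "country": "US"},
    {"supplier_id": "S-002", "name": "acme corp", "country": ""},
]
print(profile(sample))
# 'country' shows a 50% null rate - a candidate data quality issue.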

More Stories By Tieu Luu

Tieu Luu works at SuprTEK where he helps the U.S. government create and implement strategies and architectures that apply innovative technologies and approaches in IT. You can read more of Tieu’s writing at his blog at http://tieuluu.com/blog.

More Stories By Sandeep Maripuri

Sandeep Maripuri is an associate with Booz Allen Hamilton where he designs and implements data sharing architectures that apply service-oriented concepts. Prior to joining Booz Allen Hamilton, Sandeep held architecture and engineering positions in both large consulting firms and a commercial software startup, where he was an architect and lead engineer of one of the first commercially available semantic data interoperability platforms.

More Stories By Riad Assir

Riad Assir is a senior technologist with Booz Allen Hamilton where he designs enterprise systems for commercial and government clients. Prior to joining Booz Allen Hamilton, Riad held senior technology positions at companies such as Thomson Financial, B2eMarkets, and Manugistics, where he worked on large supply chain systems development.
