SOA & WOA: Article

SOA and Data Integration

The marriage of data integration and SOA could end up in divorce

First, the history. Data integration is the name vendors have adopted to replace the ETL (Extract, Transform, Load), data cleansing, and data warehousing tools of days gone by.

These tools actually pre-date the notion of EAI, and were really the first sets of technology designed to deal with data and the use of that data for decision support (now called business intelligence). They would extract large amounts of data from one or several operational data stores, clean the data, roll up the data, and put it in another data store, typically a data mart or data warehouse, for analysis. From there somebody would "mine" the data to extract relevant information, such as productivity over time, sales effectiveness over years, or profit by division; you get the idea. It was a very powerful notion for its time, and it remains very powerful technology today.
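The extract, clean, roll-up, and load steps described above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory `operational_store`, the `division` and `amount` fields, and the dictionary standing in for a data warehouse are hypothetical, and real ETL tools work against databases at far larger scale.

```python
from collections import defaultdict

# Hypothetical operational records; field names and values are illustrative only.
operational_store = [
    {"division": "east", "amount": "100.0"},
    {"division": "east", "amount": " 50.5 "},
    {"division": "west", "amount": "75.0"},
    {"division": "west", "amount": None},  # dirty row, dropped during cleansing
]


def extract(store):
    """Extract: read every row from the operational store."""
    return list(store)


def clean(rows):
    """Cleanse: drop rows with no amount and normalize amounts to floats."""
    cleaned = []
    for row in rows:
        if row["amount"] is None:
            continue
        cleaned.append(
            {"division": row["division"], "amount": float(row["amount"].strip())}
        )
    return cleaned


def roll_up(rows):
    """Roll up: aggregate amounts by division."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["division"]] += row["amount"]
    return dict(totals)


def load(warehouse, totals):
    """Load: write the aggregates into the warehouse (a plain dict here)."""
    warehouse.update(totals)
    return warehouse


# One batch run: the warehouse only changes when this pipeline is executed.
warehouse = load({}, roll_up(clean(extract(operational_store))))
print(warehouse)  # {'east': 150.5, 'west': 75.0}
```

Note that the warehouse holds a copy of the data as of the last run, which is the batch characteristic the rest of the article contrasts with real-time movement.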

When I was first working on the notion of EAI, I used these tools, and thought they had value. However, I also understood that the value of integration was really about the movement of information in real-time, system-to-system, in patterns that resemble actual business processing (sales to inventory to accounting, etc.), and while there was some business intelligence there, it was really in the domain of real-time monitoring (business activity monitoring, or BAM). Thus, you really had two different threads of technology born...ETL (now data integration) and EAI (now integration or SOA).

Enter SOA, and the hype around it, and everyone looking to link to it. All data integration vendors, and EAI vendors for that matter, are repositioning, retooling, and remarketing their technology as a "SOA solution." So, where is the actual fit for data integration?

Unlike SOA, which can support real-time data movement, data integration (typically) provides adequate business information (data replication) without up-to-the-minute access to information. In many cases, the data is weeks, even months, old, and the data mart or data warehouse is updated through antiquated batch extract-aggregate-and-load processes. Indeed, this is the way integration is done today, using a data integration product or, in most cases, no product at all. When I was doing research for my last book, for instance, I found that FTP was still the primary form of data integration.

Evolving Away, and to Data
Things are changing, fortunately. SOA, and the technology that comes with it, lets data warehouse architects and developers move information...no matter where it comes from or where it's going...as quickly as they want to move it. As a result, it's not unheard of to have all participating databases and services in a SOA solution getting new data constantly, thus providing more value to those using the source and target systems existing in the SOA...including those who use them as a data warehouse or data mart.

Therefore, the rise of SOA will also lead to the rise of real-time data warehouse solutions, or could replace the notion of data warehousing altogether. For instance, we could put the data behind services (data services or abstract data services), with users leveraging up-to-the-minute information to make better business decisions using BI tools or BAM, or just snapping them into composite applications.

As time goes on, data integration products may not be needed, as SOA architects craft services that abstract both operational and aggregated data, in some cases leveraging aggregated data without having to replicate and change the data, but doing it through abstraction layers. This approach, if possible for your domain, is much less expensive and less complex. In essence, the SOA becomes the place to leverage services that can deal with the data layer(s) through many types of abstraction services, services that can be mixed and matched in composites to create solution instances for the SOA. SOA's value is in bringing all of these things together as a platform for business solutions.
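To make the idea of a data service that abstracts both operational and aggregated data a little more concrete, here is a minimal Python sketch. The class names `OrdersStore` and `SalesDataService`, and their methods, are hypothetical and chosen only for illustration; the point is that the aggregate is computed through the abstraction layer on request rather than replicated into a separate store.

```python
class OrdersStore:
    """Stand-in for an operational system of record (hypothetical)."""

    def __init__(self):
        self._orders = []

    def add_order(self, division, amount):
        self._orders.append((division, amount))

    def orders(self):
        return list(self._orders)


class SalesDataService:
    """Abstract data service: exposes operational and aggregated views
    of the same store without copying the data into a warehouse."""

    def __init__(self, store):
        self._store = store

    def orders_for(self, division):
        """Operational view: individual order amounts for one division."""
        return [amt for div, amt in self._store.orders() if div == division]

    def total_by_division(self):
        """Aggregated view: computed on request from the live store."""
        totals = {}
        for div, amt in self._store.orders():
            totals[div] = totals.get(div, 0) + amt
        return totals


store = OrdersStore()
service = SalesDataService(store)

store.add_order("east", 100)
before = service.total_by_division()

# A new order is visible to consumers immediately; no batch load is needed.
store.add_order("east", 50)
after = service.total_by_division()

print(before, after)  # {'east': 100} {'east': 150}
```

Whether aggregating on demand like this is practical depends on data volumes and query cost, which is why the approach is only an option "if possible for your domain."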

Needs Coupling...Okay, Some Coupling
For a technology to truly be a SOA technology, or so I argue, it has to support the notion of coupling, as well as cohesion, and not just one or the other. This is where some data integration products fall down.

Coupling, in the context of application integration and SOA, is the binding of applications together so that they are dependent on each other, sharing the same services, methods, interfaces, and perhaps data. This is the core notion of SOA where the applications are bound by shared services, versus the simple exchange of information (using services or not).

Of course, the degree of coupling that occurs is really dependent on the SOA architect, and how she or he binds source and target systems together. In some instances systems are tightly coupled, meaning they're dependent on each other. In other instances, they are loosely coupled, meaning that they're more independent. It doesn't matter if you're doing this through Web Services or other mechanisms; you're typically going to have to make these architectural tradeoffs within the notion of coupling.

There are, of course, more pros and cons of coupling that should be considered in the context of the problem you're looking to solve. On the pros side, you have the ability to bind systems by sharing behavior and the data bound to it, versus simply sharing information. This gives the integration solution set the ability to share services that would otherwise be duplicated across the integrated systems, thus reducing development costs. This is the reason we leverage SOAs.
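As a rough illustration of binding systems by shared behavior rather than by exchanged information, consider the following Python sketch. The `CreditCheckService`, `SalesApp`, and `AccountingApp` classes are hypothetical names invented for this example, and in a real SOA the shared service would sit behind a network interface rather than an in-process object.

```python
class CreditCheckService:
    """Hypothetical shared service: a single implementation of a business rule."""

    def __init__(self, limit):
        self.limit = limit

    def approve(self, amount):
        return amount <= self.limit


class SalesApp:
    """Consumes the shared service instead of re-implementing the rule."""

    def __init__(self, credit_service):
        self.credit = credit_service

    def place_order(self, amount):
        return "accepted" if self.credit.approve(amount) else "rejected"


class AccountingApp:
    """A second application bound to the same shared behavior."""

    def __init__(self, credit_service):
        self.credit = credit_service

    def issue_invoice(self, amount):
        return "invoiced" if self.credit.approve(amount) else "held"


shared = CreditCheckService(limit=1000)
sales = SalesApp(shared)
accounting = AccountingApp(shared)

first = (sales.place_order(800), accounting.issue_invoice(800))

# Changing the shared behavior changes both applications at once:
# the benefit (no duplicated logic) and the cost (mutual dependence) of coupling.
shared.limit = 500
second = (sales.place_order(800), accounting.issue_invoice(800))

print(first, second)  # ('accepted', 'invoiced') ('rejected', 'held')
```

A pure data integration approach would instead copy credit data between the two applications and leave each to apply its own rule, which shares information but not behavior.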

Then there's the ability to tightly couple processes as well as shared behavior. This means that process integration engines, layered on top of SOA solutions, are better at binding actual behavior (functions, methods, services) versus simply moving information from place to place.

The problem is that many data integration solutions are more about information/data than about sharing services, so they're a hard fit for many SOAs. ESBs have a similar issue, though it's not as obvious. As a result, the marriage between data integration and SOA could end up in divorce if coupling is a requirement. Again, generally speaking.

About David Linthicum

David S. Linthicum (Dave) works for Booz Allen Hamilton in the Washington DC area, focusing on SOA and cloud computing. In addition, Dave is the Editor-in-Chief of SYS-CON's Virtualization Journal. Dave is an internationally known cloud computing and SOA expert. He is a sought-after consultant, speaker, and blogger. In his career, Dave has formed or enhanced many of the ideas behind modern distributed computing including EAI, B2B Application Integration, and SOA, approaches and technologies in wide use today. For the last 10 years, he has focused on the technology and strategies around cloud computing, including working with several cloud computing startups. His industry experience includes tenure as CTO and CEO of several successful software and cloud computing companies, and upper-level management positions in Fortune 500 companies. In addition, he was an associate professor of computer science for eight years, and continues to lecture at major technical colleges and universities, including University of Virginia and Arizona State University. He keynotes at many leading technology conferences, and has several well-read columns and blogs. Linthicum has authored 10 books, including the ground-breaking "Enterprise Application Integration" and "B2B Application Integration." You can reach him at david@bluemountainlabs.com. Or follow him on Twitter. Or view his profile on LinkedIn.
