By Jim Gabriel
February 2, 2005
This article describes how an essential precursor to any SOA implementation is a data modeling exercise that integrates all underlying data models, focusing more on the business requirements than on system- and application-specific requirements.
Integrating data models in a complex enterprise can be difficult because the IT landscape often reveals massive duplication and redundancy. Refining this situation without semantic loss is a tough nut to crack. This article discusses the problem domain and makes some recommendations.
Gartner, Inc., advises organizations wishing to fully exploit service-oriented business applications to focus on integrating the processes and underlying data models, rather than on integrating individual application components. Failure to integrate these aspects will place the organization at a competitive disadvantage. Gartner believes that such metadata management is "essential to reducing the escalating complexity of management and maintenance of integrated software platforms."
This is very sound advice. When exposing an application through Web services, it is very tempting to write services that speak directly and specifically to the application, as this is the simplest, most cost-effective path in development (at least in the short term). Unfortunately, this gives us "tight coupling" - that is, we are exposing the interface to the application component and little more. Web services that access an application in this way provide the equivalent of RPC-like, point-to-point integration. "Tightly coupled" is not one of the many definitions applicable to SOA that we should by now have learned by rote. Rather, "loosely coupled" and "dynamic" spring to mind as far more applicable.
The benchmark for testing whether services are sufficiently loosely coupled and dynamic is the level to which services are application specific. If developers write services that can be application agnostic, much of the battle has been won. In other words, you should be able to unplug the underlying application component and plug in an equivalent from another manufacturer, and the service should not need updating.
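The unplug-and-replace test can be sketched in a few lines. This is a minimal illustration, not a prescribed design: the `CustomerStore` contract, the two vendor adapters, and all field names are hypothetical and do not refer to any real product.

```python
from typing import Protocol


class CustomerStore(Protocol):
    """Contract the service depends on (hypothetical, for illustration)."""

    def fetch(self, customer_id: str) -> dict: ...


class VendorACrm:
    """Adapter for one hypothetical application component."""

    def fetch(self, customer_id: str) -> dict:
        # Vendor A's native record uses its own field names.
        native = {"CUST_NO": customer_id, "CUST_NM": "Acme Ltd"}
        return {"id": native["CUST_NO"], "name": native["CUST_NM"]}


class VendorBCrm:
    """Adapter for an equivalent component from another manufacturer."""

    def fetch(self, customer_id: str) -> dict:
        native = {"customerKey": customer_id, "displayName": "Acme Ltd"}
        return {"id": native["customerKey"], "name": native["displayName"]}


def customer_service(store: CustomerStore, customer_id: str) -> dict:
    """The service sees only the contract, never the vendor-specific fields."""
    record = store.fetch(customer_id)
    return {"customer": record}


# Swapping the component does not change the service or its payload.
reply_a = customer_service(VendorACrm(), "42")
reply_b = customer_service(VendorBCrm(), "42")
```

Only the adapters know vendor-specific field names, so replacing one component with the other leaves `customer_service` untouched.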
The technical infrastructure that enables application agnosticism depends on a layered approach to SOA. That is, application-agnostic services are only possible if the SOA implements a data model that properly represents and integrates the underlying data models in the enterprise at a layer of abstraction higher than the interface layer. This is very important in the interests of long-term maintainability and evolution. The integrated data model is the source of all data definitions and interface definitions required by services, but it is also the basis for resolving model-to-model mappings in the interface layer. The relationship between services and underlying applications is illustrated in Figure 1.
The relationship between services and application components passes through a number of other layers. First, the payloads of message-centric services are described by schemas. These schemas are assembled from the integrated data model. The relationship between the payload schemas and the underlying data models is managed through transformations. The transformations are built against the integrated data model, which must therefore have knowledge both of the underlying data models and of the payload schemas that have been assembled from the model.
The interface layer is the transformation layer where data is transformed from its format, as described by the integrated data model, into the format required by the application model, and vice versa. For this reason, the integrated data model is sometimes referred to as the "interface model" or "transformation model."
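As a rough sketch of such a two-way transformation, assume a single mapping table defined against the integrated model. The field names are hypothetical, and a real interface layer would more likely hold the mapping as XSLT or a similar artifact than as a Python dictionary.

```python
# Hypothetical mapping from integrated-model field names to one
# application's field names.
INTEGRATED_TO_APP = {
    "customerId": "CUST_NO",
    "familyName": "LNAME",
    "postalCode": "ZIP",
}

# The reverse direction is derived from the same table, so both
# transformations stay consistent with a single definition.
APP_TO_INTEGRATED = {app: model for model, app in INTEGRATED_TO_APP.items()}


def transform(record: dict, mapping: dict) -> dict:
    """Rename every field according to the mapping; fail on unmapped fields."""
    unmapped = set(record) - set(mapping)
    if unmapped:
        raise KeyError(f"no mapping for fields: {sorted(unmapped)}")
    return {mapping[field]: value for field, value in record.items()}


payload = {"customerId": "42", "familyName": "Gabriel", "postalCode": "10001"}
app_record = transform(payload, INTEGRATED_TO_APP)
round_trip = transform(app_record, APP_TO_INTEGRATED)
```

Deriving the reverse mapping from the forward one is a design choice in this sketch: it keeps the "vice versa" direction from drifting out of step with the model.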
This metadata topography offers up a number of interesting issues:
- Multiple layers of description. Application landscapes can be validly described by the sum total of pieces in each layer. For example, the sum total of the schemas in the schema layer arguably provides an equivalent description of the application landscape to the sum total of the underlying data models.
- References to any given object can occur in multiple layers and contexts. From a programming perspective, these layers of metadata and application objects suggest a high level of duplication and potential redundancy. Maintenance and evolution considerations are of paramount importance in planning the implementation of your SOA.
- The integrated data model must include knowledge of the underlying data models that it integrates. An integrated data model is partly a collection of existing data models, and partly a new schematic representation of the data and processes in an enterprise.
These issues in turn raise a number of questions:
- Is an integrated data model a real, tangible model, or is it a logical concept only?
- What modeling or schema language should the integrated data model use?
- How do you create the integrated data model?
- How do you resolve duplication and redundancy?
- Is an integrated data model a passive reflection or an active master?
- Where do you store the integrated data model?
- How do you manage the life cycle of the integrated data model?
The integrated data model is real. You cannot build the various layers described in Figure 1 unless you collate in one place the metadata that describes all the data in the SOA. For example, transformations in the interface layer are schema-to-schema mappings that can only be defined if the source and target schemas actually exist. Where you store the model and how you manage it are very important questions that you will need to answer before implementing the SOA. Your decisions at this stage will have long-term consequences for the maintainability of your implementation.
Flavor
The integrated data model is an XML data model that is best described in XML Schema; after all, service payloads are constrained by XML Schema. Note that the expression of the data model - that is, how you choose to deploy it as something tangible and usable (in this case a family of xsd files) - is not necessarily the same as the development image of the model. For example, some organizations capture everything in UML and export the resulting model as XML Schemas. However, it is good advice to stay as close to the final implementation as possible.
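To make the idea of a deployed xsd file concrete, the following sketch reads a small, invented schema fragment with Python's standard library and lists what it declares. The `urn:example:integrated` namespace and the `Customer` element are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical fragment of an integrated data model expressed as XML Schema.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:integrated"
           elementFormDefault="qualified">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Id" type="xs:string"/>
        <xs:element name="Name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

root = ET.fromstring(SCHEMA)
target_namespace = root.get("targetNamespace")

# Top-level element declarations are the objects a payload schema can reuse.
top_level = [el.get("name") for el in root.findall(f"{{{XSD_NS}}}element")]

# Child elements declared inside a sequence, with their declared types.
fields = [
    (el.get("name"), el.get("type"))
    for el in root.findall(f".//{{{XSD_NS}}}sequence/{{{XSD_NS}}}element")
]
```

Because the deployed model is ordinary XML, tooling can inspect it directly, which is one practical argument for staying close to the final implementation format.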
Creating the Integrated Model
To achieve an integrated data model for an organization, a single, homogeneous model must be created that fully represents all the relevant underlying data models, including any schemas used for trading partners or industry standards. This is a serious data modeling exercise that typically requires the input of highly experienced analysts and architects. The end result is a set of custom standards for the enterprise.
The starting point is often a metadata-gathering phase, where all the existing models are imported into a central place (usually a repository). Exposing metadata is not always straightforward, as some applications do not publish interfaces or schemas and others require some interpretation or embellishment along the way. For example, database schemas in an Oracle database can be exported as DTDs or XML Schema files using the XML-SQL Utility (XSU), but ensuring that relationships and constraints are properly expressed in the resulting schemas requires a manual pass over the metadata.
Be aware that importing metadata from existing systems creates an application-specific view of the landscape, warts and all. There is little point in creating application-specific integration models, as this does not abstract us high enough above the underlying application specifics. This exercise must therefore be accompanied by a broader analysis that models the actual data requirements of the enterprise, preferably with a very forward-looking appraisal of the expectations facing the business. You should look at this as a bonus. As Thomas Erl says in his article "Best Practices for Transition Planning" in the November 2004 issue (WSJ Vol. 4, issue 11), in "planning a migration to a standardized adoption of SOA you ... have an opportunity to erase some of the neglect from the past." This is a speculative analysis action that certainly applies to the data-modeling phase.
The data modeling phase also provides an excellent opportunity not only to erase some of the neglect from the past, but also to introduce other essential best practices into your SOA, such as a service-oriented security model.
At this metadata-gathering stage, some organizations use metadata management tools, equipped with drivers and connectors, to facilitate the import of metadata from various systems in the application landscape. However, before you rush out and buy a metadata repository, be sure you understand exactly what your long-term requirements of such a system might be, paying particularly close attention to questions of maintenance and evolution.
Duplication and Redundancy
When data models are integrated, some translation and rationalization is inevitable because duplication and redundancies must be resolved. It is essential that no object properties are lost or diminished during the rationalization phase, as this would result in a reduction of the metadata and therefore of the potential functionality. Rather, the resulting integrated data model should be more descriptive and functional than a simple sum of the component models.
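One way to check that rationalization loses nothing is to merge equivalent objects as a union of their properties and then verify the result against every source. The sketch below assumes a deliberately simplified representation (property name mapped to type name), and the `billing_customer` and `crm_customer` objects are invented for the example.

```python
def merge_definitions(*definitions: dict) -> dict:
    """Union the properties of equivalent objects from several models.

    Each definition maps a property name to a type name. Conflicting
    types are reported rather than silently overwritten.
    """
    merged: dict = {}
    for definition in definitions:
        for prop, type_name in definition.items():
            if prop in merged and merged[prop] != type_name:
                raise ValueError(f"conflicting types for {prop!r}")
            merged[prop] = type_name
    return merged


def lost_properties(merged: dict, *definitions: dict) -> set:
    """Return every source property that is missing from the merged object."""
    return {prop for d in definitions for prop in d if prop not in merged}


billing_customer = {"id": "string", "creditLimit": "decimal"}
crm_customer = {"id": "string", "email": "string"}

integrated_customer = merge_definitions(billing_customer, crm_customer)
missing = lost_properties(integrated_customer, billing_customer, crm_customer)
```

Here the merged object carries more information than either source, and an empty `missing` set confirms that no property was dropped along the way.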
When rationalizing data models, we often have to interpret and manage the fact that two objects in separate application models are essentially the same object. How we deal with this depends on the requirements. If two objects exist because different teams of developers have made up their own different names for the object - and this is a very common problem in XML-driven systems - you can either straighten out the problem in the underlying models before integrating them or create a new "alias" object in your integrated model that can map to each variant.
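The alias approach can be sketched as a lookup table that ties one canonical object to the name each underlying model uses. The model names (`orders`, `shipping`) and element names are hypothetical.

```python
# Hypothetical alias table: one canonical object in the integrated model,
# mapped to the name each team's model uses for the same thing.
ALIASES = {
    "PostalCode": {"orders": "ZipCode", "shipping": "PostCode"},
}


def local_name(canonical: str, model: str) -> str:
    """Name that a given underlying model uses for a canonical object."""
    return ALIASES[canonical][model]


def canonical_name(name: str, model: str) -> str:
    """Canonical object behind a model-specific name."""
    for canonical, variants in ALIASES.items():
        if variants.get(model) == name:
            return canonical
    raise KeyError(f"{model}:{name} has no alias in the integrated model")
```

For example, `canonical_name("PostCode", "shipping")` and `canonical_name("ZipCode", "orders")` both resolve to `PostalCode`, so the underlying models can keep their own names.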
Another common problem is the use of the same name in underlying models for what are essentially different objects. The solution here is similar to the previous problem: either solve it before integrating, or integrate by creating new objects at a higher level. Note that when working with XML Schemas, namespaces can determine how you proceed. Clearly, if two teams have used the same name in the same namespace for different purposes, and the act of integrating the data models exposes that problem for the first time, there is no way you can allow both objects to continue coexisting with different meanings.
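A collision of this kind can be detected mechanically once declarations are keyed by namespace and name together. The sketch below works on simplified tuples rather than real schema files; the namespaces, type names, and file names are invented for the example.

```python
from collections import defaultdict


def find_collisions(declarations):
    """Group declarations by (namespace, name) and report conflicting ones.

    `declarations` is an iterable of (namespace, name, type_name, source)
    tuples, a simplified stand-in for what a schema import would yield.
    """
    seen = defaultdict(dict)  # (namespace, name) -> {type_name: [sources]}
    for namespace, name, type_name, source in declarations:
        seen[(namespace, name)].setdefault(type_name, []).append(source)
    # A collision is one qualified name bound to more than one definition.
    return {key: types for key, types in seen.items() if len(types) > 1}


declarations = [
    ("urn:example:sales", "Account", "CustomerAccountType", "sales.xsd"),
    ("urn:example:sales", "Account", "LedgerAccountType", "finance.xsd"),
    ("urn:example:hr", "Account", "LoginAccountType", "hr.xsd"),
]
collisions = find_collisions(declarations)
```

Only the two `urn:example:sales` declarations collide. The `Account` in `urn:example:hr` shares the local name but lives in a different namespace, so it can coexist without ambiguity.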
When two objects that are essentially the same object need to coexist for transformation purposes, the integrated model must describe both objects rather than attempt to resolve them into one, otherwise it is not possible to create the transformation. For example, in a trading partner scenario where a legacy system from a supplier processes a credit card payment according to constraints described in a DTD (and your system speaks an XML Schema equivalent), the correct way to integrate your data models would be to load both the DTD and the XML Schema equivalent, and then create the mappings between the constituent objects.
Passive or Active?
Active metadata is business driven; passive metadata is technology driven. Once we have an integrated data model, the metadata takes on an active role. Prior to this stage, the role of metadata was passive because it served no further purpose other than to describe and constrain data. It reflected existing systems and application components. Active metadata drives new development effort from within the metadata. In other words, when we need to modify the payload of a service to satisfy a changing business requirement, our starting point must be the externalized schema that describes the payload. This is a very serious consideration for tooling and evolution management. From this point onward, any changes to the way the business functions will force developers to go to the metadata first - that is, the integrated data model - and make their changes there before changing code.
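A small sketch can show what "metadata first" means in practice. It assumes, purely for illustration, that the externalized payload description is a Python dictionary standing in for an XML Schema, and that the validation code reads it at run time.

```python
# Hypothetical externalized payload description. In the article's setting
# this would be an XML Schema held in the integrated data model; a plain
# dict stands in for it here.
order_schema = {"orderId": str, "quantity": int}


def validate(payload: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the payload conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in payload:
            problems.append(f"missing {field}")
        elif not isinstance(payload[field], expected):
            problems.append(f"{field} is not {expected.__name__}")
    return problems


payload = {"orderId": "A-1", "quantity": 3}
before = validate(payload, order_schema)

# A new business requirement is recorded in the metadata first ...
order_schema["currency"] = str
# ... and the unchanged validation code now enforces it.
after = validate(payload, order_schema)
```

The payload conforms before the change and is rejected afterward, even though `validate` itself was never edited: the metadata, not the code, is where the change was made.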
Storing the Model
Bearing in mind the active nature of the metadata in your SOA, it is essential that you store the model in an environment that supports the concept of change. Repositories that provide container functionality and business analysis support are essential parts of the IT landscape before you implement your SOA. Once you start implementing the SOA, the integrated data model must be managed in a model-driven environment with full developer support. This allows changes to be made centrally and deployed out to the environment in the form of version-controlled schemas, transformations, and so on.
Managing the Life Cycle of the Model
The term that is applicable to managing the life cycle of a data model is "metadata evolution management." This is at the heart of any SOA, because SOA is a development-time and deployment-time concept that requires an integration platform that orchestrates Web services. Web services are described by, and carry payloads that are constrained by, metadata that is expressed in XML Schema. When systems evolve, the metadata must also evolve, thus making XML-metadata evolution management an essential part of the infrastructure (see Figure 2).
This was the subject of my article "Metadata Evolution Management in your SOA" in the last issue (WSJ Vol. 5, issue 1), so I would refer you to that discussion for more detail and some recommendations for managing the evolution of metadata in the SOA. Suffice it to say that managing the life cycle of your Web services development, particularly from the perspective of the evolution of metadata, is not simply a schema-versioning problem. Versioning schemas is about technical constructs and development processes, not about the management of metadata evolution. Metadata evolution management is the real problem facing the long-term life-cycle management of Web services development projects.
Summary
The data modeling exercise is such an important precursor to any SOA implementation that I would describe it as more than a best practice: it is essential. This data modeling exercise must integrate all underlying data models, focusing on the business requirements and abstracting above system- and application-specific requirements where possible. Integrating data models will reveal duplication and redundancy, which must be resolved, without semantic loss, by suitably experienced architects and analysts. The end result will be a metadata model that will assume an active role in your IT landscape. Preserving the integrity of the metadata model as systems and business requirements evolve will constitute the next major challenge for the IT department, for which a strategy and technical solution must be prepared.
Copyright © 2005 SYS-CON Media, Inc. — All Rights Reserved.
Jim Gabriel has authored tens of thousands of pages of technical documentation, ranging from entry-level tutorial material to programmers' reference manuals. He is literate in XML, SGML, and XSL, among others.
Javier Cámara 02/10/05 03:30:43 AM EST
I fully agree in that Domain Modeling (where "Domain" means the whole Business of the enterprise) is really a good idea for integration and in general managing the IT resources of an organization. However, in many cases, those IT systems being integrated provide not only *data*, but also *operations*, and from my point of view the Domain Model should cover both. Any corporate metadata management tool should allow to describe and manage both aspects, but not only one. While data can be modeled as XML Schema, operations can be modeled as WSDL and other WS-* standards. Regards