By Tieu Luu, Sandeep Maripuri, Riad Assir
June 15, 2006 11:30 AM EDT
Besides these scoping and profiling exercises to manage data quality, it's also imperative to resolve value-level conflicts that exist in the data. These conflicts can be categorized into three major types (C.H. Goh, "Representing and Reasoning about Semantic Conflicts in Heterogeneous Information Systems," Sloan School of Management, Massachusetts Institute of Technology, 16-22, January 1997):
These data conflicts can often be addressed by using commercial data management tools and methodologies, as well as enterprise data modeling software. Another emerging possibility is semantics-centric modeling environments. Instead of hard-coding data cleansing routines, these tools use a semantic description of the enterprise - the business concepts and relationships between those concepts, as well as any business rules governing the relationships - and provide a mechanism to describe how legacy systems support the semantics of the enterprise. This abstraction lets the enterprise identify deterministically how each enterprise data asset supports the enterprise business functions, and where gaps exist between the enterprise semantic model and the underlying data representation schemes. The same model can then be used to determine where physical data conflicts or duplications may exist, and to forward-engineer data consolidation and cleansing scripts.
Data Access Controls
In traditional application
architectures, data access security is typically governed by
application-specific mechanisms. In this environment, each source has
its own set of users, roles, and access control policies, which means
that user profiles, roles, and access control policies lack consistency
across the enterprise. An SOA environment magnifies this problem by
making data sources visible across the organization. So it becomes
increasingly important to move away from individual
application-specific and data source-specific mechanisms in favor of
enterprise-level SOA identity management and access control mechanisms.
This means that when creating the central data services layer, the data sources must rely on central provisioning of some security functions so they can be managed centrally. The challenge is in finding the right balance between the security functions that should be managed centrally and what should be managed as part of the data sources. There are several options in implementing such a scheme, including a centrally managed data security layer, or using layered authorization through multiple policy decision points (PDP).
With the central management option, the data sources relinquish security and rely solely on the data services to protect the access to their data. Within each data source, a single user profile is created for the data service that has full access to the data. Any request to the data through this service is authorized through this user profile. So there's no longer a concern about whether the principal's identity from the overarching security domain exists or means anything in the data source. However, this option pushes security checks into the data service layer and reduces the granularity of accountability. As a consequence, any access control policies from the data source along with the associated roles and privileges should now be re-created and maintained at the central enterprise points.
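The central management option can be sketched roughly as follows. This is a minimal illustration, not a reference design: the class names (`CentralPolicyStore`, `DataSource`, `DataService`), the role and table names, and the in-memory audit log are all assumptions made for the example. The audit log is included because the data source only ever sees the service account, so the data service is the one place where the original principal can still be recorded.

```python
from dataclasses import dataclass, field


@dataclass
class CentralPolicyStore:
    """Enterprise-level policies: role -> set of (resource, action) grants."""
    grants: dict[str, set[tuple[str, str]]] = field(default_factory=dict)

    def is_allowed(self, roles: list[str], resource: str, action: str) -> bool:
        return any((resource, action) in self.grants.get(role, set()) for role in roles)


class DataSource:
    """Stand-in for a data source that trusts exactly one service account."""

    def __init__(self, service_account: str, tables: dict[str, list[str]]):
        self._service_account = service_account
        self._tables = tables

    def read(self, account: str, table: str) -> list[str]:
        # The source no longer knows about enterprise principals at all.
        if account != self._service_account:
            raise PermissionError(f"unknown account: {account}")
        return list(self._tables[table])


class DataService:
    """Hypothetical data service that performs every access check centrally."""

    def __init__(self, policies: CentralPolicyStore, source: DataSource, service_account: str):
        self._policies = policies
        self._source = source
        self._service_account = service_account
        self.audit_log: list[tuple[str, str, str, bool]] = []

    def read(self, principal: str, roles: list[str], table: str) -> list[str]:
        allowed = self._policies.is_allowed(roles, table, "read")
        # Record the real principal here, because the source only sees the service account.
        self.audit_log.append((principal, table, "read", allowed))
        if not allowed:
            raise PermissionError(f"{principal} may not read {table}")
        return self._source.read(self._service_account, table)
```

In this sketch, every policy that used to live in the data source has to be expressed as a grant in `CentralPolicyStore`, which is the maintenance cost the paragraph above describes.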
In contrast, layering the use of multiple policy decision points encourages the reuse of existing authorization capabilities, user profiles, and access control policies of the underlying data sources. This approach allows some of the more fine-grained access control decisions to be made at the data sources rather than elevating them into the enterprise layer. Although many variations exist for this design, the premise is that different layers of authorization with multiple PDPs are making the decisions. The basic flow of this approach is as follows: Authentication still occurs at the edge using enterprise authentication services. Requests for data originate at different security domains in the enterprise. A PDP in each of these domains evaluates requests for resources in that domain. When a data service is invoked it calls the enterprise policy decision point to authorize access to the data service as well as the specific operation requested. The data service then delegates the decision to each data source so they can authorize access to their specific data object(s). Thus, coarse-grained decisions are made at the enterprise level while finer-grained decisions use data source-specific profiles and policies that aren't exposed to the enterprise.
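The layered flow described above might look like the following sketch. The names `EnterprisePDP`, `SourcePDP`, and `invoke_read` are hypothetical, and authentication at the edge is assumed to have already happened. The enterprise PDP makes the coarse-grained decision about the service operation, and each data source then makes its own fine-grained decision from a policy it keeps private.

```python
class EnterprisePDP:
    """Coarse-grained decision: may this role invoke this service operation?"""

    def __init__(self, operations_by_role: dict[str, set[str]]):
        self._operations_by_role = operations_by_role

    def permits(self, role: str, operation: str) -> bool:
        return operation in self._operations_by_role.get(role, set())


class SourcePDP:
    """Fine-grained decision made by a data source from its own private policy."""

    def __init__(self, readable_objects_by_user: dict[str, set[str]]):
        self._readable = readable_objects_by_user

    def permits(self, user: str, data_object: str) -> bool:
        return data_object in self._readable.get(user, set())


def invoke_read(
    enterprise: EnterprisePDP,
    sources: dict[str, SourcePDP],
    user: str,
    role: str,
    requested: list[tuple[str, str]],
) -> dict[str, bool]:
    """Authorize a read in two layers; `requested` holds (source, object) pairs."""
    # Layer 1: the enterprise PDP authorizes the service operation itself.
    if not enterprise.permits(role, "read"):
        raise PermissionError(f"role {role!r} may not call read")
    # Layer 2: each source decides about its own data objects.
    return {
        f"{source_name}/{data_object}": sources[source_name].permits(user, data_object)
        for source_name, data_object in requested
    }
```

A caller would typically return only the objects whose decision is `True`; how partial denials are reported to the client is a design choice this sketch leaves open.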
Data Services Architecture
From an architectural
perspective, the heart of this solution is an enterprise layer that
logically centralizes access to the data spread across the enterprise.
This set of logically centralized data services provides several
architectural advantages. First, the enterprise can assert greater
control over the governance and implementation of data access
mechanisms. Second, clients use a consistent mechanism to access data.
Third, the enterprise can design and implement a solution in a holistic
fashion instead of the typical one-off models that are the norm in data
integration. Finally, besides the basic Create, Read, Update, and
Delete (CRUD) operations, the underlying architecture must also support
data aggregation, inter-service transactions, and multiple access and
usage patterns, all while ensuring acceptable levels of quality of
service.
Data Aggregation Scenarios
This data services
layer acts as a façade over the enterprise assets - it logically
provides access to enterprise data assets in a singular manner, while
physically dispatching requests and aggregations across relevant
co-located assets. Three main scenarios should be considered for data
aggregation:
Some of these aggregation capabilities can be supported through Enterprise Information Integration (EII) technology, which provides SOA-centric capabilities for accessing and querying co-located data in real time. EII products provide adapters to legacy data sources and expose their underlying data in a service-oriented fashion. EII is best suited to discrete, query-based access where data volumes are moderate. It isn't meant to be a replacement for traditional ETL (extract, transform, load), EAI (enterprise application integration), or MDM (master data management) technologies. For example, aggregation scenarios that need de-duplication may call for MDM technologies.
The data services layer allows creates and updates to be requested once by a client and then decomposed by the supporting architecture into individual write commands to targeted data sources. Therefore, the architecture must support transactionality - ensuring that writes are consistent so that underlying data across all affected data sources are left in a consistent state. This isn't significantly different from current data integration pains. However, most systems today requiring multi-write transaction capabilities leverage the XA standards. Similar standards for the Web Services environment are only starting to emerge. OASIS has recently formed a Web Services Transaction Technical Committee (WS-TX TC) responsible for stewarding WS-AtomicTransaction, WS-Coordination, and WS-BusinessActivity specifications through the standardization process. None of these standards have been ratified yet. Because these specifications are still being developed, most SOA-related transaction support is being custom-developed, typically through the use of homegrown compensation mechanisms - effectively an "undoing" of a previously executed service invocation. Instead of providing true rollback semantics, compensation is an additional service invocation that rewrites data to its original state. While it may be beneficial to take a wait-and-see approach to building transactionality, solutions aligned with the three specifications seeding WS-TX deliberations will likely provide the path of least resistance to standards compliance.
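A homegrown compensation mechanism of the kind mentioned above can be sketched as follows. This is only an illustration under simplifying assumptions: the data sources are plain dictionaries, and `make_update` and `run_with_compensation` are hypothetical helper names. Each write is paired with a compensating action that restores the previous value, and when a later write fails, the completed writes are compensated in reverse order.

```python
from typing import Callable

Step = tuple[Callable[[], None], Callable[[], None]]  # (write, compensate)


def make_update(store: dict, key: str, new_value: str) -> Step:
    """Build a write and its compensation for one key in one data source."""
    saved: dict = {}

    def write() -> None:
        # Remember the prior state at write time so it can be restored later.
        saved["had_key"] = key in store
        saved["value"] = store.get(key)
        store[key] = new_value

    def compensate() -> None:
        if saved["had_key"]:
            store[key] = saved["value"]
        else:
            store.pop(key, None)

    return write, compensate


def run_with_compensation(steps: list[Step]) -> None:
    """Run each write in order; on failure, undo completed writes in reverse."""
    completed: list[Callable[[], None]] = []
    try:
        for write, compensate in steps:
            write()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()
        raise
```

Unlike an XA rollback, nothing here isolates the intermediate state: another client could read the first source between the write and its compensation, and a compensation can itself fail. A real implementation would need to decide how to handle both cases.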
Quality of Service
With all the data access
operations going through this data services layer, a major concern is
the potential bottleneck at this layer that may limit scalability. The
obvious way to resolve this problem is to create a clustered
environment with multiple instances of this data services layer.
The complexity of clustering depends on whether the enterprise is using a purely federated approach or has some level of data replication. With a purely federated approach, running a cluster with multiple instances can be simple. However, the architecture must still address the issue of affinity for a particular instance - especially in the case of inter-service transactions. The architecture must address questions such as: Are all operations that are part of a transaction forced to go to the same data service instance? Can different operations that use different data service instances still be part of the transaction?
A simple solution is to require all operations in a single transaction to interact with a single service instance. However, this solution isn't without its disadvantages since it can affect how well the load is distributed across the cluster. With some replication, clustering becomes more difficult. In addition to the server affinity issue, the architecture must include a partitioning strategy. This strategy answers questions such as: Do all instances of the data services allow access to all the data? Or are data services partitioned so that only certain instances allow access to certain data?
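One way to picture the affinity and partitioning questions is the sketch below. It assumes that each transaction carries an identifier and that the partitioning strategy is a simple lookup table from dataset to eligible instances. The function names and the instance names are made up for the example.

```python
import hashlib


def eligible_instances(dataset: str, partitions: dict[str, list[str]], all_instances: list[str]) -> list[str]:
    """Partitioning strategy: datasets without an entry are served by every instance."""
    return partitions.get(dataset, all_instances)


def instance_for_transaction(transaction_id: str, instances: list[str]) -> str:
    """Affinity: hash the transaction ID so every operation lands on one instance."""
    if not instances:
        raise ValueError("no data service instances available")
    digest = hashlib.sha256(transaction_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


def route(transaction_id: str, dataset: str, partitions: dict[str, list[str]], all_instances: list[str]) -> str:
    """Pick an instance that both serves the dataset and is stable for the transaction."""
    return instance_for_transaction(transaction_id, eligible_instances(dataset, partitions, all_instances))
```

The sketch also shows where the simple solution strains. A transaction that touches datasets in different partitions may be routed to different instances, and modulo-based hashing reassigns transactions whenever the number of instances changes, so both cases need an explicit policy.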
Data Access and Usage Patterns
It's important to
note that different applications have different data access and usage
patterns. Some applications can produce many transactions but access
only a small amount of data in each transaction. For other
applications, the transaction throughput can be small but the volume of
data that's accessed very large. The way to tune data source
performance for these patterns is very different. When using a data
services solution to provide centralized access to enterprise data
sources, the enterprise must accommodate all the various access and
usage patterns of the applications that will be integrated with this
solution. Tuning the infrastructure to support a single application's
performance requirements is complicated; trying to tune it to
adequately support multiple patterns of use and access will be even
more difficult. Often, there will be conflicting configurations -
something that optimizes the performance of one application will
degrade the performance of another. The enterprise should analyze and
model the access and use patterns of the applications that will be
using the data services and ensure that well-defined performance
criteria for each scenario have been developed. Additionally, enough
time should be planned for testing the performance of a particular
solution with simulations that reflect the access and usage patterns
that are common to the enterprise environment.
Summary
Harmonizing data assets has always been a
challenging problem; the problems and urgency are further exacerbated
when migrating to an SOA. Developing a strategy for handling this kind
of transition is essential to properly enabling data access in an
enterprise SOA environment. By developing appropriate requirements and
use cases and by analyzing data assets and data usage, organizations
can better understand the breadth and depth of their data integration
issues and begin to take steps to address them. Ultimately, every
organization must develop a strategy tailored to its specific needs,
but the overall approach described in this article provides guidance in
understanding what types of questions should be asked and how to
leverage possible technology solutions to address the resulting issues
that are identified. This guidance will enable organizations to fully
leverage and exploit their most important strategic asset: their data.
Copyright © 2006 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Tieu Luu
Tieu Luu works at Booz Allen Hamilton where he helps the U.S. government create and implement strategies and architectures that apply innovative technologies and approaches in IT. You can read more of Tieu’s writing at his blog at http://tieuluu.com/blog.
More Stories By Sandeep Maripuri
Sandeep Maripuri is an associate with Booz Allen Hamilton where he designs and implements data sharing architectures that apply service-oriented concepts. Prior to joining Booz Allen Hamilton, Sandeep held architecture and engineering positions in both large consulting firms and a commercial software startup, where he was an architect and lead engineer of one of the first commercially-available semantic data interoperability platforms.
More Stories By Riad Assir
Riad Assir is a senior technologist with Booz Allen Hamilton where he designs enterprise systems for commercial and government clients. Prior to Booz Allen Hamilton, Riad held senior technology positions at companies such as Thomson Financial, B2eMarkets and Manugistics, where he worked on large supply chain systems development.