SOA & WOA: Article

Complete Data Integration Through XQuery

Vastly simplifying SOA implementations

Most businesses have an urgent need for up-to-date, accurate information based on data from multiple data sources. It would be much easier if all your data were stored in one database so it can be queried as a whole, but this is rarely practical. In the real world, data integration is required. You need a simple, efficient way to query data found in various data sources.

Suppose you use information about customers in your Service Oriented Architecture (SOA). Your company might use an external CRM system like salesforce.com for leads and customer data, an internal ERP system like PeopleSoft for processing orders, a dedicated software solution to track technical support calls, and one or more databases to store customer information not captured by these other systems. Each of these data sources has a different representation of a customer, a different API and data model, and perhaps a different query language. Nevertheless, you have to be able to combine this data intelligently to get an overview of a customer.

For instance, you may want to generate one report with the overall status of a customer, or you may want to find all customers with outstanding tech support issues who are deciding whether to make a major purchase this quarter. If all of your data were in a single database, you would retrieve the information with a simple query. Because the data is in many different sources, you have to write a good deal of code to get the same result, and the code is quite different for each data source. This approach is time-consuming and error-prone, and it complicates security and auditing. With XQuery, you query each data source as though it were XML, no matter how the underlying data is physically stored.

XQuery is the World Wide Web Consortium (W3C) standard XML query language, designed for both XML processing and data integration. Using XQuery for data integration vastly simplifies SOA implementations, making your developers more productive and improving the performance of your systems. An XML Integrated Development Environment (IDE) that supports XQuery makes it much easier for you to visualize data sources, generate and test queries, and debug. The queries you develop can be exposed via a data access layer, which is accessed using SOAP or HTTP, so that they can be reused in different SOAP message formats or in other applications.

XQuery Simplifies Data Integration
XQuery simplifies data integration in two ways. First, it provides native support for XML and for the operations most frequently needed when working with XML. Today, XML is at the heart of most data integration, and this is certainly true for SOA environments where every SOAP message is expressed in XML. Most languages don't support XML natively. In general, programming languages are based on objects or structures, query languages are based on relational tables, and scripting languages are based on text. XQuery is based on XML, and XML is the only data structure in XQuery. In the same way that SQL queries tables to produce tables, XQuery queries XML to produce XML, and the XML produced by an XQuery can be used directly in XML applications. For example, a query result might be the payload of a SOAP message. XQuery provides direct support for querying, creating, and transforming XML.

One frequently used expression in XQuery, the FLWOR expression, is similar to SQL's SELECT-FROM-WHERE. Because XML structures are more complex than SQL tables, XQuery provides path expressions that can identify any item in an XML structure. To create structures in query results, it also provides constructors, using a syntax that looks like the XML to be constructed. A typical XQuery might use path expressions to locate data, FLWOR expressions to perform joins and combine data, and constructors to create the structures of the query result. These tasks are much more tedious with conventional programming languages. For instance, achieving the same result with the Java DOM API requires parsing, navigating object structures, casting values from XML into Java data types, creating a result tree structure, and appending nodes to that result tree. In general, conventional programming languages require seven to 20 times more code than an equivalent XQuery. Not only are XML applications harder to write in conventional programming languages, but performance can also be much better in a good XQuery implementation, because XQuery is a declarative language that allows the implementation to do many useful kinds of query optimization.
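To make the three steps above concrete (locating data with paths, joining and filtering in a FLWOR-style loop, and constructing the result), here is a minimal sketch in Python, whose standard library has no XQuery engine. It is an imperative stand-in, not XQuery itself; an XQuery engine would express the same logic declaratively. The sample documents, element names, and the function `open_ticket_report` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names are illustrative only.
CUSTOMERS = """<customers>
  <customer id="c1"><name>Acme</name></customer>
  <customer id="c2"><name>Globex</name></customer>
</customers>"""

TICKETS = """<tickets>
  <ticket customer="c1" status="open">Login fails</ticket>
  <ticket customer="c2" status="closed">Slow report</ticket>
  <ticket customer="c1" status="open">Export error</ticket>
</tickets>"""


def open_ticket_report(customers_xml, tickets_xml):
    """Build a <report> listing each customer that has open tickets."""
    customers = ET.fromstring(customers_xml)
    tickets = ET.fromstring(tickets_xml)
    report = ET.Element("report")  # construction: root of the result

    # "for ... order by": iterate over customers sorted by name
    by_name = sorted(customers.findall("customer"),
                     key=lambda c: c.findtext("name"))
    for cust in by_name:
        # path with a predicate, then the join condition ("where")
        open_tickets = [t for t in tickets.findall("ticket[@status='open']")
                        if t.get("customer") == cust.get("id")]
        if not open_tickets:
            continue
        # "return": construct the result structure for this customer
        out = ET.SubElement(report, "customer", name=cust.findtext("name"))
        for ticket in open_tickets:
            ET.SubElement(out, "issue").text = ticket.text
    return report


print(ET.tostring(open_ticket_report(CUSTOMERS, TICKETS), encoding="unicode"))
```

Even in this small case, the result tree has to be assembled node by node, which is the kind of bookkeeping that XQuery constructors are meant to remove.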

The second way XQuery simplifies data integration is by eliminating the need to work with different APIs and data models for each data source. The XQuery language is defined in terms of XML structures, but since almost any data can be mapped into XML structures, an XQuery implementation can use XQuery to query just about anything. For instance, an XQuery implementation can provide support for relational data, implementing queries by generating efficient SQL for the database, but allowing a user to query the data as though it were XML.
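The idea of querying relational data as though it were XML can be sketched as follows. This is only an illustration of the mapping, assuming a simple convention of one `<row>` element per record and one child element per column; the `orders` table, its columns, and the helper `table_as_xml` are hypothetical. Note that the filter is passed to the database as SQL rather than applied after conversion, which mirrors, in a very simplified way, how an implementation might generate SQL instead of materializing a whole table as XML.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_as_xml(conn, table, where_sql="", params=()):
    """Expose rows of a table as XML: one <row> per record, one child per column.

    Sketch only: `table` and `where_sql` are interpolated into the SQL text,
    so they must come from trusted code, never from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table} {where_sql}", params)
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for record in cursor:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, record):
            ET.SubElement(row, column).text = "" if value is None else str(value)
    return root


# Hypothetical in-memory database standing in for an order-processing system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "c1", 250.0), (2, "c2", 75.5), (3, "c1", 40.0)])

# The selection and ordering run inside the database, not on the XML.
orders = table_as_xml(conn, "orders", "WHERE customer = ? ORDER BY id", ("c1",))
print(ET.tostring(orders, encoding="unicode"))
```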

By treating all data sources as XML, this kind of XQuery implementation lets a developer query relational data, Web Service calls, and other data sources together, with a small amount of declarative code, in one uniform data model, without mastering the idiosyncrasies of each system.

Consider the customer example in the introduction. With an XQuery implementation that supports all of the underlying data sources, a developer can write a simple query to do a join among the different systems that represent different aspects of a customer. This dramatically simplifies software development in most business environments. The developer focuses on the information that's needed, not on the representation used in each system. Typically, the code savings in data integration environments are even greater than in pure XML environments.
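As a rough illustration of such a cross-system join, the sketch below answers the question from the introduction: which customers have an open support issue while a large purchase is still under negotiation? It is written in Python rather than XQuery, and everything in it is an assumption made for the example: the three toy sources (an XML lead feed, a SQLite table, and a list of support records), their field names, and the helpers `as_elements` and `at_risk_deals`. The point is only that once every source is presented in one element-and-attribute shape, the join itself is short.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Three hypothetical sources, each with its own native representation.
CRM_XML = """<leads>
  <lead customer="c1" stage="negotiation" amount="50000"/>
  <lead customer="c2" stage="closed" amount="12000"/>
  <lead customer="c3" stage="negotiation" amount="90000"/>
</leads>"""

SUPPORT_CALLS = [  # e.g. the decoded response of a support-system API
    {"customer": "c1", "status": "open"},
    {"customer": "c3", "status": "resolved"},
]


def erp_connection():
    """Create a toy in-memory ERP database with a customer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id TEXT, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)",
                     [("c1", "Acme"), ("c2", "Globex"), ("c3", "Initech")])
    return conn


def as_elements(tag, records):
    """Present a sequence of dict records as XML elements with attributes."""
    return [ET.Element(tag, {k: str(v) for k, v in rec.items()})
            for rec in records]


def at_risk_deals(min_amount):
    """Customers with an open support call and a pending deal >= min_amount."""
    leads = ET.fromstring(CRM_XML).findall("lead[@stage='negotiation']")
    rows = erp_connection().execute("SELECT id, name FROM customer ORDER BY id")
    customers = as_elements("customer", [{"id": i, "name": n} for i, n in rows])
    calls = as_elements("call", SUPPORT_CALLS)
    open_ids = {c.get("customer") for c in calls if c.get("status") == "open"}

    result = ET.Element("at-risk")
    for cust in customers:
        for lead in leads:
            if (lead.get("customer") == cust.get("id")
                    and cust.get("id") in open_ids
                    and int(lead.get("amount")) >= min_amount):
                ET.SubElement(result, "customer",
                              name=cust.get("name"), amount=lead.get("amount"))
    return result


print([c.get("name") for c in at_risk_deals(10000)])  # ['Acme']
```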

The available data sources and the implementation strategy vary widely among XQuery implementations. For relational data, an implementation may translate an XQuery into SQL, then translate the SQL result sets to XML when returning results to the query engine. For flat file formats, an implementation can provide XML converters that actually convert data to XML on-the-fly when it's queried. Web Service calls may be supported using functions that can be called from within a query. When choosing an XQuery implementation, make sure that it fits in your computing environment and can handle the data sources needed in your architecture. The XQuery implementations from most database vendors are designed to query only data stored in their database, but most companies have more than one database, as well as data that is not stored in any database.
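The on-the-fly conversion of flat files mentioned above can be sketched with a generator that yields one XML element per record as it is read, so the file is never converted as a whole. This is a minimal Python illustration, assuming a comma-separated file whose header row contains valid XML element names; the sample data and the function `csv_records_as_xml` are hypothetical, and real converters for formats such as EDI are considerably more involved.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_records_as_xml(lines, record_tag="record"):
    """Lazily yield one XML element per CSV row.

    Child element names come from the header row, which is assumed to
    contain valid XML names.
    """
    for row in csv.DictReader(lines):
        elem = ET.Element(record_tag)
        for field, value in row.items():
            ET.SubElement(elem, field).text = value
        yield elem


# Hypothetical legacy export; any iterable of text lines would work here.
LEGACY_FILE = io.StringIO("id,name,region\nc1,Acme,EMEA\nc2,Globex,APAC\n")

# Records are converted only as the query consumes them.
emea = [rec.findtext("name")
        for rec in csv_records_as_xml(LEGACY_FILE, "customer")
        if rec.findtext("region") == "EMEA"]
print(emea)  # ['Acme']
```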

The XQuery implementations from application server vendors or XML integration server vendors can query a wider range of data sources, but require the adoption of their server, which may not fit in your architecture, or may increase the footprint of the system. If you're writing Web Services in a Java environment, make sure your implementation supports the XQuery API for Java (XQJ), which is the standard Java interface for XQuery - it lets your servlets use XQuery the same way that JDBC lets servlets use SQL. Also, the performance of XQuery implementations varies dramatically - make sure that you test performance for the data you work with, especially if you're using XQuery for relational data or very large XML files. Because XQuery is declarative and can be optimized, a good implementation will provide performance better than you normally achieve with hand-coded Java, JDBC, SQL, and an XML API.

Using XQuery vastly simplifies data integration, offering loosely coupled access, and providing one way to query any data source supported by the query engine. And because an XQuery implementation can talk directly to the original data source, it can do optimizations that are no longer available once the data is extracted and converted to physical XML. As a result, what is easier for the developer also results in better performance.

XML Development Environments for Data Integration
It's hard to understand the relationships among data without some way to visualize the data. This is particularly true when working with data from multiple sources. When doing data integration, look for an IDE that lets you visualize as many of your data sources as possible, supports general XML functionality, and has good support for XQuery. Some of these tools let you establish database connections, drag-and-drop from data sources to create XQuery code, run queries and see their output, and run a debugger to help find bugs. These tools make developers more productive.

When choosing an IDE for data integration with XQuery, consider related functionality that you may need. For instance, some IDEs also provide support for developing XML pipelines and publishing. Several IDEs can generate XQJ code to run an XQuery as part of a program. One XQuery IDE is implemented as an Eclipse plug-in, which is very convenient for Java developers who use Eclipse. Several IDEs also provide good support for writing and testing XSLT stylesheets, W3C XML Schemas and DTDs, and related XML development.

The Data Access Layer
In most companies, several data consumers need to access the same information. For instance, if one of your Web Services needs a description of a customer, this same description might also be useful for other Web Services, and also for dynamic Web sites, AJAX clients, publishing applications, or any other application that needs customer data. Frequently companies design for a single project, coding very similar interfaces for each data consumer, an obvious waste of programming effort. And if the data sources change, each of these interfaces has to be rewritten. In environments where security and auditability are important, much more code must be audited.

A data access layer lets many data consumers access data using the same well-defined interface. For each request, the data access layer calls a data service. Data services should represent the business model, hiding underlying systems and the data integration task from data consumers. For instance, you might write a data service that provides the data for a single customer. A data service can be parameterized - a parameter might identify the customer ID or the name of a particular view of the customer.

Many data services do nothing more than query data from one or more data sources to produce XML. These data services can be written directly in XQuery, using external variables to allow queries to be parameterized. In other data services, an XQuery may be part of a Java program that performs business logic or interacts with other systems, or it may be part of an XML pipeline. A small focused team can be responsible for writing the queries to implement data services, and for documenting available services, allowing data consumers to access these services using standard Web and XML interfaces.
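The shape of a parameterized data service behind a data access layer can be sketched as follows. In XQuery, the parameters would typically be external variables (declared with `declare variable $name external;`); here, as a Python stand-in, they are keyword arguments. The class `DataAccessLayer`, the function `customer_service`, the view names, and the backing data are all hypothetical, and a real layer would be exposed over SOAP or HTTP rather than called in-process.

```python
import xml.etree.ElementTree as ET

# Hypothetical backing data; a real service would query the underlying systems.
_CUSTOMERS = {"c1": {"name": "Acme", "region": "EMEA", "credit": "AA"}}

# Named views of a customer, each listing the fields it exposes.
_VIEWS = {"summary": ("name",), "full": ("name", "region", "credit")}


def customer_service(customer_id, view="summary"):
    """Data service: return one customer as XML, shaped by the requested view."""
    record = _CUSTOMERS[customer_id]
    elem = ET.Element("customer", id=customer_id)
    for field in _VIEWS[view]:
        ET.SubElement(elem, field).text = record[field]
    return elem


class DataAccessLayer:
    """Single entry point that routes named requests to data services."""

    def __init__(self):
        self._services = {}

    def register(self, name, service):
        self._services[name] = service

    def request(self, name, **params):
        # Every consumer receives serialized XML, whatever the backing source.
        return ET.tostring(self._services[name](**params), encoding="unicode")


dal = DataAccessLayer()
dal.register("customer", customer_service)
print(dal.request("customer", customer_id="c1"))
# <customer id="c1"><name>Acme</name></customer>
```

Because consumers see only the service name and its parameters, the code behind `customer_service` can change data sources without any consumer being rewritten.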

Summary
XQuery provides a simple way to query data across data sources, providing simple, efficient data integration. With a good XQuery implementation, any data source can be queried as though it were XML, and any desired XML structure can be created as the result of a query. For instance, a query can take relational data and other data sources as input, and return the payload for a SOAP message. XQuery increases productivity by freeing developers from the need to learn a different API and data model for each data source, and provides direct support for the operations commonly used in XML. Depending on the XQuery implementation, data sources might include relational data from one or more databases, XML files, Web Service calls, EDI, and legacy file formats among others. An XML development environment that allows visualization of multiple data sources and provides support for XQuery can further enhance developer productivity. When many data consumers need access to the same kinds of data, data integration can be done in a data access layer that provides a set of data services, representing the business model, that hide the details of data integration and allow reuse of data integration code.

Because businesses need up-to-date information that comes from a variety of data sources, but the proper tools and development methods have lagged behind, today's software systems are often needlessly complex and ad hoc. Modern data integration tools are the solution. Using XQuery, an XML IDE, and a data access layer simplifies development significantly, improves performance, increases code reusability, and makes systems more maintainable.

About Jonathan Robie

Jonathan Robie is the XML program manager at DataDirect Technologies. Before joining DataDirect, Jonathan was an XML research specialist at Software AG. Jonathan works very closely with the W3C; he is a co-author of the XQuery specification, has participated in several W3C Working Groups, and speaks regularly at XML conferences. Jonathan wrote an XQuery tutorial for a book called XQuery from the Experts which is now available on Amazon.com.
