By Jimmy Zhang | February 20, 2008 02:15 PM EST | Reads: 31,638
VTD+XML in 30 Seconds
The key to the example above is the index file "po.vxl," which conforms
to the VTD+XML spec and allows XML parsing to be decoupled from
application logic. What is VTD+XML? Since VTD-XML's internal
representation of the XML infoset is inherently persistent, VTD+XML, as
the name suggests, is simply the binary packaging format that combines
VTD records, location cache (LC) entries, and the XML text into a single
file. The detailed technical spec can be found at http://vtd-xml.sourceforge.net/persistence.html.
A Simple Example
This section gets down to the
nitty-gritty of the specification by manually composing, byte by byte,
a VTD+XML index. For the sake of simplicity, the example indexes
a simple XML document containing a single childless root element
whose parsed representation has no location cache entries. The
example also assumes a big-endian byte order (as in Java) and UTF-8
document encoding (the default character set). Namespace awareness
is set to false.
<root/>
The first four-byte word of the corresponding index file is 0x0102A000, containing:
- The VTD+XML version number (0x01) in the first byte
- The character encoding format (0x02) in the second byte
- The namespace awareness, the word length of LC entries in the last level, the byte endianness of the platform, and the VTD version, encoded in various bit fields in the third byte (0xA0)
- The document depth in the fourth byte (0x00, as the root element has no children)
The second four-byte word has the value 0x00040001, containing:
- The number of LC levels supported by the VTD-XML implementation in the upper 16 bits (0x0004 in big-endian order)
- The root element index value in the lower 16 bits (0x0001 in big-endian order)
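The byte and half-word positions listed above can be checked with a few shifts and masks. The following is only a minimal sketch: the class and method names are made up for illustration and are not part of the VTD-XML API, and the third byte is returned as a raw value because the article does not spell out its individual bit fields.

```java
public class VtdIndexHeader {
    // All names here are hypothetical; only the byte positions come from the text.

    // First four-byte word: version | encoding | flag bits | document depth
    static int version(int word1)  { return (word1 >>> 24) & 0xFF; }
    static int encoding(int word1) { return (word1 >>> 16) & 0xFF; }
    static int flags(int word1)    { return (word1 >>> 8) & 0xFF; } // bit fields left undecoded
    static int depth(int word1)    { return word1 & 0xFF; }

    // Second four-byte word: LC level count (upper 16 bits) | root index (lower 16 bits)
    static int lcLevels(int word2)  { return (word2 >>> 16) & 0xFFFF; }
    static int rootIndex(int word2) { return word2 & 0xFFFF; }

    public static void main(String[] args) {
        int word1 = 0x0102A000;
        int word2 = 0x00040001;
        System.out.printf("version=0x%02X encoding=0x%02X flags=0x%02X depth=%d%n",
                version(word1), encoding(word1), flags(word1), depth(word1));
        System.out.printf("lcLevels=%d rootIndex=%d%n",
                lcLevels(word2), rootIndex(word2));
    }
}
```

Because the words are treated as plain integers, the sketch is independent of the host's byte order; endianness only matters when the words are read from or written to the file.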
The remainder of the VTD+XML index consists of adjacent segments, each beginning with an eight-byte word that holds the VTD record or LC entry count, followed by the actual VTD records or LC entries. The first such word (0x0000000000000002) indicates that there are two VTD records: 0xDFF0000000000000 and 0x0000000400000001.
The remaining three eight-byte words are all zero, indicating that the location caches for levels one, two, and three have no entries in the VTD+XML index.
As the final output, the VTD+XML index for "<root/>" is 88 bytes long and looks like the following in hex:
0x0102A00000040001 0x0000000000000000
0x0000000000000000 0x0000000000000007
0x3C726F6F742F3E00 0x0000000000000002
0xDFF0000000000000 0x0000000400000001
0x0000000000000000 0x0000000000000000
0x0000000000000000
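To double-check the 88-byte figure, the dump can be reproduced with a big-endian ByteBuffer. This is a sketch under stated assumptions rather than the reference writer: the layout (two zero words after the header, an eight-byte XML length, and the XML text zero-padded to an eight-byte boundary) is read off the hex dump itself, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class RootIndexComposer {
    // Composes the index for "<root/>" word by word, following the hex dump.
    static byte[] compose() {
        byte[] xml = "<root/>".getBytes(StandardCharsets.UTF_8); // 7 bytes
        int padded = (xml.length + 7) & ~7; // assumption: XML is padded to 8 bytes, as in the dump
        ByteBuffer buf = ByteBuffer.allocate(8 + 16 + 8 + padded + 8 + 16 + 24);
        buf.order(ByteOrder.BIG_ENDIAN);
        buf.putInt(0x0102A000);            // version, encoding, flag bits, depth
        buf.putInt(0x00040001);            // LC levels, root element index
        buf.putLong(0L);                   // two zero words shown in the dump
        buf.putLong(0L);
        buf.putLong(xml.length);           // XML size in bytes (0x07)
        buf.put(xml);
        buf.position(buf.position() + padded - xml.length); // skip the zero padding
        buf.putLong(2L);                   // VTD record count
        buf.putLong(0xDFF0000000000000L);  // first VTD record
        buf.putLong(0x0000000400000001L);  // second VTD record
        buf.putLong(0L);                   // LC level 1 entry count
        buf.putLong(0L);                   // LC level 2 entry count
        buf.putLong(0L);                   // LC level 3 entry count
        return buf.array();
    }

    // Reads the i-th eight-byte word of the index.
    static long wordAt(byte[] index, int i) {
        return ByteBuffer.wrap(index).order(ByteOrder.BIG_ENDIAN).getLong(i * 8);
    }

    // Recovers the embedded XML text using the length stored in word 3.
    static String xmlText(byte[] index) {
        int length = (int) wordAt(index, 3);
        return new String(index, 32, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] index = compose();
        System.out.println(index.length + " bytes");
        for (int i = 0; i < index.length / 8; i++) {
            System.out.printf("0x%016X%n", wordAt(index, i));
        }
        System.out.println(xmlText(index));
    }
}
```

The xmlText helper also shows why the format stays readable: the original XML sits in the file unmodified, so it can be pulled out with nothing more than an offset and a length.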
Benefits and Limitations
Because VTD+XML
straightforwardly combines VTD and XML, it inherits all the benefits of
VTD-XML parsing. Compared with existing XML indices (e.g., the various
pure-binary XML indices that model labeled, ordered trees), VTD+XML
possesses many unique technical benefits:
• General Purpose - Before
VTD+XML, most native XML indices optimized only specific types (e.g.,
a particular axis) of XPath lookups. If an input query differs slightly from the
index type, the query execution still has to resort to expensive
parsing. Due to this limitation, many native XML databases today
require users to create multiple indices, one for each input query type,
so that users can benefit from those indices. The problem is that XML
database applications usually serve many types of queries that are
unpredictable and complex in nature, often rendering the benefits of
indexing insignificant. In comparison, VTD+XML is the first index that
completely eliminates the cost of XML parsing and predictably speeds up
any type of XPath query. It also works exceptionally well with
namespaces.
• Human Readable - VTD+XML is
also the first human-readable XML index. You can actually open it in a
text editor to examine the XML text. Figure 1 is what "po.vxl" looks
like in "vim." More than just a nice property, VTD+XML's
human-readability offers distinct advantages over pure binary indexing
schemes. Everything else being equal, keeping XML in its original
format avoids the processing cost of converting to and from any binary
format. Moreover, what if your application just wants to modify the
XML payload, such as by inserting a chunk of XML text extracted
from another SOAP message? What's the point of converting XML to
a binary format? In a service-oriented, heterogeneous environment,
maintaining XML in its original format automatically retains its
openness and interoperability. It just seems to me that the only
lossless equivalent of XML is XML itself, no less.
Copyright © 2008 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
Jimmy Zhang is a cofounder of XimpleWare, a provider of high performance XML processing solutions. He has working experience in the fields of electronic design automation and Voice over IP for a number of Silicon Valley high-tech companies. He holds both a BS and MS from the department of EECS from U.C. Berkeley.