| By Jimmy Zhang | Article Rating: |
|
| September 14, 2008 11:00 PM EDT | Reads: |
3,516 |
My question becomes: Am I the only one seeing the elephant in the room?
How VTD-XML Changes the Picture
Simply put, VTD-XML provides a solution so spectacular that the problem is completely gone.
The first part of this article series introduced VTD-XML as a memory-efficient, high-performance XML parser with integrated indexing and XPath. Virtually every technical benefit of VTD-XML is, one way or another, the result of non-extractive parsing, meaning the original XML text is loaded in memory and fully preserved. However, the most important benefits of VTD-XML - the ones that truly set it apart from other XML processing models - lie in its unique ability to manipulate XML document content at the byte level. Below are three distinct, yet related, sets of capabilities available in the latest version of VTD-XML.
- Incremental XML modifier: You can modify an XML document incrementally through the XMLModifier, which defines three types of "modify" operations: inserting new content into any location (i.e., offset) in the document, deleting content (by specifying the offset and length), and replacing old content with new content - which effectively is a deletion and insertion at the same location. To compose a new document containing all the changes, call XMLModifier's output(...) method.
- XML slicer and splicer: You can use a pair of integers (offset and length) to address a segment of XML text so your application can slice the segment from the original document and move it to another location in the same or a different document. The VTDNav class exposes two methods that allow you to address an element fragment: getElementFragment(), which returns a 64-bit integer representing the offset and length value of the current element, and getElementFragmentNs() (in the latest version), which returns an ElementFragmentNs object representing a "namespace-compensated" element fragment. The latest version also transparently supports transcoding, so you can perform cutting and pasting across documents with different encoding formats.
- XML editor: You can directly edit the in-memory copy of the XML text using VTDNav's overWrite(...) method, provided that the original tokens you're overwriting are wide enough to hold the new byte content.
Using VTD-XML as an incremental modifier to update the text node, you basically navigate the VTD records to the right location, stick in the change, and generate a new document - exactly the same way you would do it with NotePad. Listing 1 shows a simple application updating a text node using VTD-XML.
XML Processing: Object Oriented vs Document Centric
Traditional XML processing models, such as DOM, SAX, and various object data binding tools, are designed around the notion of objects. The XML text - merely the output of object serialization - is relegated to the status of a second-class citizen. You base your applications on DOM nodes, strings, and various business objects, but rarely on the physical documents. If you have followed my analysis so far, it's become obvious that this object-oriented approach of XML processing makes little sense as it causes performance hits from virtually all directions (an in-depth discussion on the topic can be found in "the performance woe of binary XML"). Not only are object creations and garbage collection inherently memory- and CPU-intensive, but applications incur the cost of re-serialization with even the smallest changes to the original text.
What is "document-centric" XML processing? In non-extractive parsing, the XML text - the persistent data format - is the starting point from which everything else comes about. Whether it's parsing, XPath evaluation, modifying content, or slicing element fragments, by default, you no longer work with objects. You only do that when it makes sense. More often than not, you treat documents purely as syntax, and think in bytes, byte arrays, integers, offsets, lengths, fragments, and namespace-compensated fragments. The first-class citizen in this paradigm is the XML text. The object-centric notions of XML processing, such as serialization and de-serialization (or marshaling and un-marshaling), as shown in Figure 1, are often displaced, if not replaced, by more document-centric notions of parsing and composition. Increasingly, you will find that your XML programming experience is getting simpler. Not surprisingly, the simpler, intuitive way to think about XML processing is also the most efficient and powerful (see Table 1 for the technical comparison of DOM and VTD-XML).
Published September 14, 2008 Reads 3,516
Copyright © 2008 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Jimmy Zhang
Jimmy Zhang is a cofounder of XimpleWare, a provider of high performance XML processing solutions. He has working experience in the fields of electronic design automation and Voice over IP for a number of Silicon Valley high-tech companies. He holds both a BS and MS from the department of EECS from U.C. Berkeley.
- The Top 150 Players in Cloud Computing
- Commercial vs Federal Cloud Computing
- Why IBM’s Server Chief Got Busted
- Industry Experts Discuss the State of Cloud Computing
- Cloud Expo New York Call for Papers Deadline December 15
- Cloud Computing on Gartner's Top 10 List and SYS-CON Events' 2010 Calendar
- US Federal Government is Major Cloud Computing Innovator
- Google Wave
- Ulitzer.com Named Exclusive "New Media" Sponsor of Cloud Computing Conference & Expo
- Tactical Cloud Computing Panel at 1st Annual GovIT Expo
- Adaptivity & Cloud Computing: Exclusive Q&A with CEO Tony Bishop
- 4th International Cloud Expo: Photo Album
- The Top 150 Players in Cloud Computing
- SYS-CON.TV: Cloud Computing Expo Power Panel
- Commercial vs Federal Cloud Computing
- Why IBM’s Server Chief Got Busted
- 1st Annual GovIT Expo: Letter from the Technical Chair
- Deputy CIO of the CIA to Keynote 1st Annual GovIT Expo
- Industry Experts Discuss the State of Cloud Computing
- SOA World Power Panel on SYS-CON.TV
- CIA was Headed to an Enterprise Cloud All Along: Jill Tummler Singer
- 1st Annual Government IT Conference & Expo: Themes & Topics
- Cloud Expo New York Call for Papers Deadline December 15
- Stock in Focus: Dragon Capital
- The i-Technology Right Stuff
- Who Are The All-Time Heroes of i-Technology?
- Get the Message
- Where Are RIA Technologies Headed in 2008?
- i-Technology Viewpoint: Is Web 2.0 the Global SOA?
- i-Technology Viewpoint: Thinking Outside the VC Box
- ESB Myth Busters: 10 Enterprise Service Bus Myths Debunked
- i-Technology Viewpoint: When to Leave Your First IT Job
- SOA Web Services Edge Conference Coverage on SYS-CON.TV
- Five Reasons Why Web 2.0 Matters
- SYS-CON.TV's "SOA Web Services" and "Enterprise Open Source" Programs To Air in December
- SOA World Conference & Expo SYS-CON.TV Power Panel Live From Times Square









There are a variety of applications that supp...






















