By Ash Parikh
March 2, 2012 07:00 AM EST
Given the speed at which organizations are conducting business today, the promise of increased agility is making "Data Virtualization" a hot topic. However, as with all things, the devil's in the details. We need to go a bit beyond simply looking under the hood.
Yes, data virtualization is an agile data integration approach that provides fast and direct access to new critical data that the business can trust and consume. That's all well and good. However, it's a loaded statement. Each word, in fact, needs to be put under a microscope to make sure that simple data federation is not being passed off as data virtualization. Why? It's simple - when that happens, the ROI just disappears.

To do this correctly, we need to take a short trip down memory lane. Wayne Eckerson's blog on the TDWI website remains one of the best sources of information on this subject. It pays due respect to data federation as a technology, but it also helps us understand why the technology's limitations kept it from achieving more. In particular, let's read this line very carefully:
"Data federation offers many advantages - it's a fast, flexible, low cost way to integrate diverse data sets in real time. But data integration offers benefits that data federation doesn't: scalability, complex transformations, and data quality and data cleansing."
Yes - that's exactly right. Data federation by definition means high performance. But, if performance is a given, is that all you need? What about making sure that the data you are sourcing is of good quality and in the right format? Who is making sure that the data is trustworthy and ready for consumption? BI tools won't help with that. Does the "business" even play a role in defining the rules?
"But what if you could combine the best of these two worlds and deliver a data integration platform that offered data federation as an integrated module, not a bolt on product? What if you could get all the advantages of both data federation and data integration in a single toolset?"
In my previous blog, I shared a list of critical capabilities that architects look for in an advanced data virtualization solution. I spoke about being metadata-driven and marrying the sophistication of data integration with the agility of data federation. Let's flip these capabilities on their head and understand the potential impact of using a technology that is based heavily on its data federation heritage.
You would:
- Work with an environment that is SQL or XQuery code-heavy, resulting in high maintenance
- Hand-code sophisticated cleansing rules and transformations, reinventing the wheel every time
- Not be able to profile and cleanse federated data on-the-fly, needing staging and more processing
- Have no way to seamlessly reuse virtual views for batch, leaving you stranded
- Use one environment for data integration and one for federation, with no reuse of skills or work
Where's the ROI? Where's the agility when you are losing precious time at every step? What about the added cost? By the way, did I mention it takes years to build and integrate a comprehensive data integration and data quality platform into the solution? Simple schedulers don't pass for data integration, and simple address cleansing web services just don't cut it as data quality.
The Forrester Wave: Data Virtualization, Q4 2011, summarizes it well - "Data virtualization solutions provide a virtualized data services layer that integrates data from heterogeneous data sources and content in real time, near-real time, or batch as needed to support a wide range of applications and processes. Data provided through the data services layer can be updated, transformed, and/or cleansed when (or before) applications access it. Data services layers can do more than federation."
The Gartner Magic Quadrant for Data Integration Tools, October 27, 2011, reinforces the need for a "degree of commonality, consistency and interoperability between the various components of the data integration toolset." It makes a special note about "the ability to switch seamlessly and transparently between delivery modes (bulk/batch vs. granular real-time vs. federation) with minimal rework."
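To make the idea of switching delivery modes "with minimal rework" concrete, here is a minimal sketch in Python. It is not taken from any product mentioned in this article. The two in-memory sources, the field names, and the functions `cleanse`, `virtual_view`, and `run_batch` are all hypothetical, chosen only to show one set of cleansing rules being applied on the fly to federated data and then reused unchanged for batch delivery.

```python
from typing import Dict, Iterator, List

# Hypothetical source systems, represented here by in-memory rows.
CRM_ROWS = [
    {"customer_id": 1, "name": "  alice SMITH ", "country": "us"},
    {"customer_id": 2, "name": "Bob Jones", "country": "USA"},
]
BILLING_ROWS = [
    {"customer_id": 1, "balance": "120.50"},
    {"customer_id": 2, "balance": "n/a"},
]

# Hypothetical lookup used by the cleansing rule below.
COUNTRY_CODES = {"us": "US", "usa": "US"}


def cleanse(row: Dict) -> Dict:
    """One shared rule set: tidy names, normalize country, parse balance."""
    try:
        balance = float(row["balance"])
    except ValueError:
        balance = None  # unparseable balances become missing values
    country_key = row["country"].strip().lower()
    return {
        "customer_id": row["customer_id"],
        "name": " ".join(row["name"].split()).title(),
        "country": COUNTRY_CODES.get(country_key, row["country"]),
        "balance": balance,
    }


def virtual_view() -> Iterator[Dict]:
    """Join the two sources at query time and cleanse each row on the fly."""
    billing_by_id = {r["customer_id"]: r for r in BILLING_ROWS}
    for crm in CRM_ROWS:
        billing = billing_by_id.get(crm["customer_id"], {"balance": "n/a"})
        yield cleanse({**crm, **billing})


def run_batch() -> List[Dict]:
    """Batch delivery reuses the same view instead of re-coding the rules."""
    return list(virtual_view())


if __name__ == "__main__":
    for record in run_batch():
        print(record)
```

The point of the sketch is the design choice, not the code itself: the rules live in one place, so the real-time view and the batch job cannot drift apart. In a code-heavy, federation-only setup, the equivalent logic would typically be rewritten for each delivery mode.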
Do look under the hood, but then go beyond that and do the due diligence needed to maximize your return on data. Hear what industry architects are saying, and join the discussions here.
Copyright © 2012 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Ash Parikh
Ash Parikh is responsible for driving Informatica’s product strategy around real-time data integration and SOA. He has over 17 years of industry experience in driving product innovation and strategy at technology leaders such as Raining Data, Iopsis Software, BEA, Sun and PeopleSoft. Ash is a well-published industry expert in the field of SOA and distributed computing and is a regular presenter at leading industry technology events like XMLConference, OASIS Symposium, Delphi, AJAXWorld, and JavaOne. He has authored several technical articles in leading journals including DMReview, AlignJournal, XML Journal, JavaWorld, JavaPro, Web Services Journal, and ADT Magazine. He is the co-chair of the SDForum Web services SIG.