|By Roger Gaskell||
|September 7, 2011 09:30 AM EDT||
Is MapReduce the Holy Grail answer to the pressing problem of processing, analyzing and making sense of large and growing data volumes? Certainly it has potential in this arena, but there is a distressing gap between the amount of hype this technology - and its spinoffs - has received and the number of professionals who actually know how to integrate and make best use of it.
Industry watchers say it's just a matter of time before MapReduce sweeps through the enterprise data warehouse (EDW) market the same way open source technologies like Linux have done. In fact, in a recent blog post, Forrester's James Kobielus proclaimed that most EDW vendors will incorporate support for MapReduce's open source cousin Hadoop into the heart of their architectures to enable open, standards-based data analytics on massive amounts of data.
So, no more databases, just MapReduce? I'm not so sure. But don't misunderstand. It's not that MapReduce isn't an effective way to analyze data in some cases. The big names in Internet business are all using it - Facebook, Google, Amazon, eBay et al - so it must be good, right? But it's worth taking a more measured view based both on the technical and the practical business merits. I believe that the two technologies are not so mutually exclusive; that they will work hand-in-hand and, in some cases, MapReduce will be integrated into the relational database (RDBMS).
Google certainly has proven that MapReduce excels at making sense out of the exabytes of unstructured data on the web, which it should, given that MapReduce was designed from the outset for manipulating very large data sets. MapReduce in this sense provides a way to put structure around unstructured data. We humans prefer structure; it's in our DNA. Without structure, we have no real way of adding value to the data. Unstructured data analytics is something of an oxymoron for a pattern-seeking hominid.
MapReduce helps us put structure around the unstructured so we can then make sense of it. It creates an environment wherein a data analyst can write two simple functions, a "mapper" and a "reducer," to perform the actual data manipulation, returning a result that is at once both an analysis of the data it has just mapped and summarized, as well as the structure for further analysis that will help provide insight into the data. Whether that further analysis is done in a MapReduce environment might be the more appropriate question.
From an infrastructure standpoint, MapReduce excels where performance and scalability are challenges. Applications written using the MapReduce framework are automatically parallelized, making it well suited to a large infrastructure of connected machines. As it scales applications across lots of servers made up of lots of nodes, the MapReduce framework also provides built-in query fault tolerance so that whatever hardware component might fail, a query would be completed by another machine. Further, MapReduce and its open source brethren can perform functions not possible in standard SQL (click-stream sessionization, nPath, graph production of potentially unbounded length in SQL).
What's not to love? At a basic level I believe the MapReduce framework is an inefficient way of analyzing data for the vast majority of businesses. The aforementioned capabilities of MapReduce are all well and good, provided you have a Google-like business replete with legions of programmers and vast amounts of server and memory capacity. Viewed from this perspective, it makes perfect sense that Google developed and used MapReduce: because it could. It had a huge and growing resource in its farms of custom-made servers, as well as armies of programmers constantly looking for new ways to take advantage of that seemingly infinite hardware (and the data collected on it), to do cool new things.
Similarly, the other high-profile adopters and advocates are also IT-savvy, IT-heavy companies and, like Google, have the means and ongoing incentive to get a MapReduce framework tailored to their particular needs and reap the benefits. Would a mid-size firm know how? It seems doubtful. While it has claimed that MapReduce is easy to use, even for programmers without experience with distributed systems, I know from field experience with customers that it does, in fact, take some pretty experienced folks to make best use of it.
Projects like Hive, Google Sawzall, Yahoo Pig and companies like Cloudera all, in essence, attempt to make the MapReduce paradigm easier for lesser experts to use and, in fact, make it behave for the end user more like a parallel database. But this raises the question: Why? It seems to be a bit of re-inventing the wheel. IT-heavy is not how most businesses operate today, especially in these economic times. The dot-com bubble is long over. Hardware budgets are limited and few companies relish the idea of hiring teams of programming experts to maintain even a valuable IT asset such as their data warehouse. They'd rather buy an off-the-shelf tool designed from the ground up to do high-speed data analytics.
Like MapReduce, commercially available massively parallel processing databases specifically built for rapid, high volume data analytics will provide immense data scale and query fault tolerance. They also have a proven track record of customer deployments and deliver equal if not better performance on Big Data problems. Perhaps as important, today's next-generation MPP analytic databases give businesses the flexibility to draw on a deep pool of IT labor skilled in established conventions such as SQL.
As mentioned earlier, unstructured data seems like a natural for MapReduce analysis. A rising tide of chatter is focused on the increasing problem - and importance - of unstructured data. There is more than a bit of truth to this. As the Internet of everything becomes more and more a reality, data is generated everywhere; but our experience to date is that businesses are most interested in data derived from the transactional systems they've wired their businesses on top of, where structure is a given.
Another difficulty faces companies even as MapReduce becomes more integrated into the overall enterprise data analysis strategy. MapReduce is a framework. As the hype and interest have grown, MapReduce solutions are being created by database vendors in entirely non-standard and incompatible ways. This will further limit the likelihood that it will become the centerpiece of an EDW. Business has demonstrated time and again that it prefers open standards and interoperability.
Finally, I believe a move toward a programmer-centric approach to data analysis is both inefficient and contrary to all other prevailing trends of technology use in the enterprise. From the mobile workforce to the rise of social enterprise computing, the momentum is away from hierarchy. I believe this trend is the only way the problem of making Big Data actionable will be effectively addressed. In his classic book on the virtues of open source programming, The Cathedral and the Bazaar, Eric S. Raymond put forth the idea that open source was an effective way to address the complexity and density of information inherent in developing good software code. His proposition, "given enough eyeballs, all bugs are shallow," could easily be restated for Big Data as, "given enough analysts, all trends are apparent." The trick is - and really always has been - to get more people looking at the data. You don't achieve that end by centering your data analytics efforts on a tool largely geared to the skills of technical wizards.
MapReduce-type solutions as they currently exist are most effective when utilized by programmer-led organizations focused on maximizing their growing IT assets. For most businesses seeking the most efficient way to quickly turn their most valuable data into revenue generating insight, MPP databases will likely continue to hold sway, even as MapReduce-based solutions find a supporting role.
SYS-CON Events has announced today that Roger Strukhoff has been named conference chair of Cloud Expo and @ThingsExpo 2016 Silicon Valley. The 19th Cloud Expo and 6th @ThingsExpo will take place on November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. "The Internet of Things brings trillions of dollars of opportunity to developers and enterprise IT, no matter how you measure it," stated Roger Strukhoff. "More importantly, it leverages the power of devices and the Interne...
Jul. 26, 2016 05:15 AM EDT Reads: 2,081
DevOps at Cloud Expo – being held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's largest enterprises – and delivering real results. Am...
Jul. 26, 2016 01:45 AM EDT Reads: 2,200
The 19th International Cloud Expo has announced that its Call for Papers is open. Cloud Expo, to be held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, Big Data, Internet of Things, DevOps, Digital Transformation, Microservices and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportuni...
Jul. 26, 2016 01:15 AM EDT Reads: 2,540
This digest provides an overview of good resources that are well worth reading. We’ll be updating this page as new content becomes available, so I suggest you bookmark it. Also, expect more digests to come on different topics that make all of our IT-hearts go boom!
Jul. 26, 2016 12:15 AM EDT Reads: 3,609
Keeping pace with advancements in software delivery processes and tooling is taxing even for the most proficient organizations. Point tools, platforms, open source and the increasing adoption of private and public cloud services requires strong engineering rigor – all in the face of developer demands to use the tools of choice. As Agile has settled in as a mainstream practice, now DevOps has emerged as the next wave to improve software delivery speed and output. To make DevOps work, organization...
Jul. 26, 2016 12:00 AM EDT Reads: 2,157
SYS-CON Events announced today that Isomorphic Software will exhibit at DevOps Summit at 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Isomorphic Software provides the SmartClient HTML5/AJAX platform, the most advanced technology for building rich, cutting-edge enterprise web applications for desktop and mobile. SmartClient combines the productivity and performance of traditional desktop software with the simp...
Jul. 25, 2016 11:00 PM EDT Reads: 1,050
Internet of @ThingsExpo, taking place November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with the 19th International Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world and ThingsExpo Silicon Valley Call for Papers is now open.
Jul. 25, 2016 10:00 PM EDT Reads: 2,524
In his session at @DevOpsSummit at 19th Cloud Expo, Yoseph Reuveni, Director of Software Engineering at Jet.com, will discuss Jet.com's journey into containerizing Microsoft-based technologies like C# and F# into Docker. He will talk about lessons learned and challenges faced, the Mono framework tryout and how they deployed everything into Azure cloud. Yoseph Reuveni is a technology leader with unique experience developing and running high throughput (over 1M tps) distributed systems with extre...
Jul. 25, 2016 07:15 PM EDT Reads: 2,115
SYS-CON Events announced today that LeaseWeb USA, a cloud Infrastructure-as-a-Service (IaaS) provider, will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. LeaseWeb is one of the world's largest hosting brands. The company helps customers define, develop and deploy IT infrastructure tailored to their exact business needs, by combining various kinds cloud solutions.
Jul. 25, 2016 09:45 AM EDT Reads: 1,159
Adding public cloud resources to an existing application can be a daunting process. The tools that you currently use to manage the software and hardware outside the cloud aren’t always the best tools to efficiently grow into the cloud. All of the major configuration management tools have cloud orchestration plugins that can be leveraged, but there are also cloud-native tools that can dramatically improve the efficiency of managing your application lifecycle. In his session at 18th Cloud Expo, ...
Jul. 25, 2016 09:30 AM EDT Reads: 956
SYS-CON Events announced today that Venafi, the Immune System for the Internet™ and the leading provider of Next Generation Trust Protection, will exhibit at @DevOpsSummit at 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Venafi is the Immune System for the Internet™ that protects the foundation of all cybersecurity – cryptographic keys and digital certificates – so they can’t be misused by bad guys in attacks...
Jul. 25, 2016 08:30 AM EDT Reads: 1,303
Ovum, a leading technology analyst firm, has published an in-depth report, Ovum Decision Matrix: Selecting a DevOps Release Management Solution, 2016–17. The report focuses on the automation aspects of DevOps, Release Management and compares solutions from the leading vendors.
Jul. 25, 2016 08:00 AM EDT Reads: 1,682
This is a no-hype, pragmatic post about why I think you should consider architecting your next project the way SOA and/or microservices suggest. No matter if it’s a greenfield approach or if you’re in dire need of refactoring. Please note: considering still keeps open the option of not taking that approach. After reading this, you will have a better idea about whether building multiple small components instead of a single, large component makes sense for your project. This post assumes that you...
Jul. 25, 2016 03:30 AM EDT Reads: 4,081
The Internet of Things will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform and how we integrate our thinking to solve complicated problems. In his session at 19th Cloud Expo, Craig Sproule, CEO of Metavine, will demonstrate how to move beyond today's coding paradigm ...
Jul. 24, 2016 09:45 PM EDT Reads: 2,151
Jul. 24, 2016 06:30 PM EDT Reads: 3,810
Right off the bat, Newman advises that we should "think of microservices as a specific approach for SOA in the same way that XP or Scrum are specific approaches for Agile Software development". These analogies are very interesting because my expectation was that microservices is a pattern. So I might infer that microservices is a set of process techniques as opposed to an architectural approach. Yet in the book, Newman clearly includes some elements of concept model and architecture as well as p...
Jul. 24, 2016 01:45 PM EDT Reads: 9,518
"We provide DevOps solutions. We also partner with some key players in the DevOps space and we use the technology that we partner with to engineer custom solutions for different organizations," stated Himanshu Chhetri, CTO of Addteq, in this SYS-CON.tv interview at DevOps at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
Jul. 24, 2016 11:00 AM EDT Reads: 1,625
Let's just nip the conflation of these terms in the bud, shall we?
"MIcro" is big these days. Both microservices and microsegmentation are having and will continue to have an impact on data center architecture, but not necessarily for the same reasons. There's a growing trend in which folks - particularly those with a network background - conflate the two and use them to mean the same thing.
They are not.
One is about the application. The other, the network. T...
Jul. 24, 2016 04:15 AM EDT Reads: 3,364
If you are within a stones throw of the DevOps marketplace you have undoubtably noticed the growing trend in Microservices. Whether you have been staying up to date with the latest articles and blogs or you just read the definition for the first time, these 5 Microservices Resources You Need In Your Life will guide you through the ins and outs of Microservices in today’s world.
Jul. 24, 2016 12:45 AM EDT Reads: 3,869
Before becoming a developer, I was in the high school band. I played several brass instruments - including French horn and cornet - as well as keyboards in the jazz stage band. A musician and a nerd, what can I say? I even dabbled in writing music for the band. Okay, mostly I wrote arrangements of pop music, so the band could keep the crowd entertained during Friday night football games. What struck me then was that, to write parts for all the instruments - brass, woodwind, percussion, even k...
Jul. 24, 2016 12:30 AM EDT Reads: 2,160