What's Really Industry-Changing About Cloud Computing?

Four exciting new directions in massively parallel cloud computing

Bill McColl's "Cloud N" Blog

This is an incredibly important time for cloud computing. But let’s try to move the press discussion along from its obsession with new datacenter buildings located by power stations, with the total server numbers at Microsoft and Google, and with Amazon’s hourly pricing for EC2. Interesting though those aspects of cloud computing appear to be to journalists, they hardly represent what is really industry-changing about cloud computing.

What are some of the new directions in the massively parallel cloud computing space? I’ll mention four that I’m particularly interested in, that are exciting and challenging, and that I think will have a huge impact on the industry. If you have others in mind, then feel free to add your ideas.

Here are my four areas, for what they’re worth:

  • Cloudbursting. Seamlessly and automatically migrating (parts of) a massively parallel computation back and forth between private and public clouds in real time, driven by changing resource demands, performance demands, hardware availability, and economics. Lots of existing vendors, such as Microsoft, will want (and need) great solutions to this challenge. I expect it will emerge and become widespread pretty quickly.
  • Libraries and App Stores. Developing apps from scratch in MapReduce is great, but we also need to begin to see application libraries and app stores that provide modules that are massively parallel and ready to run on both private and public clouds (and on both at the same time via cloudbursting). Libraries for major enterprise apps, for machine learning and recommendation, for scientific computing, and for semantic web and Datalog apps would be particularly interesting. Projects like Mahout are a small first step in this direction. As more and more leading universities start to teach MapReduce to their students, and pursue MapReduce-based research projects, we will hopefully see a lot more in this area.
  • Live Data. Massively parallel real-time programming on live data streams (complementing what MapReduce provides for historical/stored data). In addition to the exabytes of private live streams within businesses, web companies, telcos, scientific research centers, and government departments, there are also now torrential flows of live streaming data available from commercial companies such as Thomson Reuters, Bloomberg, Nasdaq, Xignite, StrikeIron, Spinn3r and many others. This is the area we are aiming to disrupt at Cloudscale.
  • Domain Specific Development Tools. Eclipse, Visual Studio or even Emacs is probably OK as a development environment for a computer science PhD at a major bank developing a Hadoop application to support algorithmic trading. However, for each one of those CS PhDs working in financial services, there are probably thousands of portfolio managers around the world who could benefit enormously and immediately from the power of massively parallel cloud computing, but today use only basic tools such as spreadsheets. As the demand for the consumerization of software accelerates, there is a tremendous opportunity now for innovation that can deliver the power of massively parallel processing behind very high level user interfaces that are extremely easy to use, targeted at specific domains, and where the parallelism is implicit. The gold standard for power and ease-of-use is of course Google’s “one-search-box-on-a-white-page”: ten years of incredible innovation behind the scenes to deliver improved scale and power, but no change to the ultra-simple user interface. We won’t be able to achieve that kind of ultra-simplicity in very many other areas, but it’s a great target to aim for. Massively parallel processing is great, but hey, let’s also try to build some high level interfaces that begin to unleash its power to the mass market. That’s a real innovation challenge, and a real opportunity. I expect it will be at least as hard as (probably much harder than) building the back-end engine. As I noted in a previous blog post, “simplicity and ease-of-use combined with scalability and power is the future”. I was referring to Bernard Lunn’s remark on what we need from software: “Usable without a manual within 30 minutes, still valuable for a sophisticated power user 2 years later. That is the mark of greatness. It is a real art. The great ones make it look simple - it is not simple!”
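
To make the Live Data item a little more concrete, here is a minimal single-machine sketch of the difference between batch-style processing of stored data and incremental processing of a live stream. It is an illustration only, not how Cloudscale or any MapReduce system is implemented. The names `batch_count` and `SlidingWindowCounter`, the window size, and the sample ticks are all hypothetical.

```python
from collections import Counter, deque


def batch_count(records):
    """Batch style: the whole stored data set is available up front."""
    return Counter(symbol for symbol, _price in records)


class SlidingWindowCounter:
    """Stream style: update the answer one event at a time,
    remembering only the most recent `window` events."""

    def __init__(self, window):
        self.window = window
        self.events = deque()
        self.counts = Counter()

    def add(self, symbol):
        # Account for the newly arrived event.
        self.events.append(symbol)
        self.counts[symbol] += 1
        # Evict the oldest event once the window is full.
        if len(self.events) > self.window:
            old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        return dict(self.counts)


# Hypothetical sample data: (symbol, price) ticks.
ticks = [("IBM", 101.0), ("MSFT", 25.1), ("IBM", 101.2), ("GOOG", 540.0)]

batch_result = batch_count(ticks)

stream = SlidingWindowCounter(window=2)
stream_results = [stream.add(symbol) for symbol, _price in ticks]
```

The batch function needs all of the data before it can answer. The streaming counter has an up-to-date answer after every event and never holds more than the window in memory. A massively parallel version would have to partition the stream across many machines, which is where the hard problems start.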

Bill McColl left Oxford University to found Cloudscale. At Oxford he was Professor of Computer Science, Head of the Parallel Computing Research Center, and Chairman of the Computer Science Faculty. Along with Les Valiant of Harvard, he developed the BSP approach to parallel programming. He has led research, product, and business teams in a number of areas: massively parallel algorithms and architectures, parallel programming languages and tools, datacenter virtualization, real-time stream processing, big data analytics, and cloud computing. He lives in Palo Alto, CA.

