

BI - The Game-Changing Era of Advanced Analytics

Architectural shift joins app logic with massive data sets to take advanced BI analytics to real-time performance heights

New architectures for data and logic processing are ushering in a game-changing era of advanced analytics.

These new approaches support massive data sets to produce powerful insights and analysis -- yet with unprecedented price-performance. As we enter 2010, enterprises are bringing more diverse forms of data into their business intelligence (BI) activities. They're also diversifying the types of analysis that they expect from these investments.

At the same time, more kinds and sizes of companies and government agencies are seeking to deliver ever more data-driven analysis for their employees, partners, users, and citizens. It boils down to giving more communities of participants what they need to excel at whatever they're doing. By putting analytics into the hands of more decision makers, huge productivity wins across entire economies become far more likely.

But such improvements won’t happen if the data can't effectively reach the application's logic, if the systems can't handle the massive processing scale involved, or if the total costs and complexity are too high.

In this sponsored podcast discussion we examine how convergence of data and logic, of parallelism and MapReduce -- and of a hunger for precise analysis with a flood of raw new data -- are all setting the stage for powerful advanced analytics outcomes.

To help learn how to attain advanced analytics and to uncover the benefits from these new architectural activities for ubiquitous BI, we're joined by Jim Kobielus, senior analyst at Forrester Research, and Sharmila Mulligan, executive vice president of marketing at Aster Data Systems. The discussion is moderated by BriefingsDirect's Dana Gardner, principal analyst at Interarbor Solutions.

Here are some excerpts:

Kobielus: Advanced analytics is focused on how to answer questions about the future. It's what's likely to happen -- forecast, trend, what-if analysis -- as well as what I like to call the deep present, really current streams for complex event processing.

What's streaming in now? And how can you analyze the great gushing streams of information that are emanating from all your applications, your workflows, and from social networks?

Advanced analytics is all about answering future-oriented, proactive, or predictive questions, as well as current streaming, real-time questions about what's going on now. Advanced analytics leverages the same core features that you find in basic analytics -- all the reports, visualizations, and dashboarding -- but then takes it several steps further.

... What Forrester is seeing is that, although the average data warehouse today is in the 1-10 terabyte range for most companies, we foresee the average warehouse size going, in the middle of the coming decade, into the hundreds of terabytes.

In 10 years or so, we think it's possible, and increasingly likely, that petabyte-scale data warehouses or content warehouses will become common. It's all about unstructured information, deep history, and historical information. A lot of trends are pushing enterprises in the direction of big data.

... We need to rethink the platforms with which we're doing analytical processing. Data mining is traditionally thought of as being the core of advanced analytics. Generally, you pull data from various sources into an analytical data mart.

That analytical data mart is usually on a database that's specific to a given predictive modeling project, let's say a customer analytics project. It may be a very fast server with a lot of compute power for a single server, but quite often what we call the analytical data mart is not the highest performance database you have in your company. Usually, that high performance database is your data warehouse.

As you build larger and more complex predictive models you quickly run into resource constraints on your existing data-mining platform. So you have to look for where you can find the CPU power, the data storage, and the I/O bandwidth to scale up your predictive modeling efforts.

... But, [there is] another challenge, which is advanced analytics producing predictive models. Those predictive models increasingly are deployed in-line to transactional applications to provide some basic logic and rules that will drive such important functions as "next best offer" being made to customers based on a broad variety of historical and current information.

How do you inject predictive logic into your transactional applications in a fairly seamless way? You have to think through that, because, right now, quite often analytical data models, predictive models, in many ways are not built for optimal embedding within your transactional applications. You have to think through how to converge all these analytical models with the transactional logic that drives your business.
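As a rough illustration of what "in-line" deployment can look like, here is a minimal Python sketch in which a transactional code path calls a scoring function directly. Everything in it is an assumption made for illustration: the `Customer` fields, the offer names, the weights in `OFFER_WEIGHTS`, and the `handle_checkout` function are hypothetical and do not come from the discussion or from any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    recent_purchases: int  # hypothetical feature: purchases in the last 90 days
    support_tickets: int   # hypothetical feature: currently open support tickets


# Hypothetical linear-model weights, as if exported from a data-mining tool.
OFFER_WEIGHTS = {
    "upgrade": {"bias": -1.0, "recent_purchases": 0.8, "support_tickets": -0.5},
    "retention_discount": {"bias": -0.5, "recent_purchases": -0.2, "support_tickets": 0.9},
}


def score(offer: str, customer: Customer) -> float:
    """Score one offer for one customer with a simple linear model."""
    w = OFFER_WEIGHTS[offer]
    return (
        w["bias"]
        + w["recent_purchases"] * customer.recent_purchases
        + w["support_tickets"] * customer.support_tickets
    )


def next_best_offer(customer: Customer) -> str:
    """Return the offer with the highest score."""
    return max(OFFER_WEIGHTS, key=lambda offer: score(offer, customer))


def handle_checkout(customer: Customer) -> dict:
    """Transactional code path that consults the predictive model in-line."""
    return {"customer_id": customer.customer_id, "offer": next_best_offer(customer)}
```

In this sketch the model is only a table of weights that the transactional code can evaluate cheaply. A production model would typically be retrained and redeployed on its own schedule, which is part of the convergence problem described above.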

New data platform

Mulligan: What we see with customers is that the advanced analytics needs and the new generation of analytics that they are trying to do is driving the need for a new data platform.

What you've got is a situation where enterprises want to be able to do more scalable reporting on massive data sets with very, very fast response times. On the reporting side, the end result for the customer is similar to the type of report they were already trying to produce, but the difference is that the quantity of data they're trying to get at, and the amount of data feeding these reports, is far greater than what they had before.

That's what's driving a need for a new platform underneath some of the preexisting BI tools that are, in themselves, good at reporting, but what the BI tools need is a data platform beneath them that allows them to do more scalable reporting than you could do before.

... Previously, the choice of a data management platform was based primarily on price-performance, being able to effectively store lots of data, and get very good performance out of those systems. What we're seeing right now is that, although price performance continues to be a critical factor, it's not necessarily the only factor or the primary thing driving their need for a new platform.

What's driving the need now, and one of the most important criteria in the selection process, is the ability of this new platform to be able to support very advanced analytics.

Customers are very precise in terms of the type of analytics that they want to do. So, it's not that a vendor needs to tell them what they are missing. They are very clear on the type of data analysis they want to do, the granularity of data analysis, the volume of data that they want to be able to analyze, and the speed that they expect when they analyze that data.

There is a big shift in the market, where customers have realized that their preexisting platforms are not necessarily suitable for the new generation of analytics that they're trying to do.

They are very clear on what their requirements are, and those requirements are coming from the top. Those new requirements, as it relates to data analysis and advanced analytics, are driving the selection process for a new data management platform.

We see the push toward analysis that's really more near real-time than what they were able to do before. This is not a trivial thing to do when it comes to very large data sets, because what you are asking for is the ability to get very, very quick response times and incredibly high performance on terabytes and terabytes of data to be able to get these kinds of results in real-time.

Social network analysis

Kobielus: Let's look at what's going to be a true game changer, not just for business, but for the global society. It's a thing called social network analysis.

It's predictive models, fundamentally, but it's predictive models that are applied to analyzing the behaviors of networks of people on the web, on the Internet, Facebook, and Twitter, in your company, and in various social network groupings, to determine classification and clustering of people around common affinities, buying patterns, interests, and so forth.

As social networks weave their way into not just our consumer lives, but our work lives, our life lives, social network analysis -- leveraging all the core advanced analytics of data mining and text analytics -- will take the place of the focus group.

You're going to listen to all their tweets and their Facebook updates and you're going to look at their interactions online through your portal and your call center. Then, you're going to take all that huge stream of event information -- we're talking about complex event processing (CEP) -- you're going to bring it into your data warehousing grid or cloud.

You're also going to bring historical information on those customers and their needs. You're going to apply various social network behavioral analytics models to it to cluster people into the categories that make us all kind of squirm when we hear them, things like yuppie and Generation X and so forth.
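To make "clustering of people around common affinities" concrete, here is a small Python sketch. It is only one simple way to do this, chosen for brevity: it links two people when the Jaccard similarity of their interest sets reaches a threshold, then reports the connected groups (single-link clustering). The function names, the threshold value, and the sample interests are assumptions for illustration, not a description of any model mentioned in the discussion.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of interests two people have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_affinity(interests: dict, threshold: float = 0.5) -> list:
    """Group people whose interest sets overlap at or above the threshold.

    Uses a union-find structure, so groups are the connected components
    of the "similar enough" graph (single-link clustering).
    """
    parent = {person: person for person in interests}

    def find(person):
        # Follow parent links to the group representative, compressing the path.
        while parent[person] != person:
            parent[person] = parent[parent[person]]
            person = parent[person]
        return person

    for u, v in combinations(interests, 2):
        if jaccard(interests[u], interests[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for person in interests:
        groups.setdefault(find(person), set()).add(person)
    return sorted(sorted(members) for members in groups.values())
```

For example, with `{"ann": {"golf", "wine", "travel"}, "bob": {"golf", "wine"}, "cat": {"gaming", "anime"}, "dan": {"gaming", "anime", "manga"}}` the function returns two groups, `["ann", "bob"]` and `["cat", "dan"]`. The pairwise comparison is quadratic in the number of people, which hints at why this kind of analysis pushes toward parallel platforms at scale.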

They can get a sense of how a product or service is being perceived in real-time, so that the provider of that product or service can then turn around and tweak that marketing campaign ...

Social network analysis becomes more powerful as you bring more history into it -- last year, two years, five years, 10 years' worth of interactions -- to get a sense for how people will likely respond to new offers, bundles, packages, campaigns, and programs that are thrown at them through social networks.

If you can push not just the analytic models, but to some degree bring transactional applications, such as workflow, into this environment to be triggered by all of the data being developed or being sifted by these models, that is very powerful.

Mulligan: One of the biggest issues that the preexisting data pipeline faces is that the data lives in a repository that's removed from where the analytics take place. Today, with the existing solutions, you need to move terabytes and terabytes of data through the data pipeline to the analytics application, before you can do your analysis.

There's a fundamental issue here. You can't move boulders and boulders of data to an application. It's too slow, it's too cumbersome, and you're not factoring in all your fresh data in your analysis, because of the latency involved.

One of the biggest shifts is that we need to bring the analytics logic close to the data itself. Having it live in a completely different tier, separate from where the data lives, is problematic. This is not a price-performance issue in itself. It is a massive architectural shift that requires bringing analytics logic to the data itself, so that data is collocated with the analytics itself.

MapReduce plays a critical role in this. It is a very powerful technology for advanced analytics and it brings capabilities like parallelization to an application, which then allows for very high-performance scalability.
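For readers unfamiliar with the pattern, the following Python sketch shows the three MapReduce phases on a toy data set: a map step that runs independently on each data partition, a shuffle that groups values by key, and a reduce step that aggregates each group. It runs in a single process, and the record fields (`customer`, `amount`) are hypothetical. In a real MPP or cluster setting, each partition's map call would run on the node that holds that partition.

```python
from collections import defaultdict


def map_phase(partition: list) -> list:
    """Emit (key, value) pairs for one partition; needs no other partition."""
    return [(record["customer"], record["amount"]) for record in partition]


def shuffle(mapped_partitions: list) -> dict:
    """Group all emitted values by key across partitions."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: dict) -> dict:
    """Aggregate each key's values; here, total spend per customer."""
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(partitions: list) -> dict:
    # Each map_phase call is independent, which is what makes the
    # pattern easy to parallelize across workers or nodes.
    mapped = [map_phase(partition) for partition in partitions]
    return reduce_phase(shuffle(mapped))
```

Because the map calls share no state, the list comprehension in `map_reduce` could be swapped for a parallel executor without changing the map or reduce logic.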

What we see in the market these days are terms like "in-database analytics," "applications inside data," and all this is really talking about the same thing. It's the notion of bringing analytics logic to the data itself.

... In the marriage of SQL with MapReduce, the real intent is to bring the power of MapReduce to the enterprise, so that SQL programmers can now use that technology. MapReduce alone does require some sophistication in terms of programming skills to be able to utilize it. You may typically find that skill set in Web 2.0 companies, but often you don’t find developers who can work with that in the enterprise.

What you do find in enterprise organizations is that there are people who are very proficient at SQL. By bringing SQL together with MapReduce, enterprise organizations get the familiarity and ease of SQL, but with the power of MapReduce analytics underneath. So, it’s really letting SQL programmers leverage skills they already have, while being able to use MapReduce for analytics.
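The general idea of calling custom analytic logic from ordinary SQL can be illustrated with Python's standard `sqlite3` module, which lets a program register a user-defined aggregate and then invoke it from a query. This is only an analogy: it is not Aster Data's SQL-MapReduce syntax, and SQLite is not an MPP database. The table, the column names, and the `py_median` function name are assumptions for illustration.

```python
import sqlite3
import statistics


class Median:
    """User-defined aggregate: sqlite3 calls step() per row, then finalize()."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        return statistics.median(self.values) if self.values else None


def median_spend_by_region(rows: list) -> list:
    """Load rows into an in-memory table and run custom logic from SQL."""
    conn = sqlite3.connect(":memory:")
    # Register the Python class so SQL can call it as py_median(column).
    conn.create_aggregate("py_median", 1, Median)
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT region, py_median(amount) FROM orders "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result
```

The person writing the query only needs SQL; the procedural logic lives behind the function name and is executed by the database engine against rows it already holds, rather than after exporting the data to a separate application tier.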

... One of the biggest requirements in order to be able to do very advanced analytics on terabyte- and petabyte-level data sets, is to bring the application logic to the data itself. Earlier, I described why you need to do this. You want to eliminate as much data movement as possible, and you want to be able to do this analysis in as near real-time as possible.

What we did in Aster Data 4.0 is just that. We're allowing companies to push their analytics applications inside of Aster’s MPP database, where now you can run your application logic next to the data itself, so they are both collocated in the same system. By doing so, you've eliminated all the data movement. What that gives you is very, very quick and efficient access to data, which is what's required in some of these advanced analytics application examples we talked about.

Pushing the code

What kind of applications can you push down into the system? It can be any app written in Java, C, C++, Perl, Python, .NET. It could be an existing custom application that an organization has written and that they need to be able to scale to work on much larger data sets. That code can be pushed down into the apps database.

It could be a new application that a customer is looking to write to do a level of analysis that they could not do before, like real-time fraud analytics, or very deep customer behavior analysis. If you're trying to deliver these new generations of advanced analytics apps, you would write that application in the programming language of your choice.

Kobielus: In this coming decade, we're going to see predictive logic deployed into all application environments, be they databases, clouds, distributed file systems, CEP environments, business process management (BPM) systems, and the like. Open frameworks will be used and developed under more of a service-oriented architecture (SOA) umbrella, to enable predictive logic that’s built in any tool to be deployed eventually into any production, transaction, or analytic environment.
