Click here to close now.

Welcome!

Microservices Journal Authors: Pat Romanski, Elizabeth White, Liz McMillan, Blue Box Blog, AppDynamics Blog

Related Topics: Cloud Expo, Java, Microservices Journal, Open Source, Web 2.0, Apache

Cloud Expo: Article

The Cure for the Common Cloud-Based Big Data Initiative

Understanding how to work with Big Data

There is no doubt that Big Data holds infinite promise for a range of industries. Better visibility into data across various sources enables everything from insight into saving electricity to agricultural yield to placement of ads on Google. But when it comes to deriving value from data, no industry has been doing it as long or with as much rigor as clinical researchers.

Unlike other markets that are delving into Big Data for the first time and don't know where to begin, drug and device developers have spent years refining complex processes for asking very specific questions with clear purposes and goals. Whether using data for designing an effective and safe treatment for cholesterol, or collecting and mining data to understand proper dosage of cancer drugs, life sciences has had to dot every "i" and cross every "t" in order to keep people safe and for new therapies to pass muster with the FDA. Other industries are now marveling at a new ability to uncover information about efficiencies and cost savings, but - with less than rigorous processes in place - they are often shooting in the dark or only scratching the surface of what Big Data offers.

Drug developers today are standing on the shoulders of those who created, tested and secured FDA approval for treatments involving millions of data points (for one drug alone!) without the luxury of the cloud or sophisticated analytics systems. These systems have the potential to make the best data-driven industry even better. This article will outline key lessons and real-world examples of what other industries can and should learn from life sciences when it comes to understanding how to work with Big Data.

What Questions to Ask, What Data to Collect
In order to gain valuable insights from Big Data, there are two absolute requirements that must be met - understanding both what questions to ask and what data to collect. These two components are symbiotic, and understanding both fully is difficult, requiring both domain expertise and practical experience.

In order to know what data to collect, you first must know the types of questions that you're going to want to ask - often an enigma. With the appropriate planning and experience-based guesses, you can often make educated assumptions. The trick to collecting data is that you need to collect enough to answer questions, but if you collect too much then you may not be able to distill the specific subset that will answer your questions. Also, explicit or inherent cost can prevent you from collecting all possible data, in which case you need to carefully select which areas to collect data about.

Let's take a look at how this is done in clinical trials. Say you're designing a clinical study that will analyze cancer data. You may not have specific questions when the study is being designed, but it's reasonable to assume that you'll want to collect data related to commonly impacted readings for the type of cancer and whatever body system is affected, so that you have the right information to analyze when it comes time.

You may also want to collect data unrelated to the specific disease that subsequent questions will likely require, such as information on demographics and medications that the patient is taking that are different from the treatment. During the post-study data analysis, questions on these areas often arise, even though the questions aren't initially apparent. Thus clinical researchers have adopted common processes for collecting data on demographics and concomitant medications. Through planning and experience, you can also identify areas that do not need to be collected for each study. For example, if you're studying lung cancer, collecting cognitive function data is probably unrelated.

How can other industries anticipate what questions to ask, as is done in life sciences? Well, determine a predefined set of questions that are directly related to the goal of the data analysis. Since you will not know all of the questions until after the data collection have started, it's important to 1) know the domain, and 2) collect any data you'll need to answer the likely questions that could come up.

Also, clinical researchers have learned that questions can be discovered automatically. There are data mining techniques that can uncover statistically significant connections, which in effect are raising questions that can be explored in more detail afterwards. An analysis can be planned before data is collected, but not actually be run until afterwards (or potentially during), if the appropriate data is collected.

One other area that has proven to be extremely important to collect is metadata, or data about the data - such as, when it was collected, where it was collected, what instrumentation was used in the process and what calibration information was available. All of this information can be utilized later on to answer a lot of potentially important questions. Maybe there was a specific instrument that was incorrectly configured and all the resulting data that it recorded is invalid. If you're running an ad network, maybe there's a specific web site where your ads are run that are gaming the system trying to get you to pay more. If you're running a minor league team, maybe there's a specific referee that's biased, which you can address for subsequent games. Or, if you're plotting oil reserves in the Gulf of Mexico, maybe there are certain exploratory vessels that are taking advantage of you. In all of these cases, without the appropriate metadata, it'd be impossible to know where real problems reside.

Identifying Touch Points to Be Reviewed Along the Way
There are ways to specify which types of analysis can be performed, even while data is being collected, that can affect either how data will continue to be collected or the outcome as a whole.

For example, some clinical studies run what's called interim analysis while the study is in progress. These interim analyses are planned, and the various courses that can be used afterwards are well defined, but the results afterward are statistically usable. This is called an adaptive clinical trial, and there are a lot of studies that are being performed to determine more effective and useful ways that these can be done in the future. The most important aspect of these is preventing biases, and this is something that has been well understood and tested by the pharmaceutical community over the past several decades. Simply understanding what's happening during the course of a trial, or how it affects the desired outcome, can actually bias the results.

The other key factor is that the touch points are accessible to everybody who needs the data. For example, if you have a person in the field, then it's important to have him or her access the data in a format that's easily consumable to them - maybe through an iPad or an existing intranet portal. Similarly, if you have an executive that needs to understand something at a high level, then getting it to them in an easily consumable executive dashboard is extremely important.

As the life sciences industry has learned, if the distribution channels of the analytics aren't seamless and frictionless, then they won't be utilized to their fullest extent. This is where cloud-based analytics become exceptionally powerful - the cloud makes it much easier to integrate analytics into every user's day. Once each user gets the exact information they need, effortlessly, they can then do their job better and the entire organization will work better - regardless of how and why the tools are being used.

Augmenting Human Intuition
Think about the different types of tools that people use on a daily basis. People use wrenches to help turn screws, cars to get to places faster and word processers to write. Sure, we can use our hands or walk, but we're much more efficient and better when we can use tools.

Cloud-based analytics is a tool that enables everybody in an organization to perform more efficiently and effectively. The first example of this type of augmentation in the life sciences industry is alerting. A user tells the computer what they want to see, and then the computer alerts them via email or text message when the situation arises. Users can set rules for the data it wants to see, and then the tools keep on the lookout to notify the user when the data they are looking for becomes available.

Another area the pharmaceutical industry has thoroughly explored is data-driven collaboration techniques. In the clinical trial process, there are many different groups of users: those who are physically collecting the data (investigators), others who are reviewing it to make sure that it's clean (data managers), and also people who are stuck in the middle (clinical monitors). Of course there are many other types of users, but this is just a subset to illustrate the point. These different groups of users all serve a particular purpose relating to the overall collection of data and success of the study. When the data looks problematic or unclean, the data managers will flag it for review, which the clinical monitors can act on.

What's unique about the way that life sciences deals with this is that they've set up complex systems and rules to make sure that the whole system runs well. The tools associated around these processes help augment human intuition through alerting, automated dissemination and automatic feedback. The questions aren't necessarily known at the beginning of a trial, but as the data is collected, new questions evolve and the tools and processes in place are built to handle the changing landscape.

No matter what the purpose of Big Data analytics, any organization can benefit from the mindset of cloud-based analytics as a tool that needs to consistently be adjusted and refined to meet the needs of users.

Ongoing Challenges of Big Data Analytics
Given this history with data, one would expect that drug and device developers would be light years ahead when it comes to leveraging Big Data technologies - especially given that the collection and analytics of clinical data is often a matter of life and death. But while they have much more experience with data, the truth is that life sciences organizations are just now starting to integrate analytics technologies that will enable them to work with that data in new, more efficient ways - no longer involving billions of dollars a year, countless statisticians, archaic methods, and, if we're being honest, brute force. As new technology becomes available, the industry will continue to become more and more seamless. In the meantime, other industries looking to wrap their heads around the Big Data challenge should look to life sciences as the starting point for best practices in understanding how and when to ask the right questions, monitoring data along the way and selecting tools that improve the user experience.

More Stories By Rick Morrison

Rick Morrison is CEO and co-founder of Comprehend Systems. Prior to Comprehend Systems, he was the Chief Technology Officer of an Internet-based data aggregator, where he was responsible for product development and operations. Prior to that, he was at Integrated Clinical Systems, where he led the design and implementation of several major new features. He also proposed and led a major infrastructure redesign, and introduced new, streamlined development processes. Rick holds a BS in Computer Science from Carnegie Mellon University in Pittsburgh, Pennsylvania.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@MicroservicesExpo Stories
The 17th International Cloud Expo has announced that its Call for Papers is open. 17th International Cloud Expo, to be held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, APM, APIs, Microservices, Security, Big Data, Internet of Things, DevOps and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding bu...
As enterprises engage with Big Data technologies to develop applications needed to meet operational demands, new computation fabrics are continually being introduced. To leverage these new innovations, organizations are sacrificing market opportunities to gain expertise in learning new systems. In his session at Big Data Expo, Supreet Oberoi, Vice President of Field Engineering at Concurrent, Inc., discussed how to leverage existing infrastructure and investments and future-proof them against e...
Cloud Expo New York is happening from June 9 - 11. This event brings together the worlds of Cloud Computing, DevOps, IoT, WebRTC, Big Data and SDDC. We hope to see you there-members of the Blue Box team will exhibit in booth 218 next to the DevOps area. Plus, our Chief Product Officer, Hernan Alvarez, will present his talk "The Cloud Has a Down-and-Dirty Lining" as part of the Operations track in the DevOps Summit portion of the event on June 9 at 11 am. Learn more about his session her...
Once the decision has been made to move part or all of a workload to the cloud, a methodology for selecting that workload needs to be established. How do you move to the cloud? What does the discovery, assessment and planning look like? What workloads make sense? Which cloud model makes sense for each workload? What are the considerations for how to select the right cloud model? And how does that fit in with the overall IT transformation?
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo in Silicon Valley. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 17th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading in...
When OpenStack aficionados gather in Vancouver in a couple of weeks, one of the hot topics will be containers, a “new” alternative to virtualization. Actually, container technology has been around for a couple of decades, but it is trending among the IT community at a fever pitch these days and stands to have a huge impact on the future of cloud computing.The appeal of container technology is easy to appreciate. In a nutshell, containers can enable you to run many more applications on the same h...
Docker is an open platform for developers and sysadmins of distributed applications that enables them to build, ship, and run any app anywhere. Docker allows applications to run on any platform irrespective of what tools were used to build it making it easy to distribute, test, and run software. I found this 5 Minute Docker video, which is very helpful when you want to get a quick and digestible overview. If you want to learn more, you can go to Docker’s web page and start with this Docker intro...
The 5th International DevOps Summit, co-located with 17th International Cloud Expo – being held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the...
17th Cloud Expo, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Meanwhile, 94% of enterprises a...
Over the years, a variety of methodologies have emerged in order to overcome the challenges related to project constraints. The successful use of each methodology seems highly context-dependent. However, communication seems to be the common denominator of the many challenges that project management methodologies intend to resolve. In this respect, Information and Communication Technologies (ICTs) can be viewed as powerful tools for managing projects. Few research papers have focused on the way...
As the world moves from DevOps to NoOps, application deployment to the cloud ought to become a lot simpler. However, applications have been architected with a much tighter coupling than it needs to be which makes deployment in different environments and migration between them harder. The microservices architecture, which is the basis of many new age distributed systems such as OpenStack, Netflix and so on is at the heart of CloudFoundry – a complete developer-oriented Platform as a Service (PaaS...
The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long development cycles that produce software that is obsolete at launch. DevOps may be disruptive, but it is essential. The DevOps Summit at Cloud Expo – to be held June 3-5, 2015, at the Javits Center in New York City – will expand the DevOps community, enable a wide...
Enterprises are fast realizing the importance of integrating SaaS/Cloud applications, API and on-premises data and processes, to unleash hidden value. This webinar explores how managers can use a Microservice-centric approach to aggressively tackle the unexpected new integration challenges posed by proliferation of cloud, mobile, social and big data projects. Industry analyst and SOA expert Jason Bloomberg will strip away the hype from microservices, and clearly identify their advantages and d...
Cloud Expo, Inc. has announced today that Andi Mann returns to DevOps Summit 2015 as Conference Chair. The 4th International DevOps Summit will take place on June 9-11, 2015, at the Javits Center in New York City. "DevOps is set to be one of the most profound disruptions to hit IT in decades," said Andi Mann. "It is a natural extension of cloud computing, and I have seen both firsthand and in independent research the fantastic results DevOps delivers. So I am excited to help the great team at ...
There is no question that the cloud is where businesses want to host data. Until recently hypervisor virtualization was the most widely used method in cloud computing. Recently virtual containers have been gaining in popularity, and for good reason. In the debate between virtual machines and containers, the latter have been seen as the new kid on the block – and like other emerging technology have had some initial shortcomings. However, the container space has evolved drastically since coming on...
Container frameworks, such as Docker, provide a variety of benefits, including density of deployment across infrastructure, convenience for application developers to push updates with low operational hand-holding, and a fairly well-defined deployment workflow that can be orchestrated. Container frameworks also enable a DevOps approach to application development by cleanly separating concerns between operations and development teams. But running multi-container, multi-server apps with containers ...
Converging digital disruptions is creating a major sea change - Cisco calls this the Internet of Everything (IoE). IoE is the network connection of People, Process, Data and Things, fueled by Cloud, Mobile, Social, Analytics and Security, and it represents a $19Trillion value-at-stake over the next 10 years. In her keynote at @ThingsExpo, Manjula Talreja, VP of Cisco Consulting Services, will discuss IoE and the enormous opportunities it provides to public and private firms alike. She will shar...
SYS-CON Events announced today that EnterpriseDB (EDB), the leading worldwide provider of enterprise-class Postgres products and database compatibility solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. EDB is the largest provider of Postgres software and services that provides enterprise-class performance and scalability and the open source freedom to divert budget from more costly traditiona...
How can you compare one technology or tool to its competitors? Usually, there is no objective comparison available. So how do you know which is better? Eclipse or IntelliJ IDEA? Java EE or Spring? C# or Java? All you can usually find is a holy war and biased comparisons on vendor sites. But luckily, sometimes, you can find a fair comparison. How does this come to be? By having it co-authored by the stakeholders. The binary repository comparison matrix is one of those rare resources. It is edite...
With the advent of micro-services, the application design paradigm has undergone a major shift. The days of developing monolithic applications are over. We are bringing in the principles (read SOA) hereto the preserve of applications or system integration space into the application development world. Since the micro-services are consumed within the application, the need of ESB is not there. There is no message transformation or mediations required. But service discovery and load balancing of ...