Containers Expo Blog: Article

Data Mining and Data Virtualization

Extending Data Virtualization Platforms

Data Mining helps organizations to discover new insights from existing data, so that predictive techniques can be applied towards various business needs. The following are the typical characteristics of data mining.

  • Extends Business Intelligence beyond Query, Reporting and OLAP (Online Analytical Processing)
  • Data Mining is a cornerstone for assessing customer risk, market segmentation and prediction
  • Data Mining is about performing computationally complex analysis techniques on very large volumes of data
  • It combines the analysis of historical data with modeling techniques to make future predictions, turning operations into performance

The following are the use cases that can benefit from the application of data mining:

  • Manufacturing / Product Development: Building a model from defect and customer complaint data that provides insight into customer satisfaction and helps enterprises build better products.
  • Consumer Payments: Understanding the payment patterns of consumers to predict market penetration and set discount guidelines.
  • Consumer Industry: Customer segmentation to understand the customer base and support targeted advertisements and promotions.
  • Consumer Industry: Campaign effectiveness can be gauged with customer segmentation coupled with predictive marketing models.
  • Retail Industry: Supply chain efficiencies can be gained by mining supply and demand data.

'In Database' Data Mining
Data Mining is typically a multi-step process.

  1. Define the Business Issue to Be Addressed, e.g., Customer Attrition, Fraud Detection, Cross Selling.
  2. Identify the Data Model / Define the Data / Source the Data (Data Sources, Data Types, Data Usage, etc.)
  3. Choose the Mining Technique (Discovery Data Mining, Predictive Data Mining, Clustering, Link Analysis, Classification, Value Prediction)
  4. Interpret the Results (Visualization Techniques)
  5. Deploy the Results (e.g., CRM Systems)
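The steps above can be sketched end to end on a toy problem. The following is a minimal illustration, not a production method: it assumes the business issue is customer segmentation, uses made-up monthly spend figures, and picks a one-dimensional k-means clustering as the mining technique. The customer names, starting centroids and segment labels are all hypothetical.

```python
def nearest(value, centroids):
    """Return the index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))

def assign(values, centroids):
    """Group values into one list per centroid."""
    clusters = [[] for _ in centroids]
    for v in values:
        clusters[nearest(v, centroids)].append(v)
    return clusters

def kmeans_1d(values, centroids, max_iterations=20):
    """Step 3: cluster 1-D values with Lloyd's k-means iteration."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        clusters = assign(values, centroids)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(values, centroids)

# Step 2: source the data (hypothetical monthly spend per customer).
monthly_spend = {"ann": 12, "bob": 15, "cy": 14, "dee": 90,
                 "eve": 95, "fay": 88, "gus": 300, "hal": 310}

centroids, clusters = kmeans_1d(list(monthly_spend.values()), [10, 100, 300])

# Step 4: interpret the results by naming the clusters.
labels = ["low", "mid", "high"]

# Step 5: deploy, here as a lookup a CRM system could consume.
segments = {name: labels[nearest(spend, centroids)]
            for name, spend in monthly_spend.items()}
print(segments)
```

In practice each step is far larger than this, but the shape is the same: the choice of technique in step 3 is only one piece, and most of the effort sits in sourcing the data and deploying the result.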

Data Mining was initially implemented with a combination of multiple tools and systems, which resulted in latency and a long cycle before results were realized.

Sensing this issue, major RDBMS vendors have implemented Data Mining as part of their core database offering. This offering has the following key features:

  • The Data Mining engine resides inside the traditional database environment, facilitating easier licensing and packaging options
  • Data extraction and data movement are eliminated, avoiding costly ETL processes
  • Major Data Mining models are available as pre-built SQL functions, which can be easily integrated into the existing database development process

The following describes the data mining features available in two popular databases:

Built as DB2 data mining functions, the Modeling and Scoring services directly integrate data mining technology into DB2. This leads to faster application performance. Developers want integration and performance, as well as any facility to make their job easier. The model can be used within any SQL statement. This means the scoring function can be invoked with ease from any application that is SQL aware, either in batch, real time, or as a trigger.

Oracle Data Mining, a component of the Oracle Advanced Analytics Option, delivers a wide range of cutting edge machine learning algorithms inside the Oracle Database. Since Oracle Data Mining functions reside natively in the Oracle Database kernel, they deliver unparalleled performance, scalability and security. The data and data mining functions never leave the database to deliver a comprehensive in-database processing solution.
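The idea of calling a model from inside a SQL statement can be illustrated with SQLite's user-defined functions in Python. This is an analogy only, not the DB2 or Oracle API: the `churn_score` function, its coefficients and the `customers` table are hypothetical, and in a real in-database product the model would be trained and stored by the database's mining engine.

```python
import math
import sqlite3

# Hypothetical, hand-picked logistic-regression coefficients.
INTERCEPT, W_COMPLAINTS, W_TENURE = -1.0, 0.8, -0.05

def churn_score(complaints, tenure_months):
    """Return a churn probability between 0 and 1 for one customer row."""
    z = INTERCEPT + W_COMPLAINTS * complaints + W_TENURE * tenure_months
    return 1.0 / (1.0 + math.exp(-z))

conn = sqlite3.connect(":memory:")
# Register the scoring function so that SQL statements can call it by name.
conn.create_function("churn_score", 2, churn_score)

conn.execute(
    "CREATE TABLE customers (id INTEGER, complaints INTEGER, tenure_months INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 0, 48), (2, 5, 3), (3, 1, 12)])

# The model is invoked like any other SQL function, with no export or ETL step.
rows = conn.execute(
    "SELECT id, churn_score(complaints, tenure_months) AS score "
    "FROM customers ORDER BY score DESC").fetchall()

for customer_id, score in rows:
    print(customer_id, round(score, 3))
```

Because the score is just a column expression, any SQL-aware application can use it in a batch query, a real-time lookup or a trigger, which is the integration benefit the vendor descriptions emphasize.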

Data Virtualization
Data Virtualization is a concept that allows enterprises to access the information contained in disparate data sources in a seamless way. As mentioned in my earlier articles, vendors such as Composite Software, Denodo Technologies, IBM, Informatica and Microsoft have developed specialized data virtualization engines. My earlier article covers Data Virtualization using Middleware vs. RDBMS in detail.

Data virtualization solutions provide a virtualized data services layer that integrates data from heterogeneous data sources and content in real time, near-real time, or batch as needed to support a wide range of applications and processes. The Forrester Wave: Data Virtualization, Q1 2012 puts data virtualization in the following perspective: in the past 24 months, we have seen a significant increase in adoption in the healthcare, insurance, retail, manufacturing, eCommerce, and media/entertainment sectors. Regardless of industry, all firms can benefit from data virtualization.

Data Mining Inside Data Virtualization Platforms?
The increase in data sources, especially integration with Big Data and unstructured data, has made the Data Virtualization platform an important part of enterprise data access strategy. Data virtualization provides the following attributes for efficient data access across the enterprise.

  • Abstraction: Provides location, API, language and storage technology independent access of data
  • Federation: Converges data from multiple disparate data sources
  • Transformation: Enriches the quality and quantity of data on an as-needed basis
  • On-Demand Delivery: Provides the consuming applications the required information on-demand
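The four attributes above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how any commercial data virtualization engine works: an in-memory SQLite database stands in for an RDBMS source, a CSV string stands in for a flat-file source, and `customer_spend_view` is a hypothetical virtual view.

```python
import csv
import io
import sqlite3

# Source 1: a relational store (in-memory SQLite stands in for an RDBMS).
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 120.0), (2, 35.5), (1, 60.0)])

# Source 2: a flat file (a CSV string stands in for a file or feed).
CRM_CSV = "customer_id,name,region\n1,Acme,EMEA\n2,Globex,APAC\n"

def customer_spend_view():
    """Virtual view: combines both sources when called, persisting nothing."""
    # Federation: each source is read through its own access method.
    totals = dict(orders_db.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"))
    for row in csv.DictReader(io.StringIO(CRM_CSV)):
        customer_id = int(row["customer_id"])
        # Transformation: normalize types and enrich the CRM record with spend.
        yield {
            "customer_id": customer_id,
            "name": row["name"],
            "region": row["region"],
            "total_spend": totals.get(customer_id, 0.0),
        }

# Abstraction and on-demand delivery: the consumer sees plain records and
# does not know (or care) that two different storage technologies are involved.
view = list(customer_spend_view())
for record in view:
    print(record)
```

The consuming code depends only on the record shape, so a source could be swapped (for example, the CSV replaced by a web service) without changing any consumer.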

With the above benefits of the Data Virtualization Platform in mind, it is evident that enterprises will find it more useful if Data Virtualization platforms are built with Data Mining Models and Algorithms, so that effective Data Mining can be performed on top of Data Virtualization platform.

Since an important part of Data Mining is identifying the correct data sources and associated events of interest, Data Mining can be more effective if disparate data sources are brought under the scope of a Data Virtualization Platform rather than confining Data Mining to a single database engine.
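A small sketch can make this argument concrete. It assumes two hypothetical sources, order counts from one system and complaint counts from another, and a deliberately simple rule in place of a real mining model. The signal of interest (complaints per order) exists in neither source alone and only appears once the records are federated.

```python
# Two hypothetical sources that a virtualization layer would expose.
ORDERS = {"c1": 40, "c2": 5, "c3": 12}   # order counts, e.g. from an ERP database
COMPLAINTS = {"c1": 2, "c2": 4}          # complaint counts, e.g. from a ticketing system

def virtual_customer_records():
    """Yield one federated record per customer across both sources."""
    for customer_id in sorted(ORDERS.keys() | COMPLAINTS.keys()):
        yield {
            "id": customer_id,
            "orders": ORDERS.get(customer_id, 0),
            "complaints": COMPLAINTS.get(customer_id, 0),
        }

def at_risk(records, threshold=0.5):
    """Flag customers whose complaints-per-order ratio exceeds the threshold.

    Customers with no orders are skipped because the ratio is undefined.
    """
    flagged = []
    for record in records:
        if record["orders"] == 0:
            continue
        if record["complaints"] / record["orders"] > threshold:
            flagged.append(record["id"])
    return flagged

# The mining step only sees federated records, never the individual sources.
flagged = at_risk(virtual_customer_records())
print(flagged)
```

Note that `at_risk` takes any iterable of records, so the mining logic stays unchanged when further sources are added behind the virtual layer.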

[Figure: Extended view of the Data Virtualization Platform, showing how Data Mining can be part of the platform]

Summary
Data Virtualization is becoming part of the mainstream enterprise data access strategy, mainly because it abstracts multiple data sources, avoids complex ETL processing, and facilitates a single version of the truth, data quality and a zero-latency enterprise.

If value-adds such as a Data Mining engine can be built on top of existing Data Virtualization platforms, enterprises will benefit further.

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).
