Introduction to Complex Event Processing & Data Streams

Using OMG's Data Distribution Service data streams when you need high performance from complex event processing systems

With the evolution of distributed IT systems and the advent of Web Services, applications can now make more informed decisions by using real-time information from third-party sources. For example, today's automated trading applications can make over 30 trades a second by analyzing stock trends, market movements, news, and events that may only be relevant for a fraction of a second. A trading algorithm may infer a negative stock trend line for the next tenth of a second and short the stock for that brief interval.

Such applications require Complex Event Processing (CEP) engines. These engines can detect patterns of activity from multiple data streams and infer events continuously. Many critical CEP use cases require peak performance from both the event engine and the data streams. In the example above, the trading algorithm can't profit from buying and selling within a tenth of a second if the combined latency of the event engine and the data stream exceeds the operable time window.

This article will survey the suitability of OMG's Data Distribution Service data streams in use cases that demand high performance from complex event processing systems.

What Is Complex Event Processing?
Consider the following example: The Securities and Exchange Commission has different margin and reserve requirements for traders classified as "pattern day traders." The term "pattern day trader" means any customer who executes four or more day trades in the same stock in five business days. However, if the number of day trades is 6% or less of the customer's total trades for the five-business-day period, the customer won't be considered a pattern day trader and the special margin and reserve requirements won't apply.

With Complex Event Processing, we can detect a "pattern day trader" in real-time as follows. Each time a trader makes a transaction, the system posts an executedTradeEvent <traderId, stockId> event. The CEP engine maintains a window of five business days and searches for cases where the count (count_n) of executedTradeEvents for a given trader and stock reaches four or more. On detecting such a pattern, it counts the total number of trades the trader made in the last five business days. If count_n is more than 6% of all those trades, it posts a detectedDayTrader event.
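The detection steps above can be sketched in a few lines of Python. This is only an illustration of the windowing logic, not how any particular CEP engine implements it. The class name DayTraderDetector, the integer business-day timestamps, and the total_trades_lookup callback (standing in for the non-streaming source of total trade counts) are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass

WINDOW_DAYS = 5          # five-business-day window from the rule
MIN_DAY_TRADES = 4       # "four or more day trades in the same stock"
MAX_EXEMPT_RATIO = 0.06  # 6% or less of total trades is exempt


@dataclass(frozen=True)
class ExecutedTradeEvent:
    day: int        # business-day index (hypothetical timestamp unit)
    trader_id: str
    stock_id: str


class DayTraderDetector:
    """Keeps a five-day window of trade events per (trader, stock)."""

    def __init__(self, total_trades_lookup):
        # total_trades_lookup(trader_id, day) is a hypothetical stand-in for
        # the non-streaming source (for example, a relational database).
        self._total_trades = total_trades_lookup
        self._windows = {}

    def on_event(self, event):
        key = (event.trader_id, event.stock_id)
        window = self._windows.setdefault(key, deque())
        window.append(event.day)
        # Expire events that have fallen out of the five-day window.
        while window and window[0] <= event.day - WINDOW_DAYS:
            window.popleft()
        count_n = len(window)
        if count_n < MIN_DAY_TRADES:
            return None
        total = self._total_trades(event.trader_id, event.day)
        if total > 0 and count_n / total > MAX_EXEMPT_RATIO:
            return ("detectedDayTrader", event.trader_id, event.stock_id)
        return None
```

Feeding the detector four ExecutedTradeEvent objects for the same trader and stock within five days returns a detectedDayTrader tuple on the fourth event, provided the lookup reports few enough total trades that four of them exceed 6%.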

Based on this example, we can describe many key characteristics of Complex Event Processing:

•  Events are inferred. In the example, the trading application doesn't and can't send a detectedDayTrader event. This event has to be inferred by processing other events like executedTradeEvent, maintaining a count of events satisfying a query condition over a given period of time and integrating with non-streaming content like the number of total trades stored in a relational database.

•  Events are correlated across multiple sources. The system correlates data from several sources to infer the event. In the example, these included explicit sources like the database that stored the total number of trades for a customer and implicit sources like time.

•  Events are causally related. The system wouldn't have sent a detectedDayTrader event if a set of executedTradeEvents hadn't preceded it. The executedTradeEvents plus some other criteria caused the detectedDayTrader event to occur.

CEP engines manage event-driven information systems by employing techniques such as detecting complex patterns, building correlations, and identifying relationships such as causality and timing among many events.

From a black box view, a CEP engine takes as input a set of data streams from sources such as RTI, database files, or JMS. Most CEP engines use an SQL-like programming language such as the Continuous Computation Language (CCL) with extensions for event processing.

While discussing the entire semantics of CCL or CEP is beyond the scope of this article, here's a simple example of how application developers can infer a weather event using the CCL programming language.

In this example, the query searches for events from the input data stream WindIn in which the wind speed changes by more than five miles an hour in two seconds. The events matching the pattern are then inserted into an output data stream WindPatternOut:

INSERT INTO WindPatternOut (Location, Speed1, Speed2)
SELECT W1.Location, W1.WindSpeed, W2.WindSpeed
FROM WindIn W1, WindIn W2
MATCHING [2 SECONDS: W1 && W2]
ON W1.Location = W2.Location
WHERE (W1.WindSpeed - W2.WindSpeed) > 5;

As you can see from this example, CCL is loosely based on SQL semantics with extensions (such as matching patterns, viewing samples in a given time window) for complex events. The input and the output data streams are logically modeled as database tables, regardless of the underlying messaging protocol.
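For readers who don't know CCL, roughly the same logic can be written by hand in Python. This is a sketch under stated assumptions, not a statement of CCL's exact matching semantics: readings are assumed to arrive in timestamp order as (timestamp_seconds, location, wind_speed) tuples, the function name wind_pattern_out is made up for the example, and the comparison follows the prose description of the pattern (a change of more than five miles an hour, in either direction).

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 2.0
THRESHOLD_MPH = 5.0


def wind_pattern_out(wind_in):
    """Yield (location, speed1, speed2) for pairs of readings at the same
    location, at most two seconds apart, whose speeds differ by more than
    the threshold.

    wind_in: iterable of (timestamp_seconds, location, wind_speed) tuples,
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)  # location -> readings still inside the window
    for ts, location, speed in wind_in:
        window = recent[location]
        # Drop readings older than the two-second window.
        while window and ts - window[0][0] > WINDOW_SECONDS:
            window.popleft()
        # Compare the new reading with every earlier one still in the window.
        for _, earlier_speed in window:
            if abs(speed - earlier_speed) > THRESHOLD_MPH:
                yield (location, earlier_speed, speed)
        window.append((ts, speed))
```

The per-location deque plays the role of the time window and the inner loop plays the role of the self-join on WindIn. The CCL version expresses both declaratively, which is the point of using an event-processing language.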

Why Use Complex Event Processing?
From an oversimplified view, applications have implemented functionality for inferring events from existing data for a very long time. Credit and fraud-risk applications, for example, have existed for decades without an explicit design and implementation for CEP. However, the data avalanche produced by edge devices like Radio Frequency Identification (RFID) readers and sensors is rapidly changing how detection algorithms are designed, and the need for a configurable and flexible way to detect patterns is becoming more pressing.

Here are some situations in which software architects might consider incorporating CEP into their application technology stack:

•  Only the "processed" data is useful: In applications where edge devices such as sensors or RFID readers connect to the enterprise, not all raw data samples are of equal interest to the enterprise business process. The data might need to be cleansed, validated, and enriched before it's useful. In the wind example, the application isn't interested in each sensor reading of the wind speed. It is only interested when the wind speed changes by more than five miles an hour in two seconds, which may indicate a hurricane or tornado.

•  Software development cycles can't keep up with the changes in algorithms for detecting patterns: In trading applications, patterns for detecting buy and sell events may only be competitive for a few months, or even a few weeks. In some cases, new patterns are discovered, implemented, and deployed in a day. In such cases, it's necessary to parameterize and abstract out the pattern detection layer of the application. Having the trading algorithm embedded in the application code is not a good design practice.

•  Event processing has to be done in real-time: Not all event processing requires a CEP engine. Data warehouse applications analyze trends by correlating multiple dimensions of the data in an "offline" manner (for example, sending Home Depot discount promotions to all residents who have moved to zip code 95138 in the last two months). However, in many applications, events have to be inferred in real-time and the applications simply can't wait for the raw data to be persisted and analyzed. For example, radar tracking applications must process events in real-time. A trading desk application seeks a competitive advantage by analyzing an event microseconds ahead of its competition. Traditional event-detection applications require that the data be persisted first and then correlated. This methodology is too slow for applications such as trading desks, where the data has to be analyzed for event patterns in real-time.

•  Event processing has to scale: The media is now heralding the arrival of truly ubiquitous computing, where tiny microprocessors in our ambient surroundings communicate to deliver a more intelligent service. For example, in healthcare monitoring, pressure sensors in the shoes of patients at risk of a heart attack can monitor any change in their pattern of walking and alert the healthcare provider, who can immediately schedule a check-up. Enterprise applications that integrate with the edge have to cope with a large number of sensors simultaneously sending data at high rates. Traditional applications won't be able to handle this combination of high data rate and volume. Using a CEP engine designed for performance is one step toward managing the load. Selecting the right messaging protocol for the data stream is the other.
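The second situation above, abstracting the pattern-detection layer out of application code, can be pictured with a small Python sketch. It is a deliberately simplified illustration and not a substitute for a CEP engine: the Rule fields, the load_rules and evaluate helpers, and the sample event field price_drop_pct are all hypothetical names chosen for the example. The idea is that detection rules live in data (a config file or a database table), so a new pattern can be deployed without changing or redeploying the application.

```python
import operator
from dataclasses import dataclass

# Comparison operators a rule may name; the rules themselves are plain data.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Rule:
    name: str         # event to emit when the rule matches
    field: str        # field of the incoming event to inspect
    op: str           # key into OPERATORS
    threshold: float


def load_rules(rows):
    """Build rules from plain dicts, as they might come from a config file."""
    return [Rule(**row) for row in rows]


def evaluate(rules, event):
    """Return the names of all rules that the event satisfies."""
    return [
        rule.name
        for rule in rules
        if rule.field in event
        and OPERATORS[rule.op](event[rule.field], rule.threshold)
    ]
```

For example, loading the single row {"name": "sellSignal", "field": "price_drop_pct", "op": ">", "threshold": 2.0} and evaluating the event {"price_drop_pct": 3.1} yields ["sellSignal"]. Changing the threshold or adding a rule is then a data change rather than a code change.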


More Stories By Supreet Oberoi

Supreet Oberoi is vice president, engineering for RTI, and he brings over a decade of experience in building and deploying Web-based enterprise applications. He was a founding member and director of engineering for Trading Dynamics, which was acquired by Ariba in Nov 1999. Later, he led the engineering organization for the Mohr-Davidow (MDV) funded startup, oneREV Inc, that was acquired by Agile Software in 2002. Most recently, Supreet served as Director of Engineering at Agile Software. Supreet received his BS in computer sciences with Highest Honors from the University of Texas at Austin, and an MS in computer sciences from Stanford University.
