Microservices Expo: Blog Feed Post

Building a Back Testing Platform for Algorithmic Trading

Let’s first examine what market data looks like

In this continuing series, I am examining thoughts and specific implementation details around building a back-testing platform for algo trading. Eventually, we’ll see where complex event processing fits in and how to implement it.

Appendix to Part One – The Data Format

Rather than looking at various database solutions first and then trying to define the problem in terms of those solutions, let’s first examine what market data looks like. In its simplest form, market data looks like this (there’s usually a little more, but this is fine for our purposes):

  • Date: The date of the market data,
  • Time: When during that date the quote occurred,
  • Sequence #: The sequence number that most quote or trade streams include,
  • Symbol: The security the data is for,
  • Best Bid: The best bid (we’re going to concern ourselves only with BBO, or best bid and offer, data for this series because it’s easier),
  • Best Bid Size: How much someone wants to buy at the Best Bid,
  • Best Offer: The best offer,
  • Best Offer Size: How much someone wants to sell at the Best Offer.
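
As a minimal sketch, the eight fields above could be modeled as a single record type. The class name `Quote`, the field names, the choice of `Decimal` for prices, and the sample values are all assumptions made for illustration; none of them come from a real feed specification.

```python
import datetime as dt
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Quote:
    """One best-bid-and-offer (BBO) record with the eight fields listed above."""
    quote_date: dt.date    # Date
    quote_time: dt.time    # Time within the date
    sequence: int          # Sequence # from the feed
    symbol: str            # Security the quote is for
    best_bid: Decimal      # Best Bid price
    best_bid_size: int     # Quantity wanted at the Best Bid
    best_offer: Decimal    # Best Offer price
    best_offer_size: int   # Quantity offered at the Best Offer

    def spread(self) -> Decimal:
        """Difference between the best offer and the best bid."""
        return self.best_offer - self.best_bid


# Made-up sample values, for illustration only.
sample = Quote(
    quote_date=dt.date(2011, 1, 3),
    quote_time=dt.time(9, 30, 0, 1000),  # 09:30:00.001
    sequence=1,
    symbol="IBM",
    best_bid=Decimal("147.20"),
    best_bid_size=300,
    best_offer=Decimal("147.22"),
    best_offer_size=500,
)
print(sample.symbol, sample.spread())
```

Prices are held as `Decimal` here so that arithmetic such as the spread stays exact; a real platform might well prefer scaled integers for speed.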

Consider this chart:

[Chart: Market Data]

If we break the data down, we can see how it might be arranged on disk for subsequent reading. We want to read the data very quickly. If we were using a standard relational database, it’s easy to see that we might be replicating some unnecessary data during the reads. And if we use a typical columnar database, we can see that there are chunks that could be read together, increasing throughput.

For example, for any given millisecond (Time) in a quote feed, there may be more than one symbol with a quote. In fact, that’s quite common, so replicating the time stamp is superfluous. If we had a table for a date’s worth of data, we’d have a Time column whose values were repeated throughout the table. There’s no reason to do that.
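
To make the duplication concrete, here is a small sketch that takes flat rows, each carrying its own time stamp, and groups them so that every time stamp is stored once. The row layout, the millisecond-since-midnight encoding, and the sample values are assumptions for illustration only.

```python
from collections import defaultdict

# Made-up flat rows: (time_ms, sequence, symbol, bid, bid_size, offer, offer_size).
# time_ms is assumed to be milliseconds since midnight.
flat_rows = [
    (34200001, 1, "IBM", 147.20, 300, 147.22, 500),
    (34200001, 2, "MSFT", 27.90, 1200, 27.91, 800),
    (34200001, 3, "AAPL", 329.50, 100, 329.57, 200),
    (34200002, 4, "IBM", 147.21, 200, 147.22, 500),
]


def group_by_time(rows):
    """Store each time stamp once, with the quotes for that time under it."""
    grouped = defaultdict(list)
    for time_ms, *rest in rows:
        grouped[time_ms].append(tuple(rest))
    return dict(grouped)


grouped = group_by_time(flat_rows)

# The flat layout stores one time stamp per quote; the grouped one stores
# one per distinct millisecond.
print("time stamps stored flat:   ", len(flat_rows))
print("time stamps stored grouped:", len(grouped))
```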

Looking again at the data, we can see that, for a given time, there might be multiple quotes available for multiple symbols.  We’d like to read those in order as a little group.  By organizing the data on disk as a flattened multi-dimensional map of maps, we would:

  1. Start with a given day (our table),
  2. Start with a time (our row),
  3. Read each quote in sequence # order (our column)
  4. Process (do something)
  5. Increment the time, and go to #2 above until we run out of data (lather, rinse, repeat)
  6. Put the $ in the bank
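
The read loop in steps 1 through 5 can be sketched with an in-memory map of maps standing in for the on-disk layout. The names `day_table` and `replay`, the dictionary shape (time, then sequence #, then quote), and the sample quotes are hypothetical; a real store would stream these from disk rather than sort keys in memory.

```python
# One day's table: time_ms -> sequence # -> quote fields (made-up sample data).
day_table = {
    34200002: {
        4: {"symbol": "IBM", "bid": 147.21, "offer": 147.22},
    },
    34200001: {
        2: {"symbol": "MSFT", "bid": 27.90, "offer": 27.91},
        1: {"symbol": "IBM", "bid": 147.20, "offer": 147.22},
    },
}


def replay(table, handle):
    """Visit every quote in time order, then in sequence # order."""
    for time_ms in sorted(table):          # step 2: pick the next time (row)
        row = table[time_ms]
        for seq in sorted(row):            # step 3: quotes in sequence # order
            handle(time_ms, seq, row[seq])  # step 4: process


seen = []
replay(day_table, lambda t, s, q: seen.append((t, s, q["symbol"])))
print(seen)
```

The handler is passed in as a callback so the same loop can drive different strategies under test.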

If we could write this data structure to disk as we get it from the quote feed, and had a fast enough disk, we could keep up with the feed. If we needed to create some indexes on the data, we could easily do that as well. We’d simply create another table that would hold an inverted list of times and sequence #s by symbol. If we want to process a day’s worth of data, we’re all set. If we want to process a symbol, or group of symbols, we’re all set.
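
The inverted list might look something like the sketch below: a second map from symbol to the (time, sequence #) pairs where that symbol appears. The function names `build_symbol_index` and `quotes_for`, the dictionary layout, and the sample data are assumptions for illustration.

```python
# One day's table: time_ms -> sequence # -> quote fields (made-up sample data).
day_table = {
    34200001: {
        1: {"symbol": "IBM", "bid": 147.20, "offer": 147.22},
        2: {"symbol": "MSFT", "bid": 27.90, "offer": 27.91},
    },
    34200002: {
        4: {"symbol": "IBM", "bid": 147.21, "offer": 147.22},
    },
}


def build_symbol_index(table):
    """Inverted list: symbol -> [(time_ms, sequence #), ...] in replay order."""
    index = {}
    for time_ms in sorted(table):
        for seq in sorted(table[time_ms]):
            symbol = table[time_ms][seq]["symbol"]
            index.setdefault(symbol, []).append((time_ms, seq))
    return index


def quotes_for(table, index, symbols):
    """Fetch the quotes for a group of symbols, merged back into time order."""
    keys = sorted(key for sym in symbols for key in index.get(sym, []))
    return [table[time_ms][seq] for time_ms, seq in keys]


symbol_index = build_symbol_index(day_table)
ibm_quotes = quotes_for(day_table, symbol_index, ["IBM"])
print(symbol_index["IBM"])
print([q["bid"] for q in ibm_quotes])
```

Because the index stores only keys, the quote data itself is still written once, in the time-ordered table.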

So, to summarize, we need a hybrid approach.  In some places, we want rows of data – storing columns of data via a unique key.  In our case, that’s the Time column above for a given day.  The row above is Time, the column (or Super Column) is the Quote for a Symbol.  The Super Column’s key is the Sequence #.  Can anyone guess which database might fit nicely for this use case?

In my next post, I’ll describe a formalized data structure and its implementation. I might even include a little code for all you #NoSQL guys and gals out there.

Thanks for reading!

More Stories By Colin Clark

Colin Clark is the CTO for Cloud Event Processing, Inc. and is widely regarded as a thought leader and pioneer in both Complex Event Processing and its application within Capital Markets.

Follow Colin on Twitter at http://twitter.com/EventCloudPro to learn more about cloud-based event processing using map/reduce, complex event processing, and event-driven pattern matching agents. You can also send topic suggestions or questions to [email protected]
