Click here to close now.

Welcome!

Microservices Journal Authors: Leo Reiter, Liz McMillan, Yeshim Deniz, Pat Romanski, Elizabeth White

Related Topics: Microservices Journal

Microservices Journal: Article

In at the Deep End: Training for High Performance Distributed Systems

Six best practices for training new engineers

Here at Logentries we have a simple philosophy when it comes to hiring: hire the best people we can find and let them jump in at the deep end. That is how we like to learn. Smart people like to go deep and then find out what they don’t know as they work through some real world problems. And, our job is to give them the mentoring and support they need to overcome the blockers quickly and continue the learning process. We ensure they come to us with a great computer science or mathematical background and we take it from there.

Logentries TrainingWhen we thought about designing a training program we built it around the following set of problems:

1. CAP Theorem

Learn to accept that consistency, availability and partition tolerance cannot be provided simultaneously. Decide what is important for your customers and make the best trade-offs you can. But, what you build needs to be understandable. Raft provides a consensus algorithm whose aim is to be understandable, or at least to be easier to comprehend than Paxos or Paxos-influenced algorithms. It’s a good place to start to understand the distributed systems challenge. So we start with the theory then the next step we get practical…

2. Embrace Hardware Failures

Logentries runs is an Amazon EC2-based service – an environment where failures in parts of the system are routine. Netflix has taken this approach to its natural conclusion. If you expect failure and know that when it happens it’s going to happen at the worst possible time; why not plan for it to happen all time. So they built Chaos Monkey, a tool that randomly generates various kinds of failures in your environment. You can run this to schedule so that failures happen when you are closely monitoring things. By combining Raft + Chaos Monkey – you get to understand what happens when things go wrong and how to build fault tolerant systems that can deal with any failure.

3. Embrace the Asynchronous

Blocking calls will kill performance and may impact availability. The slowest system in a chain will cause a backlog on the other nodes. Asynchronous design allows you to avoid the cascade effects that can occur when one node in the system brings down other nodes. It is the best way to build fault tolerant systems. So we start by building a system that relies on synchronous calls, show what happens under extreme load or when failures occur and then evolve this to work asynchronously.

4. Mechanical Sympathy

The LMAX Disruptor team understands that to get every last drop of performance out of your system you sometimes need to know every last detail of the hardware environment you are running on. They call it Mechanical Sympathy. They took what they knew about modern day processor architectures (caches, memory barriers, compare and swap etc.) and turned it into a software solution for high performance inter-thread messaging, which leads to massive performance improvements when used correctly.  So next we show our graduates what happens when you move from queue-based interactions to using Disruptor.

5. Horizontal over Vertical Scaling

Vertical scaling can give you quick wins in performance but when it comes to building a business like Logentries, where subscriber growth outpaces technology, the only way to design systems to scale with your success is to scale them horizontally. Don’t wait for technology to set limits on your growth. Also it’s better to scale on commodity hardware or, in our case, to scale on the EC2 servers that give you the best price-performance rather than rely on the latest premium-priced bells and whistles. When it comes to practical side of scaling we focus on splitting out the load but we plan for multiple axes of scale. Scaling out horizontally gives us the first split, where transactions are split over nodes. But we also ensure we have options for scaling by partitioning by types of transactions (log queries, API calls, live tail requests etc.), and by customer etc.

6. Talent Borrows, Genius Steals

We work only ten minutes away from where Oscar Wilde was born (Merrion Square, Dublin), so we do like to keep his words in mind. There are so many great distributed technologies around today – and they are getting better all the time. We have been inspired by Kafka, Hadoop, etcd, BDAS etc. But we like to control our own destiny and there are some very good reasons why we have built our platform from the ground up to do what it does in the best way it can. Turning logs into business insights isn’t easy – so we do like to stand on the shoulders of some distributed systems giants. So the final steps in our training program is to evaluate other distributed systems in the context of what we have learnt in the previous steps. Learn how Kafka implements mechanical sympathy when maximizing disk I/O. Understand Raft in the context of etcd. Understand the challenges on scaling analytics in the context of Hadoop.

Over the next few months – as we put our training program into practice we hope to have some of graduates blog about their experiences. Stay tuned!

More Stories By Trevor Parsons

Trevor Parsons is Chief Scientist and Co-founder of Logentries. Trevor has over 10 years experience in enterprise software and, in particular, has specialized in developing enterprise monitoring and performance tools for distributed systems. He is also a research fellow at the Performance Engineering Lab Research Group and was formerly a Scientist at the IBM Center for Advanced Studies. Trevor holds a PhD from University College Dublin, Ireland.

@MicroservicesExpo Stories
What’s hot in today’s cloud computing world? Containers are fast becoming a viable alternative to virtualization for the right use cases. But to understand why containers can be a better option, we need to first understand their origins. In basic terms, containers are application-centric environments that help isolate and run workloads far more efficiently than the traditional hypervisor technology found in commodity cloud Infrastructure as a Service. Modern operating systems (Linux, Windows, e...
SYS-CON Events announced today that Vicom Computer Services, Inc., a provider of technology and service solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. They are located at booth #427. Vicom Computer Services, Inc. is a progressive leader in the technology industry for over 30 years. Headquartered in the NY Metropolitan area. Vicom provides products and services based on today’s requirements...
SYS-CON Events announced today that Blue Box has been named “Bronze Sponsor” of SYS-CON's DevOps Summit New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. Blue Box delivers Private Cloud as a Service (PCaaS) to a worldwide customer base. Built on a technology platform leveraging decades of operational expertise in cloud and distributed systems, Blue Box Cloud is a managed private cloud product available in both hosted and on-prem versions. Each Blue Box ...
Chuck Piluso will present a study of cloud adoption trends and the power and flexibility of IBM Power and Pureflex cloud solutions. Speaker Bio: Prior to Data Storage Corporation (DSC), Mr. Piluso founded North American Telecommunication Corporation, a facilities-based Competitive Local Exchange Carrier licensed by the Public Service Commission in 10 states, serving as the company's chairman and president from 1997 to 2000. Between 1990 and 1997, Mr. Piluso served as chairman & founder of ...
SYS-CON Events announced today that Soha will exhibit at SYS-CON's DevOps Summit New York, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Soha delivers enterprise-grade application security, on any device, as agile as the cloud. This turnkey, cloud-based service enables customers to solve secure application access and delivery challenges that traditional or virtualized network solutions cannot solve because they are too expensive, inflexible and operational...
of cloud, colocation, managed services and disaster recovery solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. TierPoint, LLC, is a leading national provider of information technology and data center services, including cloud, colocation, disaster recovery and managed IT services, with corporate headquarters in St. Louis, MO. TierPoint was formed through the strategic combination of some of t...
SYS-CON Events announced today that Ciqada will exhibit at SYS-CON's @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Ciqada™ makes it easy to connect your products to the Internet. By integrating key components - hardware, servers, dashboards, and mobile apps - into an easy-to-use, configurable system, your products can quickly and securely join the internet of things. With remote monitoring, control, and alert messaging capability, you will mee...
SYS-CON Events announced today that Column Technologies, a global technology solutions company, will exhibit at SYS-CON's DevOps Summit 2015 New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. Established in 1998, Column Technologies is a leader in application performance and infrastructure management for commercial and federal markets. The company is headquartered in the United States, with a diverse and talented team of more than 350 employees around th...
Public Cloud IaaS started it's life in the developer and startup communities and has grown rapidly to a $20B+ industry, but it still pales in comparison to how much is spent worldwide on IT: $3.6 trillion. In fact, there are 8.6 million data centers worldwide, the reality is many small and medium sized business have server closets and colocation footprints filled with servers and storage gear. While on-premise environment virtualization may have peaked at 75%, the Public Cloud has lagged in ado...
The 5th International DevOps Summit, co-located with 17th International Cloud Expo – being held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the...
Dave will share his insights on how Internet of Things for Enterprises are transforming and making more productive and efficient operations and maintenance (O&M) procedures in the cleantech industry and beyond. Speaker Bio: Dave Landa is chief operating officer of Cybozu Corp (kintone US). Based in the San Francisco Bay Area, Dave has been on the forefront of the Cloud revolution driving strategic business development on the executive teams of multiple leading Software as a Services (SaaS) ap...
79% of new products miss their launch date. That was the conclusion of a CGT/Sopheon Survey in which the impact of such market misses were also explored. What it didn't dig into was the reason why so many products and projects miss their launch date. When we start digging into the details with respect to applications, we can find at least one causal factor in the delivery process, specifically that portion which focuses on the actual move into production, from which consumers (internal and...
Microsoft is releasing in the near future Azure Service Fabric as a preview beta. Azure Service Fabric is built to run microservices - a complex application consisting of smaller, interlocked components that enables updating components without disrupting service. Microsoft has used this over the past few years internally for many of its own applications and the new release is for general use, a new product. OSIsoft is an early adopter of this system and run with it to expand into the explo...
ProfitBricks, the provider of painless cloud infrastructure IaaS, today released its SDK for Ruby, written against the company's new RESTful API. The new SDK joins ProfitBricks' previously announced support for the popular multi-cloud open-source Fog project. This new Ruby SDK, which exposes advanced functionality to take advantage of ProfitBricks' simplicity and productivity, aligns with ProfitBricks' mission to provide a painless way to automate infrastructure in the cloud. Ruby is a genera...
This digest provides an overview of good resources that are well worth reading. We’ll be updating this page as new content becomes available, so I suggest you bookmark it. Also, expect more digests to come on different topics that make all of our IT-hearts go boom!
One of the most frequently requested Rancher features, load balancers are used to distribute traffic between docker containers. Now Rancher users can configure, update and scale up an integrated load balancing service to meet their application needs, using either Rancher's UI or API. To implement our load balancing functionality we decided to use HAproxy, which is deployed as a contianer, and managed by the Rancher orchestration functionality. With Rancher's Load Balancing capability, users ...
Modern Systems announced completion of a successful project with its new Rapid Program Modernization (eavRPMa"c) software. The eavRPMa"c technology architecturally transforms legacy applications, enabling faster feature development and reducing time-to-market for critical software updates. Working with Modern Systems, the University of California at Santa Barbara (UCSB) leveraged eavRPMa"c to transform its Student Information System from Software AG's Natural syntax to a modern application lev...
There is no quick way to learn Jython API but to experiment with it. The easiest way is to start with Jytutor extension for XL Deploy. Now you can also use the code snippet for exposing jython/python context in XL Deploy environment by running it directly in Jytutor Here’s how you can go ahead with that Download the Jytutor extension referring to the Jytutor Blog or from the following link https://github.com/xebialabs-community/xld-jytutor-plugin/releases Shutdown your XL Deploy server...
ProfitBricks has launched its new DevOps Central and REST API, along with support for three multi-cloud libraries and a Python SDK. This, combined with its already existing SOAP API and its new RESTful API, moves ProfitBricks into a position to better serve the DevOps community and provide the ability to automate cloud infrastructure in a multi-cloud world. Following this momentum, ProfitBricks has also introduced several libraries that enable developers to use their favorite language to code ...
ProfitBricks, the provider of painless cloud infrastructure IaaS, announced the launch of its new DevOps Central and REST API, along with support for three multi-cloud libraries and a Python SDK. This, combined with its already existing SOAP API and its new RESTful API, moves ProfitBricks into a position to better serve the DevOps community and provide the ability to automate cloud infrastructure in a multi-cloud world. Following this momentum, ProfitBricks is also today introducing several l...