|By Michael Kopp||
|November 4, 2012 09:00 AM EST||
An eCommerce site that crashes seven times during the Christmas season being down for up to five hours each time it crashes is a site that loses a lot of money and reputation. It happened to one of our customers who told this story at our annual performance conference earlier this month. Among the several reasons that led to these crashes I want to share more details on one of them that I see more often with other websites as well. Load balancers on a round-robin instead of least-busy can easily lead to app server crashes caused by heap memory exhaustion. Let's dig into some details on how to identify these problems and how to avoid it.
The Symptom: Crashing Tomcat Instances
The website is deployed on six Tomcats with three front-end Apache Web Servers. During peak load hours individual Tomcat instances started showing growing response times and a growing number of requests in the Tomcat processing queue. After a while these instances crashed due to out-of-memory exceptions and with that also brought down the rest of the site as load couldn't be handled any more with the remaining servers. Figure 1 shows the actual flow of transactions through the system highlighting unevenly distributed response time in the application servers and functional errors being reported on all tiers (red-colored server icon):
Even with equally distributed load (Round Robin Load Balancer Setting) one of the Tomcats spiked in Response Time Contribution before crashing
Once the App Server started rejecting incoming connections we can observe the first ripple effect of errors. We can see a very high number of exceptions in the database layer, exceptions thrown between application tiers with the web app responding with HTTP 500s:
Within 30 minutes the application serves 43000 pages with an HTTP 500 Response correlating to Exceptions in the Database and Inter-Tier Communication
The Root Cause: Inefficient Database Statements and Connection Pool Usage
The exceptions caught in the Database Layer (JDBC) were already a very good hint of the root cause of this problem. A closer look at the Exceptions shows that connection pools are exhausted, which causes problems in the different components of the application:
Exhausted Connection Pool causes Exceptions that impact Data Access Layer as well as Widget Rendering
Looking at the performance breakdown by application layer reveals how much performance impact connection pooling has on the overall transaction response time:
Due to the connection pool problem a single request had to wait 3.8s on average to obtain a connection from the pool
Now - it was not only the size of the pool that was the problem - but - several very inefficient database statements that took long to execute for certain business transactions of the application. This caused the application server to hold on to the connection longer than normal. As the load balancer was configured with Round Robin the app server still got additional requests served. Eventually - just by the random nature of incoming requests - one app server received several of these requests that executed these inefficient database calls. Once the connection pool was exhausted the application started throwing exceptions that ultimately also led to a crash of the JVM. Once the first app server crashed, it didn't take too long to take the other app servers down as well.
The Solution: Optimizing App and Load Balancer
The problem was fixed by looking at the slowest database statements and optimizing them for performance by, e.g, adding indices on the database or making the SQL statements more efficient. They also optimized the pool size to accommodate the expected load during peak hours.
They started by optimizing SQL Statements that took long to execute and those that got executed several times within the same transaction
They also changed the load balancer setting from Round-Robin to Least-Busy, which was the preferred setting from the LB vendor but had simply forgotten to configure in the production environment.
The Result: Site Has Not Been Down Since
Since they made the changes to the application and the load balancer the site has never gone down since. Now - the next holiday season is coming up and they are ready for the next seasonal spikes. Even though they are really confident that everything will work without any problems, they learned their lesson and are approaching performance proactively through proper load testing.
Next Steps: Proactive Performance Management
The lesson learned was that these problems could have been found prior to the holiday shopping season by doing proper load testing. They did load testing before but never encountered this problem because of two reasons: 1) they didn't test using expected peak volumes for long enough sessions and 2) didn't use a tool that simulated real customer behavior variations (too few scripts and the scripts were too simple) that tested their highly interactive web site.
Their strategy for proactive performance management is that they:
- Perform load tests on the production system during low traffic hours (2 a.m.-6 a.m.), accepting the risk of minor sales losses in case of a crash, versus major sales losses during the holiday shopping season.
- Multiply the hourly load test volume by 2.5 since their actual peaks are 10 hours long.
- Use a load testing service that uses real browsers in different locations around the U.S.
- Use an APM solution that identifies problems within the application while running the load test.
If you want to read more on common performance problems that are not found prior to moving to production check out my recent series of blogs: Supersized Content, Deployment Mistakes or Excessive Logging
While poor system performance occurs for any number of reasons (poor code, understaffed teams, inadequate legacy systems), this week’s post should help you quickly diagnose and fix a few common problems, while setting yourself up for a more stable future at the same time. Modern application frameworks have made it very easy to build not only powerful back-ends, but also rich, web-based user interfaces that are pushed out to the client in real-time. Often this involves a lot of data being transf...
Apr. 1, 2015 04:12 PM EDT Reads: 316
InfoScout in San Francisco gleans new levels of accurate insights into retail buyer behavior by collecting data directly from consumers’ sales receipts. In order to better analyze actual retail behaviors and patterns, InfoScout provides incentives for buyers to share their receipts, but InfoScout is then faced with the daunting task of managing and cleansing that essential data to provide actionable and understandable insights.
Apr. 1, 2015 03:45 PM EDT Reads: 332
Best practices for helping DevOps and Test collaborate in ways that make your SDLC leaner and more scalable. The business demand for "more innovative software, faster" is driving a surge of interest in DevOps, Agile and Lean software development practices. However, today's testing processes are typically bogged down by weighty burdens such as the difficulty of 1) accessing complete Dev/Test environments; 2) acquiring complete, sanitized test data; and 3) configuring the behavior of the environm...
Apr. 1, 2015 03:20 PM EDT Reads: 356
SYS-CON Events announced today that MangoApps will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY., and the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. MangoApps provides private all-in-one social intranets allowing workers to securely collaborate from anywhere in the world and from any device. Social, mobile, and eas...
Apr. 1, 2015 03:00 PM EDT Reads: 3,277
As a group of concepts, DevOps has converged on several prominent themes including continuous software delivery, automation, and configuration management (CM). These integral pieces often form the pillars of an organization’s DevOps efforts, even as other bigger pieces like overarching best practices and guidelines are still being tried and tested. Being that DevOps is a relatively new paradigm - movement - methodology - [insert your own label here], standards around it have yet to be codified a...
Apr. 1, 2015 03:00 PM EDT Reads: 950
SYS-CON Events announced today that Solgenia will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY, and the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Solgenia is the global market leader in Cloud Collaboration and Cloud Infrastructure software solutions. Designed to “Bridge the Gap” between Personal and Professional S...
Apr. 1, 2015 03:00 PM EDT Reads: 3,104
Learn the top API testing issues that organizations encounter and how automation plus a DevOps team approach can address these top API testing challenges. Ensuring API integrity is difficult in today's complex application cloud, on-premises and hybrid environment scenarios. In this interview with TechTarget, Parasoft solution architect manager Spencer Debrosse shares his experiences about the top API testing issues that organizations encounter and how automation and a DevOps team approach can a...
Apr. 1, 2015 02:00 PM EDT Reads: 494
Chef and Canonical announced a partnership to integrate and distribute Chef with Ubuntu. Canonical is integrating the Chef automation platform with Canonical's Machine-As-A-Service (MAAS), enabling users to automate the provisioning, configuration and deployment of bare metal compute resources in the data center. Canonical is packaging Chef 12 server in upcoming distributions of its Ubuntu open source operating system and will provide commercial support for Chef within its user base.
Apr. 1, 2015 01:00 PM EDT Reads: 580
When it comes to microservices there are myths and uncertainty about the journey ahead. Deploying a “Hello World” app on Docker is a long way from making microservices work in real enterprises with large applications, complex environments and existing organizational structures. February 19, 2015 10:00am PT / 1:00pm ET → 45 Minutes Join our four experts: Special host Gene Kim, Gary Gruver, Randy Shoup and XebiaLabs’ Andrew Phillips as they explore the realities of microservices in today’s IT worl...
Apr. 1, 2015 12:45 PM EDT Reads: 1,943
After what feel like an interminable cycle of media frenzy followed by hype and hysteria cycles, the practical elements of real world cloud implementations are starting to become better documented. But what is really different in the cloud? How do software applications behave, live, interact and interconnect inside the cloud? Where do cloud architectures differ so markedly from their predecessors that we need to learn a new set of mechanics – and, when do we start to refer to software progra...
Apr. 1, 2015 12:45 PM EDT Reads: 537
The world's leading Cloud event, Cloud Expo has launched Microservices Journal on the SYS-CON.com portal, featuring over 19,000 original articles, news stories, features, and blog entries. DevOps Journal is focused on this critical enterprise IT topic in the world of cloud computing. Microservices Journal offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. Follow new article posts on T...
Apr. 1, 2015 12:00 PM EDT Reads: 1,608
Even though it’s now Microservices Journal, long-time fans of SOA World Magazine can take comfort in the fact that the URL – soa.sys-con.com – remains unchanged. And that’s no mistake, as microservices are really nothing more than a new and improved take on the Service-Oriented Architecture (SOA) best practices we struggled to hammer out over the last decade. Skeptics, however, might say that this change is nothing more than an exercise in buzzword-hopping. SOA is passé, and now that people are ...
Apr. 1, 2015 12:00 PM EDT Reads: 1,477
Hosted PaaS providers have given independent developers and startups huge advantages in efficiency and reduced time-to-market over their more process-bound counterparts in enterprises. Software frameworks are now available that allow enterprise IT departments to provide these same advantages for developers in their own organization. In his workshop session at DevOps Summit, Troy Topnik, ActiveState’s Technical Product Manager, will show how on-prem or cloud-hosted Private PaaS can enable organ...
Apr. 1, 2015 12:00 PM EDT Reads: 1,447
For those of us that have been practicing SOA for over a decade, it's surprising that there's so much interest in microservices. In fairness microservices don't look like the vendor play that was early SOA in the early noughties. But experienced SOA practitioners everywhere will be wondering if microservices is actually a good thing. You see microservices is basically an SOA pattern that inherits all the well-known SOA principles and adds characteristics that address the use of SOA for distribut...
Apr. 1, 2015 11:00 AM EDT Reads: 1,125
SYS-CON Events announced today the IoT Bootcamp – Jumpstart Your IoT Strategy, being held June 9–10, 2015, in conjunction with 16th Cloud Expo and Internet of @ThingsExpo at the Javits Center in New York City. This is your chance to jumpstart your IoT strategy. Combined with real-world scenarios and use cases, the IoT Bootcamp is not just based on presentations but includes hands-on demos and walkthroughs. We will introduce you to a variety of Do-It-Yourself IoT platforms including Arduino, Ras...
Apr. 1, 2015 11:00 AM EDT Reads: 2,278
Microservice architectures are the new hotness, even though they aren't really all that different (in principle) from the paradigm described by SOA (which is dead, or not dead, depending on whom you ask). One of the things this decompositional approach to application architecture does is encourage developers and operations (some might even say DevOps) to re-evaluate scaling strategies. In particular, the notion is forwarded that an application should be built to scale and then infrastructure sho...
Apr. 1, 2015 11:00 AM EDT Reads: 2,603
Our guest on the podcast this week is Jason Bloomberg, President at Intellyx. When we build services we want them to be lightweight, stateless and scalable while doing one thing really well. In today's cloud world, we're revisiting what to takes to make a good service in the first place. Listen in to learn why following "the book" doesn't necessarily mean that you're solving key business problems.
Apr. 1, 2015 10:45 AM EDT Reads: 1,392
Microservices are the result of decomposing applications. That may sound a lot like SOA, but SOA was based on an object-oriented (noun) premise; that is, services were built around an object - like a customer - with all the necessary operations (functions) that go along with it. SOA was also founded on a variety of standards (most of them coming out of OASIS) like SOAP, WSDL, XML and UDDI. Microservices have no standards (at least none deriving from a standards body or organization) and can be b...
Apr. 1, 2015 10:45 AM EDT Reads: 2,313
Right off the bat, Newman advises that we should "think of microservices as a specific approach for SOA in the same way that XP or Scrum are specific approaches for Agile Software development". These analogies are very interesting because my expectation was that microservices is a pattern. So I might infer that microservices is a set of process techniques as opposed to an architectural approach. Yet in the book, Newman clearly includes some elements of concept model and architecture as well as p...
Apr. 1, 2015 10:15 AM EDT Reads: 2,237
SYS-CON Events announced today the DevOps Foundation Certification Course, being held June ?, 2015, in conjunction with DevOps Summit and 16th Cloud Expo at the Javits Center in New York City, NY. This sixteen (16) hour course provides an introduction to DevOps – the cultural and professional movement that stresses communication, collaboration, integration and automation in order to improve the flow of work between software developers and IT operations professionals. Improved workflows will res...
Apr. 1, 2015 10:00 AM EDT Reads: 1,803