Cloud Expo: Blog Feed Post

Cloud and Public Relations and the AWS US-EAST-1 Question

Latest outage raises more questions about the Amazon Cloud

As you may be aware, Amazon AWS US-EAST-1 experienced two outages in June that resulted in widespread service interruptions and significant downtime for marquee AWS customers such as Netflix, Pinterest, Instagram, and Heroku.

While some of the Cloud community and analyst community may rationalize the outages in an attempt to “protect Cloud,” my approach is to take a hard line with Amazon and to place the outages squarely in Amazon’s court. In my opinion the outages were avoidable, and Amazon’s datacenters suffered from a design or engineering flaw that resulted in not just one but two outages in June. Beyond the technical reasons for directing the issue to Amazon, my understanding is that effective public relations in the face of serious events means accepting responsibility and working to remedy the issue so it doesn't happen again. For the Cloud to evolve into an enterprise technology, such issues need to be addressed by the providers, and failures must not be rationalized or excused.

I strive for and recommend the “design for failure” approach to Cloud and systems architecture. Yet if failures can be avoided by exercising design for failure at the data center level, then I believe Amazon failed to effectively execute the “design for failure” principles espoused by Werner Vogels. In the case of the June 14th and June 29th outages at Amazon US-EAST-1, I believe the outages could have been prevented, beginning with the first, if Amazon’s data center had been able to run on generator power.

In the case of the June 29th outage, the data center(s) lost power in much the same way, and yet again there was no generator power to keep the systems available. The fact remains that other data centers in the Ashburn, Virginia region lost grid power and continued to operate with no issue, running on generator power until grid power was restored.

In my opinion, Amazon needs to follow its own design principles and avoid failing the same way twice, especially when standard data center design should have eliminated the first of the two failures.

More Stories By Brian McCallion

Brian McCallion, founder of New York City-based consultancy Bronze Drum, focuses on the unique challenges of Public Cloud adoption in the Fortune 500. Forged along the fault line where Corporate IT and line of business meet, Brian delivers enterprise public cloud solutions that matter to the business. In 2011, while the Cloud was just a gleam in the eye of most Fortune 500 firms, Brian designed and proved the often-referenced hybrid cloud architecture that enabled McGraw-Hill Education to scale the web and application layer of its $160M revenue, 2M user higher education platform in Amazon Web Services. Brian recently designed and delivered the JD Power and Associates strategic customer-facing Next Generation Content Platform, an Alfresco Content Management solution supported by a substantial data warehouse and data mart running in AWS and a batch job that processes over 500M records daily in RDS Oracle.
