Twitter Is Not a SaaS Monitoring Solution

The crowd can help IT, but only if the right information is shared

A few weeks ago I was trying to update some files I have stored on a cloud storage service (that will remain nameless). I had moved my files there a while back as a way to make it easier to access them from my various devices and to avoid losing them during the next inevitable hard drive failure. For the most part I've been happy with the service, but on this day, I was unable to access the site.

Not good, as I was rushing to make some changes and send the files to a colleague.

Frustrated by my situation, I asked a co-worker to see if he was also having problems. He was, so we did the next logical thing you would expect. We went to the service provider's status page to see what they had to say. According to it, the service was healthy and there were no current service or maintenance notices.

#nowwhat?
Twitter! Of course. Whenever services like YouTube or Hulu have outages, users light up Twitter with comments and laments. Sure enough, a quick Twitter search showed that, yes, there was a widespread problem that had started only a few minutes prior, and already there was a trending hashtag.

This example shows what's great about Twitter. It is an immensely powerful platform for creating instant virtual communities that share information and opinion around a topic of common interest. The Twitter community as a group did a better job than the service provider itself of informing users that there was a problem with the service. I and the other storage service users -- at least the ones also on Twitter -- had formed an impromptu global network of monitors, watching the service from hundreds of thousands of access points. Together we could confirm for each other that there was a service-wide outage.

#problemsolved?
Well, not really. Yes, I could see a number of people on Twitter reporting that they couldn't access the service, but this was all anecdotal information (along with a fair amount of opinion). I had no idea who these other users were or where they were located. For all I knew we might all be customers of the same internet service provider and maybe the problem was there and not with the storage service itself. In addition, while I could go to Twitter to confirm that I wasn't the only one experiencing an outage -- even as the service provider's status dashboard said everything was okay -- I was still searching for evidence after the fact. There was no practical way for me to be notified proactively, nor was I able to reliably see service performance degrading prior to the outage.

Herein lies the problem for manufacturers, or any organization, looking to leverage SaaS applications -- particularly mission-critical email, collaboration, and document storage -- as part of their IT infrastructure. While it may be okay for me to use Twitter to monitor Hulu, you obviously can't operate a business this way. Organizations need the same level of visibility and troubleshooting capability for SaaS apps that they've come to rely on for traditional on-premise applications. This includes:

  • Proactive issue detection and alerting
  • Quantitative data on application performance
  • Ability to accurately measure service level attainment vs. target goals
  • Ability to identify problem sources so the time to isolate and fix is minimized

That last one is particularly tricky for SaaS since most of the datacenter and network infrastructure is outside organizations' IT perimeters. You can't directly see or touch the server or network equipment and neither can your traditional monitoring and management tools. It's not surprising, then, that we often hear from IT admins that they have had to resort to using Twitter because otherwise they are flying completely blind. It's not enough, but at least it's something.
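
To make the third requirement in the list above concrete: measuring service level attainment against a target is mostly arithmetic once you have reliable outage records. The sketch below is a minimal illustration in Python. The `Outage` type, the function names, the 30-day period, and the 99.9% target are all assumptions chosen for the example, not figures from any provider's SLA.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One outage window, in minutes from the start of the reporting period."""
    start_min: float
    end_min: float


def availability_pct(period_min: float, outages: list) -> float:
    """Percentage of the period during which the service was reachable.

    Assumes the outage windows do not overlap each other.
    """
    downtime = sum(o.end_min - o.start_min for o in outages)
    return 100.0 * (period_min - downtime) / period_min


def meets_target(period_min: float, outages: list, target_pct: float) -> bool:
    """True if measured availability is at or above the target."""
    return availability_pct(period_min, outages) >= target_pct


# Hypothetical 30-day month with two recorded outages (45 and 20 minutes).
MONTH_MIN = 30 * 24 * 60
outages = [Outage(1000, 1045), Outage(20000, 20020)]

attained = availability_pct(MONTH_MIN, outages)
verdict = "met" if meets_target(MONTH_MIN, outages, 99.9) else "missed"
print(f"attained {attained:.3f}% against a 99.9% target: {verdict}")
```

A 99.9% target over a 30-day month leaves room for only about 43 minutes of downtime, so the 65 minutes in this made-up example would miss it. The hard part in practice is the outage records themselves, which is where the rest of this article comes in.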

#saasvisibility
Despite its shortcomings, there is a lot to be said for the "power of the crowd" that is so fundamental to Twitter. What if we could take that same model and use it to proactively monitor our SaaS applications? First, it would require some type of active monitoring behind your firewall at the locations where users access their SaaS applications. These "sensors" could act like Twitter users, constantly running transactions against the service and collecting data on transaction and network node performance. They would also allow you to proactively detect and notify an IT admin of any outages or performance anomalies BEFORE they impact your users.
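
As a rough illustration of what one of these sensors might do, here is a minimal Python sketch. It is not a description of any real product: the `Sensor` class, the `http_probe` helper, the rolling-median baseline, and the "three times slower than normal" rule are all assumptions made for the example. The idea is simply to run a synthetic transaction on a schedule, remember recent response times, and raise an alert when a transaction fails or is much slower than usual.

```python
import statistics
import time
import urllib.request
from collections import deque
from typing import Callable, Optional


def http_probe(url: str, timeout_s: float = 5.0) -> float:
    """Run one synthetic transaction: fetch `url`, return elapsed seconds.

    Raises an exception (for example on timeout) if the request fails.
    """
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        resp.read()
    return time.monotonic() - start


class Sensor:
    """Tracks recent probe latencies and flags outages or slowdowns."""

    def __init__(self, probe: Callable[[], float], window: int = 20,
                 slow_factor: float = 3.0, min_samples: int = 5):
        self.probe = probe
        self.history = deque(maxlen=window)  # most recent latencies, seconds
        self.slow_factor = slow_factor
        self.min_samples = min_samples

    def check(self) -> Optional[str]:
        """Run one probe; return an alert message, or None if healthy."""
        try:
            latency = self.probe()
        except Exception as exc:  # any failed transaction counts as an outage
            return f"OUTAGE: probe failed ({exc})"
        alert = None
        if len(self.history) >= self.min_samples:
            baseline = statistics.median(self.history)
            if latency > self.slow_factor * baseline:
                alert = f"DEGRADED: {latency:.2f}s vs. baseline {baseline:.2f}s"
        self.history.append(latency)
        return alert


# Simulated probe so the example runs without network access.
latencies = iter([0.20, 0.21, 0.19, 0.22, 0.20, 0.95])
demo = Sensor(lambda: next(latencies))
demo_alerts = [demo.check() for _ in range(6)]
print(demo_alerts[-1])
```

In real use you would pass something like `lambda: http_probe(url)` with a URL of your own choosing and call `check()` on a timer. Using the median rather than the mean keeps a single slow response from distorting the baseline, which matters if the goal is to catch degradation before a full outage.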

Then, what if we could collect and share real-time performance data from those sensors (yours as well as other users' sensors) in a global database maintained as part of your cloud service? You'd then be able to access this data to gain visibility into the health of the complete service delivery chain between you and the SaaS provider. For example, you could:

  • View current status, alerts, network statistics, and performance trends for one or more of your own sensors to determine if you have service issues affecting a particular location or subnet, so you can pinpoint and fix faults in your own infrastructure and get users back online quickly
  • Analyze your sensor data with the rest of the crowd to determine whether service issues are systemic to the application provider or the result of downstream internet service provider problems; you may not be able to fix these directly, but with this information you would know which service provider to call and could provide them with details to speed their time to resolution
  • Confirm exactly what service levels you are getting from your application service providers, with the detailed outage data needed both for internal reporting and for provider service-level guarantee refund requests
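
The second point above, comparing your sensor with the rest of the crowd, can be sketched in a few lines of Python. Everything here is hypothetical: the `Report` record, the idea that each sensor reports its internet service provider, and the 50% failure threshold are assumptions for illustration only. The logic is the question I could not answer from Twitter: is it just me, is it my ISP, or is it the service itself?

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Latest result from one sensor (hypothetical record layout)."""
    sensor_id: str
    isp: str
    ok: bool


def failure_rate(reports) -> float:
    """Fraction of reports that failed; 0.0 if there are no reports."""
    reports = list(reports)
    if not reports:
        return 0.0
    return sum(not r.ok for r in reports) / len(reports)


def classify(mine: Report, crowd: list, threshold: float = 0.5) -> str:
    """Guess where a problem sits: 'healthy', 'provider', 'isp', or 'local'."""
    if mine.ok:
        return "healthy"
    others = [r for r in crowd if r.sensor_id != mine.sensor_id]
    # Sensors on other ISPs failing too points at the application provider.
    if failure_rate(r for r in others if r.isp != mine.isp) >= threshold:
        return "provider"
    # Only sensors sharing my ISP are failing: likely an ISP problem.
    if failure_rate(r for r in others if r.isp == mine.isp) >= threshold:
        return "isp"
    # Nobody else is affected: look inside your own network first.
    return "local"


mine = Report("me", "isp-1", ok=False)
crowd = [
    Report("a", "isp-1", ok=False),
    Report("b", "isp-1", ok=False),
    Report("c", "isp-2", ok=True),
    Report("d", "isp-3", ok=True),
]
print(classify(mine, crowd))
```

One limitation worth noting: if the crowd contains no sensors on other ISPs, this sketch cannot tell an ISP problem from a provider-wide one and will report "isp". A crowd spread across many networks is what makes the classification trustworthy.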

The goal of every IT shop is to keep its application users online and happy. But with SaaS, that's more difficult because administrators do not have the same visibility that they have with on-premise applications. We, as a community, need to come up with ways to change that. Taking a cue from Twitter and leveraging the crowd seems like a great place to start.

More Stories By Patrick Carey

Patrick Carey is vice president of product management and marketing for Exoprise, a provider of cloud-based monitoring and enablement solutions for Software-as-a-Service (SaaS) applications. He spends his free time thinking about how companies can get to the cloud faster and stay there longer.
