@DevOpsSummit: Article

Compression: Making the Big Smaller and Faster (Part 1)

By Nilabh Mishra

How important is data compression? Sharing information quickly and efficiently has long been an area of constant study and research. Companies like Google and Facebook have invested considerable time and effort in developing faster and better compression algorithms. Compression algorithms have existed for decades (Huffman coding dates from 1952 and LZ77 from 1977), and the ongoing research into better ones shows just how important compression is for the Internet and for all of us.

The Need for Data Compression
The World Wide Web (WWW) has changed a great deal since it was made available to the public in 1991. Believe it or not, a copy of the world’s first website can still be browsed today. Back then, webpages were very simple. Today they are far more complex, which creates a clear need for compression algorithms that are lossless, fast, and efficient.

Several best practices help optimize page load times, and a separate blog post discusses webpage optimization in more detail. In this article, we cover the basics of compression and how it works. The second part of this series covers a newer compression method called “Brotli”.

Encoding and Data Compression
Let’s start by understanding what data encoding and compression are:

The word “compression” comes from the Latin word compressare, which means to press together. “Encoding” is the process of putting a sequence of characters into a specialized format that allows efficient data storage as well as transmission. Per Wikipedia: “Data compression involves encoding information using fewer bits than the original representation.”

Compression plays a key role in saving bandwidth and speeding up your site. Serving a modern webpage involves many HTTP requests and responses between the client (the browser) and the server. As the number of requests and responses grows, it becomes important that these transfers happen quickly and efficiently.

HTTP works on a request-response model, as demonstrated below:

In this case, we are not using any compression method to compress the response being sent by the server.

  • The browser sends an HTTP request asking for the index.html page
  • The server looks for the requested file and responds with the requested resource and a 200 OK HTTP status message
  • The browser receives the server’s response and renders the page

The server responds with a 300 KB file (index.html), and no compression is involved. Had the file been larger, it would have taken longer to send over the wire, increasing the overall page load time. Note that we are looking at only a single HTTP response; a browser may receive hundreds of such responses to render a modern webpage.
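To put the 300 KB figure in perspective, here is a back-of-the-envelope sketch in Python. It assumes a hypothetical 5 Mbit/s link and a hypothetical compressed size of 90 KB (70% smaller), and it counts only serialization time (bits divided by bandwidth), ignoring latency, TCP slow start, and protocol overhead. The function name `transfer_seconds` is illustrative.

```python
def transfer_seconds(size_bytes: int, bandwidth_bps: float) -> float:
    """Time to push size_bytes onto a link of bandwidth_bps bits per second.

    Serialization time only: latency, TCP slow start, and header overhead
    are deliberately ignored.
    """
    return size_bytes * 8 / bandwidth_bps


KB = 1024
LINK_BPS = 5_000_000  # hypothetical 5 Mbit/s connection

uncompressed = 300 * KB  # the 300 KB index.html from the example
compressed = 90 * KB     # hypothetical: the same file made 70% smaller

print(f"uncompressed: {transfer_seconds(uncompressed, LINK_BPS):.3f} s")  # 0.492 s
print(f"compressed:   {transfer_seconds(compressed, LINK_BPS):.3f} s")    # 0.147 s
```

Because serialization time scales linearly with size, the savings from each compressed response add up across the many responses a page may need.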

The image below shows the same HTTP request-response exchange between the browser and the server, but this time compression is used to reduce the size of the response the server sends to the browser.

Today, complex and dynamic websites generate hundreds of HTTP requests and responses, which makes a fast and efficient way of transferring data between the server and the browser essential. This is where compression algorithms like DEFLATE and Gzip, both dating from the early 1990s, come in.

Introduction to Gzip
Gzip is a compression method that is used to make files smaller for storage and faster transmission over the network. Gzip is one of the most popular, powerful, and effective ways of compressing data and it can reduce the file size by up to 70%.
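A quick way to see Gzip's effect on a text resource is Python's standard `gzip` module. The sketch below compresses a hypothetical CSS snippet and reports how much smaller it got. The sample is deliberately repetitive, so it compresses far better than typical real-world files; actual savings depend heavily on the content. The helper name `gzip_sizes` is illustrative.

```python
import gzip
from typing import Tuple


def gzip_sizes(data: bytes) -> Tuple[int, int]:
    """Return (original_size, gzip_size) in bytes."""
    return len(data), len(gzip.compress(data))


# Hypothetical text resource: one CSS rule repeated many times.
css = b".button { color: #fff; background: #0074d9; padding: 4px 8px; }\n" * 500

original, packed = gzip_sizes(css)
print(f"{original} bytes -> {packed} bytes ({1 - packed / original:.0%} smaller)")

# Gzip is lossless: decompressing restores the exact original bytes.
print(gzip.decompress(gzip.compress(css)) == css)
```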

Gzip is based on the DEFLATE algorithm, which in turn combines LZ77 and Huffman coding. Understanding how LZ77 works is therefore key to understanding how compression methods like DEFLATE and Gzip work.
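One way to see that Gzip is a wrapper around DEFLATE is to produce the same DEFLATE stream in three containers with Python's `zlib` module, selecting the container through the `wbits` parameter: a negative value gives a raw DEFLATE stream (RFC 1951), 9 to 15 gives the zlib format (RFC 1950), and 25 to 31 gives the gzip format (RFC 1952). The helper name `deflate_in_container` and the sample payload are illustrative. With identical settings, the compressed body should be the same in all three; only the headers and trailers differ.

```python
import gzip
import zlib


def deflate_in_container(data: bytes, wbits: int) -> bytes:
    """Compress data with DEFLATE; wbits selects the container format."""
    comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    return comp.compress(data) + comp.flush()


payload = b"As idle as a painted ship, upon a painted ocean. " * 20

raw = deflate_in_container(payload, -15)          # bare DEFLATE stream, no header
zlib_wrapped = deflate_in_container(payload, 15)  # 2-byte header + Adler-32 trailer
gzip_wrapped = deflate_in_container(payload, 31)  # 10-byte header + CRC-32/size trailer

print(len(raw), len(zlib_wrapped), len(gzip_wrapped))
print(gzip_wrapped[:2].hex())  # "1f8b", the gzip magic number
print(zlib_wrapped[2:-4] == raw, gzip_wrapped[10:-8] == raw)
print(gzip.decompress(gzip_wrapped) == payload)
```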

LZ77
Developed in the late ’70s by Abraham Lempel and Jacob Ziv, the LZ77 method of compression looks for sequences of characters that recur in a text. It compresses the text by replacing each repeated occurrence of a string with a pointer, or backreference, to an identical string that appeared earlier in the same text.

The pointer or backreference has the form <relative jump, length>, where relative jump is how many bytes back from the current position the earlier occurrence of the string begins, and length is the number of identical bytes copied from there.

Let us understand this better with the help of an example. Assume a text file contains the following text:

As idle as a painted ship, upon a painted ocean.

In this text, the strings “as” and “painted” each occur twice. (For simplicity, the example treats the capitalized “As” as a match for “as”; a real LZ77 encoder compares bytes exactly, so the two would not match.) LZ77 replaces each repeated occurrence with the notation <relative jump, length>, pointing back to the earlier occurrence.

Using LZ77, the text is encoded as follows:

As idle <8,2> a painted ship, upon a <21,7> ocean.

To encode the text, we took the following steps:

  1. Scanned the text for repeated strings or substrings.
  2. Replaced each repeated occurrence with the notation <relative jump, length>; the second occurrences of “as” and “painted” were replaced this way.
  3. The second “painted”, which would otherwise occupy 7 bytes (7 characters × 1 byte each), now occupies only the size of its pointer. Assuming a pointer takes 2 bytes (16 bits), that saves 5 bytes. Replacing the 2-byte “as” with a 2-byte pointer saves nothing, which is why encoders such as DEFLATE only emit pointers for matches of at least a minimum length (3 bytes in DEFLATE).
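The steps above can be sketched as a toy LZ77 encoder and decoder in Python. This is a simplified, brute-force illustration under stated assumptions, not the encoder used by DEFLATE or Gzip: the function names are hypothetical, tokens are kept as Python objects rather than packed bits, the match search is a slow linear scan, and `min_len=3` mirrors DEFLATE's minimum match length. Because it compares characters exactly (so “As” does not match “as”) and always takes the longest match, it finds a single backreference, (21, 11), covering " a painted " with its surrounding spaces, instead of the two hand-picked ones above.

```python
from typing import List, Tuple, Union

# A token is either a literal character or a (distance, length) backreference.
Token = Union[str, Tuple[int, int]]


def lz77_encode(text: str, window: int = 4096, min_len: int = 3) -> List[Token]:
    """Greedy LZ77: at each position, emit the longest earlier match or a literal."""
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        best_len, best_dist = 0, 0
        # Scan every start position inside the sliding window (brute force).
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(text) and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append((best_dist, best_len))  # <relative jump, length>
            i += best_len
        else:
            tokens.append(text[i])  # too short to be worth a pointer
            i += 1
    return tokens


def lz77_decode(tokens: List[Token]) -> str:
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            # Copy one character at a time so overlapping matches work.
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(tok)
    return "".join(out)


text = "As idle as a painted ship, upon a painted ocean."
tokens = lz77_encode(text)
print([t for t in tokens if isinstance(t, tuple)])  # [(21, 11)]
print(lz77_decode(tokens) == text)                  # True
```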

Huffman Coding
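Huffman coding is another lossless data compression algorithm. It is based on how frequently each symbol occurs (for example, a character in a text file or a pixel value in an image): frequent symbols get short bit codes and rare symbols get longer ones. In DEFLATE, Huffman coding is applied to the output of LZ77, encoding both the literal bytes and the match lengths and distances. For a deeper understanding of this algorithm, see a detailed tutorial on how Huffman coding works.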
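For a concrete feel, here is a compact Huffman code builder in Python using the standard `heapq` module. It is a sketch for illustration: the function name is hypothetical, and it returns codes as strings of '0' and '1' rather than packed bits. It repeatedly merges the two least frequent subtrees, so frequent symbols end up with shorter codes. DEFLATE's actual Huffman stage additionally uses canonical codes with length limits, which this sketch omits.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict


def huffman_codes(text: str) -> Dict[str, str]:
    """Map each symbol in text to a prefix-free bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs a 1-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)   # least frequent subtree
        f2, _, high = heapq.heappop(heap)  # second least frequent subtree
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


text = "As idle as a painted ship, upon a painted ocean."
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(f"space -> {codes[' ']}, 'h' -> {codes['h']}")
print(f"{len(text) * 8} bits as 8-bit ASCII vs {len(encoded)} bits with Huffman codes")
```

The space character, the most frequent symbol in the sample, receives one of the shortest codes, while one-off letters such as 'h' receive longer ones.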

All modern browsers support Gzip-compressed HTTP responses. With Gzip, one of the most important questions is what to compress. It works best with text-based resources such as static HTML, CSS, and JavaScript files, but is not very efficient for resources that are already compressed, such as images. To use Gzip, the server must be configured to serve gzip-compressed responses.
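To illustrate why Gzip helps text but not already-compressed files, the sketch below compares the compression ratio of a repetitive, JavaScript-like text payload with that of random bytes. The random bytes are an assumption standing in for already-compressed content such as JPEG or PNG data, which likewise has little redundancy left to remove; the function name is illustrative.

```python
import gzip
import os


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(gzip.compress(data)) / len(data)


# Hypothetical text resource with plenty of repetition.
text_like = b"function add(a, b) { return a + b; }\n" * 1000
# Random bytes as a stand-in for already-compressed content (e.g. image data).
already_compressed_like = os.urandom(64 * 1024)

print(f"text-like: {compressed_ratio(text_like):.2f}")
print(f"random:    {compressed_ratio(already_compressed_like):.2f}")
```

On the random input the ratio comes out around 1.0: gzip adds its own header and framing but finds nothing to remove. This is why servers are usually configured to skip Gzip for image formats that are already compressed.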

The image above shows the impact Gzip compression can have on a text-based resource like a JavaScript file. Here, we ran two instant tests with Catchpoint against the URL https://code.jquery.com/jquery-3.2.1.js.

For the first test run, we requested an uncompressed response by passing the custom header Accept-Encoding: identity with the request. The first image shows that the response carries no Content-Encoding header.

In the second image, the browser sends Accept-Encoding: gzip, and the server responds with a gzip-compressed file (Content-Encoding: gzip).
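The negotiation described above can be sketched from the server's side. The Python below is a hypothetical, simplified handler: it checks whether the request's Accept-Encoding header lists gzip with a nonzero q-value, and if so gzip-compresses the body and sets Content-Encoding: gzip; otherwise (for example with Accept-Encoding: identity) it sends the body unchanged. The function names are illustrative, and wildcards (*), other codings such as br, and malformed q-values are not handled. In practice, web servers and CDNs usually do this through configuration.

```python
import gzip
from typing import Dict, Tuple


def gzip_allowed(accept_encoding: str) -> bool:
    """Simplified check: True if 'gzip' is listed with q > 0 (q defaults to 1)."""
    for item in accept_encoding.split(","):
        coding, *params = [p.strip() for p in item.split(";")]
        if coding.lower() != "gzip":
            continue
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        return q > 0
    return False


def build_response(body: bytes, accept_encoding: str) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, payload) for a body, compressing it if the client allows."""
    headers = {"Vary": "Accept-Encoding"}  # caches must key on this request header
    if gzip_allowed(accept_encoding):
        payload = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    else:
        payload = body  # identity: sent as-is, no Content-Encoding header
    headers["Content-Length"] = str(len(payload))
    return headers, payload


js = b"/* hypothetical script */ var total = 0;\n" * 2000
for ae in ("identity", "gzip, deflate, br"):
    headers, payload = build_response(js, ae)
    print(ae, "->", headers.get("Content-Encoding", "(none)"), headers["Content-Length"])
```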

The comparison shows how drastically Gzip can shrink a text file and reduce the number of bytes sent over the wire.

Catchpoint’s Scheduled tests also highlight the difference between compressed and uncompressed content loading on webpages.

In the screenshot above, we see the difference in downloaded bytes for static content (CSS, JavaScript) with and without Gzip encoding.

Brotli Compression
A new compression method called Brotli was introduced not too long ago. The Brotli compression algorithm is optimized for the web, and specifically for small text documents. We will discuss this compression method and what it has to offer the World Wide Web community in the second part of the article.

The post Compression: Making the Big Smaller and Faster (Part 1) appeared first on Catchpoint's Blog - Web Performance Monitoring.
