Brian's Programming and Software Architecture Blog: Ungzipping Gzip Compression Without HTTP Headers or With File Size Limit

Ungzipping in the Browser

Sometimes developers are handed tasks outside their usual area of responsibility, for example dealing with gzipping.

Gzip is a compression format that has existed for over 25 years. It is a standard on the Internet, and almost everything is served gzipped if it is served properly. There are various ways to deal with this, for example just letting the webserver gzip on the fly. However, you may run into a situation where that is impossible. For example, the infrastructure may impose an artificial limit and only compress files smaller than 1 MB.

https://docs.microsoft.com/en-us/azure/cdn/cdn-troubleshoot-compression

(No, code splitting is not always an answer. In particular, if you have an integration between different products with different release cycles, code splitting makes that integration unstable. A little bit of knowledge is a dangerous thing without the details.)

And of course, even if you manage to gzip, the infrastructure may not be able to guarantee the Content-Type and Content-Encoding HTTP headers, or further headers like Vary: Accept-Encoding. In that case the browser may decide to download the gzipped files instead of ungzipping them itself, or simply crash.

https://blog.stackpath.com/accept-encoding-vary-important

It is also a common ask for JavaScript developers, particularly on full stack JavaScript (for example with Express as the webserver), to deal with gzipping manually. However, who knows where it will be served? It could be served on Apache, on IIS or on a CDN. So chances are that at some point in your career you will be asked to gzip files where

a) you cannot guarantee the HTTP headers

or have other restrictions such as

b) you cannot guarantee the file size. As of May 2018, there are 400 open issues in the webpack issue tracker concerning splitting by file size. Even if code splitting by file size (actually called chunking) is done, it is experimental and bug-ridden. And besides, splitting into many files is not compression: unless you serve over HTTP/2, serving many files introduces overhead. Gzipping is a standard, it must be done, and the gains are too big to ignore. We are looking at size reductions of 5 to 10 times.

https://css-tricks.com/the-difference-between-minification-and-gzipping/

(In case you are wondering: no, you cannot access the browser's native ungzip facility with JavaScript. It only kicks in if the HTTP headers are present, and you never have access to the raw script text anyway due to cross-origin policy, so you will be looking at an AJAX request. If you can't make an AJAX request because of a missing Access-Control-Allow-Origin header or missing whitelisting, tough shit, you have much bigger problems.)

So what is a developer to do? Wash his hands and blame the ops guys? Who cares about gzip, right? It's not our problem, it's the server's problem. In fact, who cares about user experience at all? It can take ten seconds to load; we will just wash our hands of these stupid server troubles. We are not server guys, we are developers. Who cares about HTTP headers and how it's hosted, right?


Of course not. Let's put the Dev back in DevOps and ungzip on the fly, with or without HTTP headers, on any infrastructure (well except for the Access-Control-Allow-Origin header that everyone has). Yeah baby! It will be dirty, messy but it will work.

Build Process

You can gzip in many ways, for example with this plugin if you are using webpack.

https://github.com/webpack-contrib/compression-webpack-plugin

You can also just use the Linux gzip utility as part of your build process.
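If the build already runs on Node.js, the same step can be done without a plugin or a shell call. The following is only a minimal sketch using Node's built-in zlib module; the helper name gzipFile and the demo file path are assumptions made for illustration, not part of any existing build setup.

```javascript
// Build-step sketch (Node.js): write a .gz copy next to a bundle.
const fs = require('fs');
const os = require('os');
const path = require('path');
const zlib = require('zlib');

// Hypothetical helper: gzips inputPath and writes the result to inputPath + '.gz'.
function gzipFile(inputPath) {
  const source = fs.readFileSync(inputPath);
  const compressed = zlib.gzipSync(source, {
    level: zlib.constants.Z_BEST_COMPRESSION // slowest setting, smallest output
  });
  const outputPath = inputPath + '.gz';
  fs.writeFileSync(outputPath, compressed);
  return outputPath;
}

// Usage: a throwaway file stands in for a real bundle such as dist/test.js.
const demoPath = path.join(os.tmpdir(), 'gzip-demo-bundle.js');
fs.writeFileSync(demoPath, 'console.log("hello");\n'.repeat(200));
const gzPath = gzipFile(demoPath);
console.log(fs.statSync(demoPath).size + ' bytes -> ' + fs.statSync(gzPath).size + ' bytes');
```

The output file is an ordinary gzip file, so it should be interchangeable with one produced by the Linux gzip utility.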

The Client Side Code (or, the SECRET SAUCE)

We will use the library pako.js to ungzip on the fly, with or without the correct HTTP headers.

http://nodeca.github.io/pako/
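One detail behind "with or without the correct HTTP headers": when the server does send Content-Encoding: gzip, the browser decompresses the response itself and your code receives plain JavaScript bytes, which must not be ungzipped a second time. A rough way to tell the two cases apart is to check for the gzip magic bytes (0x1f 0x8b). The helper names below are hypothetical, the ungzip function is passed in (in the page it would be pako.ungzip), and TextDecoder is assumed to exist, which is not the case in IE without a polyfill.

```javascript
// Gzip data always starts with the two magic bytes 0x1f 0x8b.
function isGzipped(buffer) {
  var bytes = new Uint8Array(buffer);
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

// Hypothetical helper: returns the script text whether or not the browser
// already decompressed the response. `ungzip` is injected so the sketch
// does not depend on a particular library; pass pako.ungzip in the page.
function toScriptText(buffer, ungzip) {
  if (isGzipped(buffer)) {
    // Still compressed: the headers were missing, so decompress manually.
    return ungzip(new Uint8Array(buffer), { to: 'string' });
  }
  // Already plain text: the browser honoured Content-Encoding for us.
  return new TextDecoder('utf-8').decode(buffer);
}
```

In the fetch chain this would be called on the result of response.arrayBuffer(), for example toScriptText(arr, pako.ungzip).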

In order to make sure the JavaScript files load in the correct order, we will use JavaScript Promises (which will require a shim for IE support) and the JavaScript Fetch API (which also requires a shim for IE support). These are the required libraries.

https://cdnjs.cloudflare.com/ajax/libs/fetch/2.0.4/fetch.min.js
https://cdnjs.cloudflare.com/ajax/libs/bluebird/3.5.1/bluebird.min.js
https://cdnjs.cloudflare.com/ajax/libs/pako/1.0.6/pako.min.js

We fetch, paying attention to three things

https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch

1. Load scripts in the correct order
2. Deal with HTTP errors with a CheckXHR method (write this!)
3. Fallback to the JS version, should an error occur

fetch('http://www.example.com/test.js.gz')
   .then(CheckXHR) // reject on HTTP errors; you have to write CheckXHR yourself
   .then(function (response) {
      return response.arrayBuffer(); // important: pass binary data to pako.js, not a string
   })
   .then(function (arr) {
      return injectScript(arr);
   })
   // load more scripts here
   .catch(function (error) { // promises have .catch(), there is no .onError()
      // deal with error... I suggest loading the ungzipped JS files as a fallback here
   });
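The CheckXHR function is left to the reader above. One possible version is sketched here; it is an assumption about what such a check could look like, not a fixed API. The point is that fetch only rejects on network failures, so HTTP errors such as 404 or 500 have to be turned into rejections by hand.

```javascript
// One possible CheckXHR: pass successful responses through unchanged and
// throw for everything else, so the promise chain jumps to its .catch().
function CheckXHR(response) {
  if (!response.ok) { // response.ok is true for status codes 200-299
    throw new Error('HTTP ' + response.status + ' while fetching ' + response.url);
  }
  return response;
}
```

Throwing inside a .then() callback rejects the promise, so nothing further is needed to reach the error handler at the end of the chain.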

We will dynamically inject the script, again returning a JavaScript promise on completion. Because we set the script's text inline instead of pointing it at a src, the browser executes it synchronously when the element is appended, so we don't have to use onload or onreadystatechange (IE).

function injectScript(arr) {
   return new Promise(function (resolve, reject) {
      var script = document.createElement('script');
      // if pako throws on bad data, the promise is rejected automatically
      script.text = pako.ungzip(arr, { to: 'string' });
      document.head.appendChild(script); // inline script runs right here
      resolve();
   });
}
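To cover the other two points, loading in the correct order and falling back to the plain JS file, the chain can be wrapped in two small helpers. This is only a sketch: loadInOrder and loadWithFallback are hypothetical names, and the actual loading functions are passed in as parameters so that the helpers stay independent of fetch and the DOM.

```javascript
// Hypothetical helper: runs loadOne(url) for each URL strictly one after
// another, so a script never executes before the scripts it depends on.
function loadInOrder(urls, loadOne) {
  return urls.reduce(function (chain, url) {
    return chain.then(function () {
      return loadOne(url);
    });
  }, Promise.resolve());
}

// Hypothetical helper: tries the gzipped URL first and, if that promise
// rejects for any reason, retries with the same URL minus its ".gz" suffix.
function loadWithFallback(gzUrl, loadGzipped, loadPlain) {
  return loadGzipped(gzUrl).catch(function () {
    return loadPlain(gzUrl.replace(/\.gz$/, ''));
  });
}

// In the page, loadGzipped could be the fetch chain from this post:
//   function loadGzipped(url) {
//      return fetch(url).then(CheckXHR)
//         .then(function (r) { return r.arrayBuffer(); })
//         .then(injectScript);
//   }
// and loadPlain could append a normal <script src="..."> tag.
```

Both helpers return promises, so they compose: loadInOrder(urls, function (url) { return loadWithFallback(url, loadGzipped, loadPlain); }) loads every file in sequence and falls back per file.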

And there we go, complete.

With great power comes great responsibility; make sure you measure the performance in the browser to see the decrease not only in file size but how long it takes to actually use the web application.

Hopefully this helps someone.

P.S. Message to server guys: we can code on a 386 or a Raspberry Pi or a Commodore 64 or a TRS-80 or string and yarn and foodstuffs, but that doesn't mean it's a good idea or a good use of time or money or resources. Upgrade your infrastructure to allow gzipping of any arbitrary file size with the correct HTTP headers, and make the infrastructure work with the developers, not against them, because the next time the problem might not be so (un)easy to solve.


Sunday, May 13, 2018
