HTTP/2 Gzipped Data

For two years now I've been working on an extension for HTTP/2 that introduces a mechanism for applying gzip encoding to data transported between two endpoints, analogous to Transfer-Encoding in HTTP/1.1. [1] It's gone through a few pretty serious revisions in its relatively long life, but I'm pretty happy with where it is and how it reads right now – I think it's about ready for publication as an RFC. However, I don't know if it ever will be.

HTTP/2 is, for all intents and purposes, HTTPS-only. Even ignoring political drivers like the https-everywhere movement, HTTP/2 is HTTPS-only on the open web. Mostly this is because HTTP/1.x has been around for a looong time, and some middleboxes out there have pretty questionable code paths. One of the big assumptions has been that all traffic on the web would look just like HTTP/1.0, and for some twenty-odd years that assumption has held. Various proxies and gateways have relied on it, in some cases running successfully for decades (even since HTTP/0.9), freely peeking at or modifying any HTTP traffic that passed through them.

However, HTTP/2 is a big break; for the first time HTTP traffic on the web doesn't all look like HTTP/1.0 – it's now packed in an incomprehensible binary format and uses built-in compression. Any of those old middleboxes that tries to read the stream of data would at best be confused, and at worst crash (bringing down internet access for some number of people, which is not what we want a new HTTP protocol to do). And any of those old middleboxes that modifies data could royally screw things up by whacking unpackaged bytes in willy-nilly.

The traditional way to foil those invasive boxes is to ship the traffic on a different TCP port. HTTPS, as well as using SSL/TLS to encrypt all traffic (including metadata), runs by default on port 443, where plain HTTP runs on port 80. In this way the traffic side-steps the naive old middleboxes by completely avoiding their ports, and anyone listening on port 443 knows that all they'll see is a garble of binary guff (so there's no point trying to read or modify it).

One of the big goals of HTTP/2 was to make the web better for a lot of people, invisibly. This means all the improvements you get from the binary packing and compression should continue to work on existing sites with existing URLs. By continuing to use http:// and https:// URLs, we're also committed to using TCP ports 80 and 443. And since those old meddling middleboxes are still out there, screwing up port 80 traffic for everyone, port 443 (∴TLS, ∴HTTPS) remains the only viable option for carrying HTTP/2 traffic on the open web. 😔

This doesn't mean that HTTP/2 can't be carried over a cleartext port-80 channel, just that it might not work in the big dark cloud, and none of the major browsers will bother trying.

Compression can break encryption. There's a fair bit on this out on the web, especially if you search for the "CRIME" or "BREACH" attacks, so I won't delve into it myself. The HTTP/2 spec is pretty clear on its position regarding compression of data within an encrypted channel:

   Implementations communicating on a secure channel MUST NOT compress
   content that includes both confidential and attacker-controlled data
   unless separate compression dictionaries are used for each source of
   data.  Compression MUST NOT be used if the source of data cannot be
   reliably determined.  Generic stream compression, such as that
   provided by TLS, MUST NOT be used with HTTP/2 (see Section 9.2). [2]

That last sentence, and people's general attitudes towards compression since BREACH, are what give my draft trouble. Some could argue that I'm trying to provide "generic stream compression", which is expressly forbidden; however, the way the paragraph reads – and the fact that the referenced Section 9.2 is all about TLS compression – suggests to me that it's "generic TLS stream compression" that is forbidden, and that the proscription doesn't apply to cleartext HTTP traffic. The absolutist language in the spec is possibly a hangover from an earlier draft, when cleartext wasn't to be supported at all.
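For anyone who hasn't seen why the spec worries about mixing confidential and attacker-controlled data, here is a rough Python sketch of the underlying length side channel. It is only an illustration: the SECRET value, the two guesses and the helper names are all made up for this example, and a real attack works byte by byte against encrypted traffic rather than calling zlib directly.

```python
import zlib

# Hypothetical confidential data the attacker wants to learn.
SECRET = b"Cookie: session=7f3a9c2e"


def shared_context_len(attacker_data):
    """Compress attacker data and the secret together in ONE deflate context."""
    return len(zlib.compress(attacker_data + b"\n" + SECRET, 9))


def separate_context_lens(attacker_data):
    """Compress each source on its own; returns (attacker_len, secret_len)."""
    return len(zlib.compress(attacker_data, 9)), len(zlib.compress(SECRET, 9))


# Two guesses of equal length; one matches more of the secret than the other.
good_guess = b"Cookie: session=7f3a"
bad_guess = b"Cookie: session=XQZW"

# Shared context: the better guess produces a longer back-reference into the
# attacker's bytes, so the total output shrinks and the length leaks a hint.
print("shared, good guess:", shared_context_len(good_guess))
print("shared, bad guess: ", shared_context_len(bad_guess))

# Separate contexts: the secret's compressed size never sees the guess at all.
print("separate, good guess:", separate_context_lens(good_guess))
print("separate, bad guess: ", separate_context_lens(bad_guess))
```

Encryption hides the bytes but not their count, which is why the length difference in the shared case is enough to work with.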

Early versions of SPDY (from which HTTP/2 is derived) and early drafts of the HTTP/2 spec included a "COMPRESSED" flag on DATA frames – very similar to what I'm reintroducing with my draft (but more vulnerable through its retained/reused compression state between frames) – which was yanked after BREACH. [3] That's a pretty powerful stigma to overcome.
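To make the "retained/reused compression state" point concrete, here is a small Python sketch contrasting the two approaches. It is not the wire format from my draft or from SPDY; the function names and payloads are hypothetical, and plain byte strings stand in for DATA frame payloads.

```python
import gzip
import zlib


def frames_shared_state(payloads):
    """One deflate context reused for every frame (history carries over)."""
    comp = zlib.compressobj(9)
    # Z_SYNC_FLUSH emits everything so far but keeps the sliding window.
    return [comp.compress(p) + comp.flush(zlib.Z_SYNC_FLUSH) for p in payloads]


def frames_independent(payloads):
    """A fresh gzip context per frame; nothing is remembered between frames."""
    return [gzip.compress(p) for p in payloads]


# Hypothetical payloads: a later frame holding something confidential, and
# two possible earlier frames that someone else might have influenced.
LATER = b"secret-token=7f3a9c2e1b4d; path=/account"
EARLIER_MATCHING = LATER
EARLIER_UNRELATED = b"Nothing in this frame resembles the next"

# Shared state: the later frame's size depends on what came before it.
shared_after_match = len(frames_shared_state([EARLIER_MATCHING, LATER])[1])
shared_after_other = len(frames_shared_state([EARLIER_UNRELATED, LATER])[1])

# Independent frames: the later frame's size is the same either way.
indep_after_match = len(frames_independent([EARLIER_MATCHING, LATER])[1])
indep_after_other = len(frames_independent([EARLIER_UNRELATED, LATER])[1])

print("shared state:", shared_after_match, "vs", shared_after_other)
print("independent: ", indep_after_match, "vs", indep_after_other)
```

Resetting the context for each frame costs some compression ratio, but it means the contents of one frame can't be probed through the size of another. It does nothing for confidential and attacker-controlled data mixed within a single frame.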

On top of that, because the major browsers won't speak HTTP/2 without HTTPS, and since gzip compression inside a TLS tunnel is a Bad Thing™, I've lost a lot of potential implementors/supporters for my draft, and, worse, probably gained some detractors. That said, this is a feature that is queried or requested from time to time in the community ([4], [5], [6], [7], [8]), so I still retain some hope.

[4] – the post that started this all

Matthew Kerwin

CC BY-SA 4.0
development, web
