
Experimental gzip compression
Just for kicks, I've added some preliminary, experimental support for
gzip encoding of pages that have been saved in the file cache. If
$wgUseGzip is not enabled in LocalSettings, it shouldn't have any
effect; if it is, it'll make compressed copies of cached files and then
serve them if the client claims to accept gzip.
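
In sketch form the serving logic amounts to this ($cachedFile and the
exact headers are illustrative, not the real code):

  // Sketch only: serve the gzipped twin if the client accepts it.
  $acceptsGzip = isset( $_SERVER['HTTP_ACCEPT_ENCODING'] ) &&
      strpos( $_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip' ) !== false;
  if ( $wgUseGzip && $acceptsGzip && file_exists( "{$cachedFile}.gz" ) ) {
      header( 'Content-Encoding: gzip' );
      header( 'Vary: Accept-Encoding' );
      readfile( "{$cachedFile}.gz" );
  } else {
      readfile( $cachedFile );
  }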

At present this only affects file-cachable pages: so plain current page
views by not-logged-in users. Compression is only done when generating
the cached file, so it oughtn't to drain CPU resources too much. My
informal testing shows the gzipping takes about 2-3 ms, which is much
shorter than most of the page generation steps. (Though it will eat up
some additional disk space, as both uncompressed and compressed copies
are kept on disk.)
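
And the writing side, again as a sketch (file names are made up;
gzencode() is the relevant zlib call):

  // Sketch: after rendering $text for a cachable page, save both copies.
  $fp = fopen( $cachedFile, 'wb' );
  fwrite( $fp, $text );
  fclose( $fp );

  // Compressed twin, kept alongside the plain one on disk.
  $fp = fopen( "{$cachedFile}.gz", 'wb' );
  fwrite( $fp, gzencode( $text ) );
  fclose( $fp );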

I'd appreciate some testing with various user agents to see if things
are working. If you receive a compressed page, there'll be a comment at
the end of the page like <!-- Cached/compressed [timestamp] -->.
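
If you don't have a suitable browser handy, a little script can fake a
gzip-capable client; here's a sketch (host and path are just examples):

  // Fake a gzip-capable client and eyeball the response headers.
  $fp = fsockopen( 'www.example.org', 80 );
  fwrite( $fp, "GET /wiki/Main_Page HTTP/1.0\r\n"
             . "Host: www.example.org\r\n"
             . "Accept-Encoding: gzip\r\n\r\n" );
  while ( !feof( $fp ) ) {
      $line = fgets( $fp, 1024 );
      if ( $line == "\r\n" ) break;  // end of headers
      echo $line;                    // look for Content-Encoding: gzip
  }
  fclose( $fp );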

A few notes:

This needs zlib support compiled into PHP to work. I've done this on
Larousse.
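
A quick way to check whether a given PHP build has it:

  // Quick check that this PHP build has zlib compiled in.
  if ( extension_loaded( 'zlib' ) && function_exists( 'gzencode' ) ) {
      echo "zlib is available\n";
  } else {
      echo "no zlib -- gzip caching won't work\n";
  }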

An on-the-fly compression filter could also be turned on for dynamic
pages and logged-in users, but I haven't done this yet. Compression
support could also be made a user-selectable option, so those with
problem browsers could turn it off, or those with slow modems could
turn it on where it's off by default. :)
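
For the record, if we do turn that on, PHP's output buffering makes the
on-the-fly case pretty painless; a sketch, where userWantsCompression()
is a hypothetical preference check:

  // Sketch: buffer all output through PHP's built-in gzip handler.
  if ( $wgUseGzip && userWantsCompression() ) {
      // ob_gzhandler negotiates gzip with the client by itself.
      ob_start( 'ob_gzhandler' );
  }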

The purpose of all this is of course to save bandwidth; there are two
ends to this, the server and the client:

Jimbo has pooh-poohed concerns about our bandwidth usage; certainly the
server has a nice fat pipe to the internet and isn't in danger of
choking, and whatever Bomis's overall bandwidth usage, Jimbo hasn't
complained that we're crowding out his legitimate business. :) But
still, we're looking at 5-20 *gigabytes* *per day*. A fair chunk of
that is probably images and archive dumps, but a lot is text.

On the client end: schmucks with dial-up may appreciate a little
compression. :)

I've also fixed what seems to be a conflict between the page cache and
client-side caching.
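
For reference, these two usually interact through the Last-Modified /
If-Modified-Since handshake. A rough sketch, not necessarily the exact
fix ($cachedFile is a made-up name):

  // Sketch of the usual Last-Modified handshake.
  $lastmod = gmdate( 'D, d M Y H:i:s', filemtime( $cachedFile ) ) . ' GMT';
  header( "Last-Modified: $lastmod" );
  if ( isset( $_SERVER['HTTP_IF_MODIFIED_SINCE'] ) &&
       $_SERVER['HTTP_IF_MODIFIED_SINCE'] == $lastmod ) {
      // Client's copy is current; skip the body entirely.
      header( 'HTTP/1.0 304 Not Modified' );
      exit;
  }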

There are still some race conditions around making sure that two loads
of the same page don't overwrite each other's work or read another's
half-written page, and adding a second, gzipped file perhaps
complicates this a bit... there are also still some cases where caches
aren't invalidated properly.
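
One common way to close the write-side race, for what it's worth, is to
build the file elsewhere and rename() it into place, since rename() is
atomic within a filesystem. A sketch of that trick (again, not what the
code currently does):

  // Write to a temp file, then atomically swap it into place.
  $tmp = $cachedFile . '.tmp.' . getmypid();
  $fp = fopen( $tmp, 'wb' );
  fwrite( $fp, $text );
  fclose( $fp );
  rename( $tmp, $cachedFile );  // readers see old or new, never partial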

-- brion vibber (brion @ pobox.com)

Re: Experimental gzip compression
On Tue, May 20, 2003 at 05:03:03AM -0700, Brion Vibber wrote:
> Just for kicks, I've added some preliminary, experimental support for
> gzip encoding of pages that have been saved in the file cache. If
> $wgUseGzip is not enabled in LocalSettings, it shouldn't have any
> effect; if it is, it'll make compressed copies of cached files and then
> serve them if the client claims to accept gzip.
>
> At present this only affects file-cachable pages: so plain current page
> views by not-logged-in users. Compression is only done when generating
> the cached file, so it oughtn't to drain CPU resources too much. My
> informal testing shows the gzipping takes about 2-3 ms, which is much
> shorter than most of the page generation steps. (Though it will eat up
> some additional disk space, as both uncompressed and compressed copies
> are kept on disk.)

One of the nice things about gzip is that decompression is much, much
cheaper than compression. It might almost make sense to just compress
everything and then decompress on the fly if you need it.
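
Something like this sketch, say, keeping only the .gz copy on disk (the
names here are made up):

  // Keep only the gzipped copy; inflate on the fly for old clients.
  if ( $acceptsGzip ) {
      header( 'Content-Encoding: gzip' );
      readfile( "{$cachedFile}.gz" );
  } else {
      readgzfile( "{$cachedFile}.gz" );  // decompresses while sending
  }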

--
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN