Just for kicks, I've added some preliminary, experimental support for
gzip encoding of pages that have been saved in the file cache. If
$wgUseGzip is not enabled in LocalSettings, it shouldn't have any
effect; if it is, it'll make compressed copies of cached files and then
serve them if the client claims to accept gzip.
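To make that concrete, the serving side boils down to something like the
sketch below (simplified, with made-up function names; not the actual
MediaWiki code):

    <?php
    // The client "claims to accept gzip" via the Accept-Encoding header.
    function clientAcceptsGzip() {
        $accept = isset( $_SERVER['HTTP_ACCEPT_ENCODING'] )
            ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
        return strpos( $accept, 'gzip' ) !== false;
    }

    // Serve the gzipped copy if we have one and the client can take it;
    // otherwise fall back to the plain cached file.
    function serveCachedPage( $cacheFile ) {
        global $wgUseGzip;
        if ( $wgUseGzip && clientAcceptsGzip()
             && file_exists( "{$cacheFile}.gz" ) ) {
            header( 'Content-Encoding: gzip' );
            header( 'Vary: Accept-Encoding' );
            readfile( "{$cacheFile}.gz" );
        } else {
            readfile( $cacheFile );
        }
    }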
At present this only affects file-cacheable pages: that is, plain
current-page views by not-logged-in users. Compression is only done when generating
the cached file, so it oughtn't to drain CPU resources too much. My
informal testing shows the gzipping takes about 2-3 ms, which is much
shorter than most of the page generation steps. (Though it will eat up
some additional disk space, as both uncompressed and compressed copies
are kept on disk.)
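For reference, the generation side amounts to roughly this (again a
sketch with illustrative names; gzencode() is the relevant zlib call):

    <?php
    // Write the rendered page to the file cache, plus a gzipped sibling
    // if $wgUseGzip is on. Compression happens once, here, not per hit.
    function saveToFileCache( $cacheFile, $html ) {
        global $wgUseGzip;
        file_put_contents( $cacheFile, $html );
        if ( $wgUseGzip ) {
            file_put_contents( "{$cacheFile}.gz", gzencode( $html ) );
        }
    }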
I'd appreciate some testing with various user agents to see if things
are working. If you receive a compressed page, there'll be a comment at
the end of the page like <!-- Cached/compressed [timestamp] -->
A few notes:
This needs zlib support compiled into PHP to work. I've done this on
Larousse.
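If you're not sure whether your PHP build includes zlib, a quick check
is whether the gz* functions exist (my suggestion; nothing the wiki
does for you):

    <?php
    if ( !function_exists( 'gzencode' ) ) {
        die( "PHP was built without zlib; \$wgUseGzip won't work.\n" );
    }
    echo "zlib support is available.\n";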
An on-the-fly compression filter could also be turned on for dynamic
pages and logged-in users, but I haven't done this yet. Compression
support could be a user-selectable option, so those with problem
browsers could turn it off, or those with slow modems could turn it on
where it's off by default. :)
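PHP actually ships a ready-made handler for that sort of on-the-fly
filter; if we do it, it'd look something like this (untested sketch):

    <?php
    // ob_gzhandler does its own Accept-Encoding negotiation and
    // compresses the output buffer as it's flushed; one call at the
    // start of the request would cover dynamic pages.
    ob_start( 'ob_gzhandler' );
    echo "<html><body>...dynamic page output...</body></html>";
    ob_end_flush();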
The purpose of all this is of course to save bandwidth; there are two
ends to this, the server and the client:
Jimbo has pooh-poohed concerns about our bandwidth usage; certainly the
server has a nice fat pipe to the internet and isn't in danger of
choking, and whatever Bomis's overall bandwidth usage, Jimbo hasn't
complained that we're crowding out his legitimate business. :) But
still, we're looking at 5-20 *gigabytes* *per day*. A fair chunk of
that is probably images and archive dumps, but a lot is text.
On the client end: schmucks with dial-up may appreciate a little
compression. :)
I've also fixed what seems to be a conflict between the page cache and
client-side caching.
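I won't walk through the fix here, but for context, getting a
server-side file cache to cooperate with browser caches generally comes
down to answering conditional requests against the cache file's
timestamp; something in this spirit (illustrative only, not necessarily
the exact change):

    <?php
    // Honor If-Modified-Since: if the client's copy is at least as new
    // as the cached file, send 304 and no body.
    function maybeSend304( $cacheFile ) {
        $mtime = filemtime( $cacheFile );
        header( 'Last-Modified: '
            . gmdate( 'D, d M Y H:i:s', $mtime ) . ' GMT' );
        if ( isset( $_SERVER['HTTP_IF_MODIFIED_SINCE'] )
             && strtotime( $_SERVER['HTTP_IF_MODIFIED_SINCE'] ) >= $mtime ) {
            header( 'HTTP/1.0 304 Not Modified' );
            return true;
        }
        return false;
    }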
There are still some race conditions: two simultaneous loads of the
same page could overwrite each other's work, or one could read the
other's file partway through being written, and adding a second gzipped
file perhaps complicates this a bit. There are also still some cases
where caches aren't invalidated properly.
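For what it's worth, the standard mitigation for the write half of that
race is to write to a temp file and rename() it into place, since
rename() is atomic on POSIX filesystems; readers then see either the old
complete file or the new one, never a torn one. (A sketch of the general
technique, not a claim about what the code does now.)

    <?php
    function atomicWrite( $path, $data ) {
        // Unique-ish temp name so concurrent writers don't collide.
        $tmp = $path . '.tmp.' . getmypid();
        file_put_contents( $tmp, $data );
        rename( $tmp, $path );  // atomic replace on the same filesystem
    }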
-- brion vibber (brion @ pobox.com)