Mailing List Archive

100% CPU runaway with filecache in 1.3.0beta6
When I enable the file cache and request a page from MediaWiki,
the browser waits for the response forever. On the server
the page has been created in the cache. It's compressed and the
content looks fine (zcat'ed). There's an Apache process
continuously running at almost 100% CPU.

Without the filecache it works fine.

I noticed some discussion from early July about the status of
the filecache. Should it work now in beta6?

I have enabled the file cache by adding this to
LocalSettings.php:

$wgShowIPinHeader = false;
$wgUseFileCache = true;
$wgFileCacheDirectory = "/home/rene/projects/carriere/cache";

This directory exists and is writable by Apache.

No other tweaks, it's a clean install (well, the database is
upgraded from beta5).

I'm using:
MediaWiki: 1.3.0beta6
PHP: 4.1.2 (apache)
MySQL: 3.23.49-log
on Debian woody

Any ideas?

--
Regards / Groeten, http://www.leren.nl
René Pijlman http://www.applinet.nl
Re: 100% CPU runaway with filecache in 1.3.0beta6
Rene Pijlman wrote:
> When I enable the file cache and request a page from MediaWiki,
> the browser waits for the response forever. On the server
> the page has been created in the cache. It's compressed and the
> content looks fine (zcat'ed). There's an Apache process
> continuously running at almost 100% CPU.

Everything seems to work fine on my main test machine (Mac OS X 10.3.4,
PHP 4.3.2), but I can confirm this phenomenon on Debian Woody.

(Side note: the file cache doesn't interact well with output-buffered
gzipping. Comment out the line that sets that near the top of
LocalSettings.php; unfortunately that doesn't solve this problem.)

The output is being written out to the cache file *and sent to the
client* but the connection hangs there. I'm not sure why yet...

-- brion vibber (brion @ pobox.com)
Re: 100% CPU runaway with filecache in 1.3.0beta6
Brion Vibber:
>Rene Pijlman:
>> When I enable the file cache and request a page from MediaWiki,
>> the browser waits for the response forever. On the server
>> the page has been created in the cache. It's compressed and the
>> content looks fine (zcat'ed). There's an Apache process
>> continuously running at almost 100% CPU.
>
>Everything seems to work fine on my main test machine (Mac OS X 10.3.4,
>PHP 4.3.2), but I can confirm this phenomenon on Debian Woody.
>
>(Side note: the file cache doesn't interact well with output-buffered
>gzipping. Comment out the line that sets that near the top of
>LocalSettings.php; unfortunately that doesn't solve this problem.)
>
>The output is being written out to the cache file *and sent to the
>client* but the connection hangs there. I'm not sure why yet...

I noticed that the 100% CPU occurs after index.php has finished,
and after return from the output callback.

My guess is the call to header() in the output callback
saveToFileCache() is not safe. This is writing to the buffer
that the output callback is processing.

    if( $this->useGzip() ) {
        if( wfClientAcceptsGzip() ) {
            header( 'Content-Encoding: gzip' );

Perhaps this confuses PHP. Also, I guess this header doesn't
actually make it into the headers, so perhaps the gzipped data
is confusing something down the line.

--
Regards / Groeten, http://www.leren.nl
René Pijlman http://www.applinet.nl
Re: 100% CPU runaway with filecache in 1.3.0beta6
Rene Pijlman:
>My guess is the call to header() in the output callback
>saveToFileCache() is not safe. This is writing to the buffer
>that the output callback is processing.
>
> if( $this->useGzip() ) {
> if( wfClientAcceptsGzip() ) {
> header( 'Content-Encoding: gzip' );

BTW, this looks conceptually flawed at this point. The encoding
was already decided when the data was written to the buffer. I
can't think of a reason to redecide and write this header only
now.

Also I wonder if it's wise to use the output callback to write
the file to the cache. That could be done earlier as well, I'd
think.

--
Regards / Groeten, http://www.leren.nl
René Pijlman http://www.applinet.nl
Re: 100% CPU runaway with filecache in 1.3.0beta6
Brion Vibber wrote:
| (Side note: the file cache doesn't interact well with output-buffered
| gzipping. Comment out the line that sets that near the top of
| LocalSettings.php; unfortunately that doesn't solve this problem.)
|
| The output is being written out to the cache file *and sent to the
| client* but the connection hangs there. I'm not sure why yet...

Found the problem. It seems that the buffer is being passed by reference
on PHP 4.1; the variable is modified by the function and all goes to
hell. Making a copy to operate on gets things working.

Diff attached; fix just added to CVS head and 1.3 branch.
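
A minimal sketch of the "operate on a copy" idea, not the actual diff. The callback name saveCopyToCache and the cache path are hypothetical, and the sketch assumes a current PHP with the zlib extension rather than PHP 4.1:

```php
<?php
// Hypothetical cache location, for illustration only.
$cacheFile = sys_get_temp_dir() . '/filecache-sketch.html.gz';

// Hypothetical output callback: store a gzipped copy of the page in the
// cache and hand the page back to PHP unchanged.
function saveCopyToCache( $origText ) {
    global $cacheFile;
    $text = $origText;            // work on a copy; $origText is never modified
    $text = gzencode( $text );    // only the copy is compressed
    file_put_contents( $cacheFile, $text );
    return $origText;             // this is what goes to the client
}

ob_start( 'saveCopyToCache' );
echo '<html><body>Hello</body></html>';
ob_end_flush();                   // runs the callback, then emits the page
```

The point of the pattern is that nothing the callback does to $text can leak back into the buffer that PHP passed in.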

-- brion vibber (brion @ pobox.com)
Re: 100% CPU runaway with filecache in 1.3.0beta6
Brion Vibber:
>Found the problem. It seems that the buffer is being passed by reference
>on PHP 4.1; the variable is modified by the function and all goes to
>hell. Making a copy to operate on gets things working.
>
>Diff attached; fix just added to CVS head and 1.3 branch.

This solves the runaway, but it's not working correctly yet.

When I request a normal article anonymously with the cache
enabled, my browser shows gibberish (Firefox) or a download
dialog (IE). I've looked at the headers with wget -S, and
there's no:

Content-Encoding: gzip

--
Regards / Groeten, http://www.leren.nl
René Pijlman http://www.applinet.nl
Re: 100% CPU runaway with filecache in 1.3.0beta6
Rene Pijlman:
>Brion Vibber:
>>Found the problem. It seems that the buffer is being passed by reference
>>on PHP 4.1; the variable is modified by the function and all goes to
>>hell. Making a copy to operate on gets things working.
>
>This solves the runaway, but it's not working correctly yet.

BTW it works fine now with

$wgUseGzip = false;

in LocalSettings.php. The generic caching problem is fixed by
Brion's patch, a compressed caching problem remains.

>I've looked at the headers with wget -S, and there's no:
>
> Content-Encoding: gzip

Correcting myself...

I forgot that wget by default sends a different Accept-Encoding
header. When I run it with:

wget -S --header='Accept-Encoding: gzip, deflate' url

the headers look OK:

1 HTTP/1.1 200 OK
2 Date: Sun, 08 Aug 2004 14:39:31 GMT
3 Server: Apache/1.3.26 (Unix) Debian GNU/Linux mod_python/2.7.8 Python/2.1.3 PHP/4.1.2
4 X-Powered-By: PHP/4.1.2
5 Vary: Accept-Encoding
6 Expires: -1
7 Cache-Control: private, must-revalidate, max-age=0
8 Last-modified: Sat, 7 Aug 2004 22:51:25 GMT
9 Content-Encoding: gzip
10 Content-Length: 1803
11 Keep-Alive: timeout=15, max=100
12 Connection: Keep-Alive
13 Content-Type: text/html; charset=iso-8859-1
14 Content-Language: nl

... and wget stores the gzipped data in a file (that's correct I
guess). With zcat it looks fine.

The question remains: why don't Firefox and IE uncompress the
data before rendering?

--
Regards / Groeten, http://www.leren.nl
René Pijlman http://www.applinet.nl
Re: 100% CPU runaway with filecache in 1.3.0beta6
Rene Pijlman wrote:
> This solves the runaway, but it's not working correctly yet.
>
> When I request a normal article anonymously with the cache
> enabled, my browser shows gibberish (Firefox) or a download
> dialog (IE). I've looked at the headers with wget -S, and
> there's no:
>
> Content-Encoding: gzip

Did you disable the generic gzipping at the top of LocalSettings.php
like I said was necessary in my previous mail?

-- brion vibber (brion @ pobox.com)
Re: 100% CPU runaway with filecache in 1.3.0beta6
Brion Vibber:
>Rene Pijlman:
>> When I request a normal article anonymously with the cache
>> enabled, my browser shows gibberish (Firefox) or a download
>> dialog (IE)
>
>Did you disable the generic gzipping at the top of LocalSettings.php
>like I said was necessary in my previous mail?

Oh waitaminute, you're referring to this I guess:

if( !ini_get( 'zlib.output_compression' ) ) ob_start( 'ob_gzhandler' );

I completely overlooked that. Indeed, when I remove this line
and configure file caching like this:

$wgUseFileCache = true;
$wgFileCacheDirectory = "/home/rene/projects/carriere/cache";
$wgShowIPinHeader = false;
$wgUseGzip = true;

... it works fine. Files in the cache are compressed and both
Firefox and IE render pages correctly.

Thanks again for your help Brion.

If I'm not mistaken the cause of the problem was that both
ob_gzhandler and the file cache were compressing the output, so
it was compressed two times, which would explain the problem I
saw.
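
The double-compression theory can be checked with a short sketch. It assumes a current PHP with the zlib extension (gzdecode() needs PHP 5.4 or later), and the sample page is made up. Compressing twice and decompressing once, as a browser would, still leaves gzip data rather than HTML:

```php
<?php
$html = '<html><body>Hello, wiki</body></html>';  // made-up sample page

$cached = gzencode( $html );    // first pass: the file cache compresses the page
$wire   = gzencode( $cached );  // second pass: the output handler compresses again

// The browser undoes only one layer, because it sees a single
// Content-Encoding: gzip header.
$seen = gzdecode( $wire );

// What is left still starts with the gzip magic bytes 0x1f 0x8b ...
$stillGzip = ( substr( $seen, 0, 2 ) === "\x1f\x8b" );
// ... and only a second decompression recovers the HTML.
$recovered = ( gzdecode( $seen ) === $html );

var_dump( $stillGzip, $recovered );
```

Both flags should come out true, which would match the gibberish in Firefox and the download dialog in IE.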

May I suggest fixing this in the code, to make configuration
easier?

I'd say that when PHP provides the mechanism for output
compression over the wire, there's no reason to duplicate this
mechanism in the file cache. You might as well completely remove
compression from the file cache and cache uncompressed on disk.
At the cost of some extra disk space this will improve
performance of the file cache (no CPU cycles for
(un)compression) and simplify the code. This will neatly
separate caching from compression.

Would it help if I implement and test this and submit a patch?

--
Regards / Groeten, http://www.leren.nl
René Pijlman http://www.applinet.nl
Re: 100% CPU runaway with filecache in 1.3.0beta6
Rene Pijlman wrote:
> I'd say that when PHP provides the mechanism for output
> compression over the wire, there's no reason to duplicate this
> mechanism in the file cache. You might as well completely remove
> compression from the file cache and cache uncompressed on disk.
> At the cost of some extra disk space this will improve
> performance of the file cache (no CPU cycles for
> (un)compression) and simplify the code. This will neatly
> separate caching from compression.

The point of compressing the file cache is of course to save disk space.
Since this is a legitimate thing to do, I don't think there's a need to
remove that option.

-- brion vibber (brion @ pobox.com)
Re: 100% CPU runaway with filecache in 1.3.0beta6
Brion Vibber:
>The point of compressing the file cache is of course to save disk space.
>Since this is a legitimate thing to do, I don't think there's a need to
>remove that option.

Hmm, I assumed this was intended to compress the data that is
sent to the client. Well OK, in that case I suggest making the
file cache work correctly in all cases, e.g. by sending
uncompressed data to the output buffer when the PHP lib will
take care of compression.

The wfClientAcceptsGzip() branches of the code seem to duplicate
the functionality of ob_gzhandler, and can be removed without
loss of functionality.

An optimization would be to solve it the other way around:
decide per request to not compress with ob_gzhandler when the
file is available in compressed form in the file cache.

--
Regards / Groeten, http://www.leren.nl
René Pijlman http://www.applinet.nl
Re: 100% CPU runaway with filecache in 1.3.0beta6
Rene Pijlman wrote:
> Brion Vibber:
>>The point of compressing the file cache is of course to save disk space.
>>Since this is a legitimate thing to do, I don't think there's a need to
>>remove that option.
>
>
> Hmm, I assumed this was intended to compress the data that is
> sent to the client.

That's just a nifty side effect. :)

> An optimization would be to solve it the other way around:
> decide per request to not compress with ob_gzhandler when the
> file is available in compressed form in the file cache.

ob_end_clean() would probably disable the handler correctly... you might
try slipping a call in and see if that does it. The comments in the
manual indicate that it may still _call_ the ob_gzhandler function (so
will modify the headers) but won't output any data.

The main potential problem with this would be that ob_gzhandler might
have a different idea of what accepts gzip than we do.
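
To illustrate how two gzip checks could disagree, here is a hypothetical, simplified check. It is neither MediaWiki's wfClientAcceptsGzip() nor ob_gzhandler's own logic; the function name and its rules are assumptions made for this sketch. It deliberately ignores x-gzip and the * wildcard, which is the kind of detail on which two implementations may differ:

```php
<?php
// Hypothetical helper: does this Accept-Encoding header value allow gzip?
function clientAcceptsGzip( $acceptEncoding ) {
    foreach ( explode( ',', $acceptEncoding ) as $part ) {
        // Split "gzip;q=0.5" into the coding name and its parameters.
        $fields = array_map( 'trim', explode( ';', $part ) );
        if ( strtolower( $fields[0] ) !== 'gzip' ) {
            continue;
        }
        // A quality value of zero means the client refuses gzip.
        foreach ( array_slice( $fields, 1 ) as $param ) {
            if ( preg_match( '/^q=([0-9.]+)$/i', $param, $m ) && (float)$m[1] == 0.0 ) {
                return false;
            }
        }
        return true;
    }
    return false;
}

var_dump( clientAcceptsGzip( 'gzip, deflate' ) );
var_dump( clientAcceptsGzip( 'identity' ) );
var_dump( clientAcceptsGzip( 'gzip;q=0' ) );
```

A handler that also honours x-gzip or * would answer differently from this one for some clients, and the page would then be compressed by one side but not announced by the other.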

-- brion vibber (brion @ pobox.com)
Re: 100% CPU runaway with filecache in 1.3.0beta6
Brion Vibber:
>Rene Pijlman:
>> An optimization would be to solve it the other way around:
>> decide per request to not compress with ob_gzhandler when the
>> file is available in compressed form in the file cache.
>
>ob_end_clean() would probably disable the handler correctly... you might
>try slipping a call in and see if that does it. The comments in the
>manual indicate that it may still _call_ the ob_gzhandler function (so
>will modify the headers) but won't output any data.
>
>The main potential problem with this would be that ob_gzhandler might
>have a different idea of what accepts gzip than we do.

Another problem is zlib.output_compression which is set in
php.ini and cannot be turned off at the scripting level
(according to a comment in the docs). And I guess most users
will want to enable it to compress all requests, including
requests that cannot be served from the file cache. So I don't
think an implementation which serves compressed cached files
without the overhead of uncompress/compress is worth the effort.

I have attached a patch which fixes the compressed file cache,
by always sending uncompressed data to the output buffer,
leaving compression up to zlib.output_compression or
ob_gzhandler.

The effect of the patch is that the file cache can be enabled
with $wgUseGzip set to true or false. $wgUseGzip decides if the
files stored in the cache are compressed or not.
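
The behaviour described here can be sketched roughly as follows. This is not the attached patch: the helper names storeInCache and fetchFromCache are hypothetical, $useGzip stands in for $wgUseGzip, and the code assumes a current PHP with zlib (gzdecode() needs PHP 5.4 or later):

```php
<?php
// Hypothetical helper: $useGzip only decides how the page is stored on disk.
function storeInCache( $path, $html, $useGzip ) {
    file_put_contents( $path, $useGzip ? gzencode( $html ) : $html );
}

// Hypothetical helper: always returns uncompressed HTML, so that any
// compression on the wire is left to zlib.output_compression or ob_gzhandler.
function fetchFromCache( $path, $useGzip ) {
    $data = file_get_contents( $path );
    return $useGzip ? gzdecode( $data ) : $data;
}

$page = '<html><body>Cached page</body></html>';  // made-up sample page
$path = tempnam( sys_get_temp_dir(), 'fc' );

foreach ( array( true, false ) as $useGzip ) {
    storeInCache( $path, $page, $useGzip );
    echo fetchFromCache( $path, $useGzip ), "\n";  // same HTML either way
}
unlink( $path );
```

Either setting yields the same bytes for the output buffer; only the size of the file on disk changes.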

But I think there should be a comment in the documentation that
setting $wgUseGzip to true makes little sense. The idea of the
cache is to spend some disk space to reduce CPU-cycles, so why
would you then want to spend CPU-cycles to reduce that disk
space?

I also suggest changing this in DefaultSettings.php:

# We can serve pages compressed in order to save bandwidth,
# but this will increase CPU usage.
# Requires zlib support enabled in PHP.
$wgUseGzip = function_exists( 'gzencode' );

to:

# Should the file cache be compressed in order to save disk
# space? This will increase CPU usage.
# Requires zlib support enabled in PHP. To enable, change
# this line to:
# $wgUseGzip = function_exists('gzencode');
$wgUseGzip = false;

With this patch applied, wfClientAcceptsGzip() will no longer be
used.

--
Regards / Groeten, http://www.leren.nl
René Pijlman http://www.applinet.nl