
Re: what's being done about performance?
> This is fine, but you are trying a new car, and I want you to mount a
> speedometer so we can monitor the speed while driving, continuously
> keeping an eye on the problem instead of making believe we can fix it
> once and for all.

I agree that we need clearer data about performance numbers, and that we
need to monitor that data over time. My main point was to say that
1) we've got a faster and more adaptable code base in the wings, and 2) this
will give us time to do performance monitoring right, as well as to think
clearly about which things to optimize. As I said before, I think everybody
is trying to solve this problem, and that the developers are clearly moving
in the right direction. I'm no performance expert, but as a jack of all
trades and master of none, I'd like to toss out a few thoughts.

I'm concerned that we not try to gather that data in a way that itself
creates performance problems.

My experience is that it is a bad idea to make every function append
information to a log file. I've never seen someone append to a log file on
every function call without a noticeable effect on performance, at least on
well designed ASP or JSP sites which had already minimized disk I/O.
I assume the same would be true with PHP. However, having just read
Jimbo's e-mail, it could be that the I/O performance of the underlying
systems we were using -- mostly WinNT on slightly older machines -- was to
blame.
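
To make the cost concrete, here's roughly what the pattern I'm warning
about looks like in PHP -- every sample is a separate open/write/close.
The helper names and log path are placeholders of my own, not anything in
the code base:

    <?php
    // Sketch of per-call logging, the pattern to avoid: a page that
    // calls instrumented functions hundreds of times pays for
    // hundreds of separate disk writes.
    function logTiming($function, $seconds) {
        $fh = fopen('/var/log/wiki-timing.log', 'a');
        fwrite($fh, sprintf("%s\t%s\t%.6f\n",
            date('c'), $function, $seconds));
        fclose($fh);
    }

    function timedCall($function, ...$args) {
        $start = microtime(true);
        $result = $function(...$args);
        logTiming($function, microtime(true) - $start);
        return $result;
    }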

That said, I'm aware of three common ways to get around this performance
bottleneck.

Most commonly, people just turn up the detail on their web server logging
and parse those logs for whatever performance information they can extract.
The problem with this is that it sometimes fails to provide the
fine-grained information we'd need to see exactly which pages are causing
the slowdown.
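
For example, assuming Apache 2, the %D directive appends the request
service time in microseconds to each access-log line (e.g. LogFormat
"%h %l %u %t \"%r\" %>s %b %D" timed), and a short script can then rank
pages by total time. A rough sketch:

    <?php
    // Sketch: sum service time per URL from an access log whose last
    // field is the request time in microseconds (Apache's %D).
    $totals = [];
    foreach (file('access_log') as $line) {
        $fields = explode(' ', trim($line));
        $micros = (int) array_pop($fields);
        // The request line is quoted: "GET /wiki/Foo HTTP/1.0"
        if (preg_match('/"\w+ (\S+)/', $line, $m)) {
            $url = $m[1];
            $totals[$url] = ($totals[$url] ?? 0) + $micros;
        }
    }
    arsort($totals);
    foreach (array_slice($totals, 0, 20, true) as $url => $micros) {
        printf("%10.3f s  %s\n", $micros / 1e6, $url);
    }

Even then, everything that happens server-side within a request is lumped
together, which is exactly the granularity problem I mentioned.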

Second, I've seen a lot of people storing log information in a database.
This is probably what we should do if we need more precise information than
we can get from the server logs.
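
A rough sketch of what that could look like, assuming a table along the
lines of the (hypothetical) one in the comment below:

    <?php
    // Sketch: one INSERT per timing sample.
    // Hypothetical table:
    //   CREATE TABLE profiling (
    //       page      VARCHAR(255),
    //       func      VARCHAR(255),
    //       seconds   DOUBLE,
    //       logged_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    //   );
    function logToDatabase(PDO $db, $page, $func, $seconds) {
        $stmt = $db->prepare(
            'INSERT INTO profiling (page, func, seconds)
             VALUES (?, ?, ?)');
        $stmt->execute([$page, $func, $seconds]);
    }

The payoff is that a question like "which functions cost us the most this
week?" becomes a single GROUP BY query instead of a log-parsing exercise.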

The third and most flexible solution is to set up a thread which
asynchronously processes a queue of log data and periodically writes the
information to disk. This is clearly the most complex to implement, and I
wouldn't even consider it unless we decide we need real-time performance
monitoring, or there's something built into PHP to do this.
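
For what it's worth, PHP has no threads of its own, so a faithful version
of this would need outside help (a separate daemon fed over a socket, say).
The closest cheap approximation within a single request is to queue entries
in memory and flush them in one append at shutdown. A rough sketch, with
the log path and function names again being my own invention:

    <?php
    // Sketch: queue timing entries in memory during the request and
    // write them all in a single append at the end, so a page pays
    // for one disk write instead of one per sample.
    $GLOBALS['logQueue'] = [];

    function queueLog($function, $seconds) {
        $GLOBALS['logQueue'][] = sprintf("%s\t%.6f",
            $function, $seconds);
    }

    register_shutdown_function(function () {
        if ($GLOBALS['logQueue']) {
            file_put_contents('/var/log/wiki-timing.log',
                implode("\n", $GLOBALS['logQueue']) . "\n",
                FILE_APPEND | LOCK_EX);
        }
    });

That gets us batched writes, but not real-time monitoring; for that we'd
still need a reader process on the other end.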

Regardless of how we choose to get this data, we can and should still be
thinking about how to optimize the functions in which the server spends a
LOT of time, or which take a LONG time per call, since Wikipedia usage
could very well continue to scale up dramatically, and we need to be ready
for that ahead of time. On the other hand, there is a cost to optimization,
if only in the added complexity of the code. Complex code is more difficult
to update and maintain, and it is less likely to attract new developers
over time, so we need to make considered choices about which functions to
1) leave unoptimized, 2) optimize, or 3) remove altogether.
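
On that last point, even a crude per-request profiler can separate the two
cases -- called a LOT versus slow each time -- by tracking a call count
alongside a time total. A sketch (the names are mine, not anything in the
code base):

    <?php
    // Sketch: accumulate call counts and total time per function so a
    // report can distinguish "called often" from "slow per call".
    $GLOBALS['profile'] = [];

    function profiled($function, ...$args) {
        $start = microtime(true);
        $result = $function(...$args);
        $name = is_string($function) ? $function : 'closure';
        if (!isset($GLOBALS['profile'][$name])) {
            $GLOBALS['profile'][$name] = ['calls' => 0, 'seconds' => 0.0];
        }
        $GLOBALS['profile'][$name]['calls']++;
        $GLOBALS['profile'][$name]['seconds'] += microtime(true) - $start;
        return $result;
    }

    function profileReport() {
        foreach ($GLOBALS['profile'] as $name => $p) {
            printf("%6d calls  %10.6f s  %s\n",
                   $p['calls'], $p['seconds'], $name);
        }
    }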