Mailing List Archive

response times
Source code and discussion follow below.

Over the last 24 hours I've polled some URLs every 10 minutes and
measured the response time. This is a statistics report:

min avg max (count) <2 2-5 5-15 >15 URL - response times in seconds
0.75 2.00 11.79 ( 144) 68% 27% 4% 0% http://pl.wikipedia.com/wiki.cgi?Szwecja
0.56 2.70 89.78 ( 144) 95% 0% 0% 3% http://www.wikipedia.com/wiki.png
1.47 2.90 8.63 ( 144) 27% 64% 7% 0% http://pl.wikipedia.com/wiki.cgi?Ostatnie_zmiany
1.36 2.98 28.68 ( 144) 30% 62% 6% 0% http://pl.wikipedia.com/
0.83 3.21 91.86 ( 144) 94% 0% 0% 4% http://eo.wikipedia.com/vikio.png
1.00 4.45 140.53 ( 144) 63% 29% 2% 4% http://eo.wikipedia.com/wiki/Svedio
1.08 5.70 137.43 ( 144) 53% 34% 7% 4% http://eo.wikipedia.com/
3.40 7.09 68.23 ( 144) 0% 38% 56% 4% http://eo.wikipedia.com/wiki/Lastaj_Sxangxoj
3.35 13.59 203.39 ( 144) 0% 18% 63% 18% http://www.wikipedia.com/wiki/special:RecentChanges
1.61 28.38 411.43 ( 144) 4% 23% 33% 38% http://www.wikipedia.com/wiki/Sweden
3.86 45.16 359.17 ( 144) 0% 2% 38% 58% http://www.wikipedia.com/

The first three columns present the minimum, average, and maximum
response times. The rows are sorted by the average column. As you
can see, the minimums are very good: a 0.56 second roundtrip from
Sweden to San Diego is excellent. Even the English Wikipedia start
page (with the worst average in the list) has been served in 3.86
seconds, which is quite good (this happened at 6:10 am GMT). However,
the striking numbers are the maximum response times of several
minutes.

The fourth column is the number of samples, which is 144 in 24 hours.

The following four columns present the statistical distribution of
samples in four categories: the percentage of samples that were less
than 2 seconds, those between 2 and 5 seconds, those between 5 and 15
seconds, and those in excess of 15 seconds. I think usability gurus
like Jakob Nielsen have declared that 5 seconds is an acceptable
maximum for normal pages and that most people can accept a 15 second
response time for special functions such as searches and the recent
changes list.
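For reference, the bucketing can be sketched like this (a minimal
sketch with hypothetical helper names; my actual polling script is
separate and not shown here, and where exactly a 5.0 second sample
lands is my own choice):

```php
<?php
// Sketch: classify a response time (in seconds) into the four report
// buckets, then summarize a set of samples. Hypothetical helper names;
// boundary handling (e.g. exactly 5.0 seconds) is an assumption.
function bucket($seconds) {
    if ($seconds < 2.0)   return "<2";
    if ($seconds <= 5.0)  return "2-5";
    if ($seconds <= 15.0) return "5-15";
    return ">15";
}

function summarize($samples) {
    $counts = array("<2" => 0, "2-5" => 0, "5-15" => 0, ">15" => 0);
    foreach ($samples as $t) {
        $counts[bucket($t)]++;
    }
    $pct = array();
    foreach ($counts as $k => $c) {
        $pct[$k] = round(100.0 * $c / count($samples));
    }
    return array("min" => min($samples),
                 "avg" => array_sum($samples) / count($samples),
                 "max" => max($samples),
                 "pct" => $pct);
}
?>
```

Each report row above is just such a summary over 144 samples.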

The Polish Wikipedia does not have a single sample above 15 seconds in
these 24 hours. The majority of samples are in the lower two
categories, which is very good.

The Esperanto Wikipedia has a small number of samples above 15
seconds, which is sad but perhaps not alarming. The "recent changes"
page has 56 % of its samples in the high 5-15 seconds response time
interval, which is a little high. Perhaps this could be fixed by
changing the default "recent changes" list from 7 to 3 days. Who can
change this setting? Tell me when you change it, and I will report
how the response time changed. Almost all other samples are in the
0-2 and 2-5 categories, which is very good.

For the English Wikipedia, the static logotype image is served in less
than 2 seconds in 95 % of the samples. For this URL, there are no
samples in the 2-5 or 5-15 intervals, but a few samples have very long
response times. Perhaps the entire server was put on hold by some
other event? The last three lines of the report are depressing.
Almost none of the samples fall in the 0-2 or 2-5 categories. This
has to be analysed further by instrumenting the source code to report
where the delay is introduced.

I now have a running copy of the Wikipedia on my computer and have
started to experiment with this instrumentation. It's really
straightforward. In version 1.14 of wiki.phtml, Magnus Manske
introduced the function getmicrotime(), but the call to the function
is commented out. Just after getmicrotime(), I introduce a new
function:

function trace($text) {
    global $startTime, $traceText;
    $now = getmicrotime();
    $elapsed = $now - $startTime;
    if ($elapsed > 3.0) {
        $traceText = "$traceText\nAfter $elapsed seconds: $text";
        $startTime = $now;
    }
}

Then at the beginning of the "main" program (where Magnus left a
commented-out first call to getmicrotime), I declare:

global $startTime, $traceText;
$startTime = getmicrotime();
$traceText = "";

Then at various points throughout the code, just after function calls
that I suspect are time bandits, I insert calls to my trace function:

trace("Just after updating the database");

These informative texts will accumulate in $traceText if the elapsed
time since the start is more than 3 seconds.

At the end of the "main" program comes the question, what should be
done with this $traceText? Should it be inserted into a new database
table? Or appended to the end of a text log file? Or inserted as an
HTML comment into the generated web page? Where can we best use this
information? The easiest but perhaps least useful is this:

trace("bottom of wiki.phtml main");
if ($traceText != "")
    $out = "<!-- traceText:$traceText -->\n$out";

Who can implement this in the real source code? I'm not in the gang.


--
Lars Aronsson (lars@aronsson.se)
Aronsson Datateknik
Teknikringen 1e, SE-583 30 Linuxköping, Sweden
tel +46-70-7891609
http://aronsson.se/ http://elektrosmog.nu/ http://susning.nu/
Re: response times
Just a couple quick notes for now...

On dim, 2002-05-05 at 13:21, Lars Aronsson wrote:
> The Esperanto Wikipedia has a small number of samples above 15
> seconds, which is sad but perhaps not alarming. The "recent changes"
> has 56 % of its samples in the high 5-15 seconds response time
> interval, which is a little high.

Note also that the perl-based Esperanto wiki filters output through a
character set conversion and doesn't do any page caching (since caching
didn't interact well with the conversion), which is bound to slow it
down a little bit.

> Perhaps this could be fixed by
> setting the default "recent changes" list from 7 or 3 days. Who can
> change this setting? Tell me when you change, and I will report how
> the response time changed. Almost all other samples are in the 0-2
> and 2-5 categories, which is very good.

Have you tried running your tests on, say,
http://eo.wikipedia.com/wiki.cgi?action=rc&days=3 ?

-- brion vibber (brion @ pobox.com)
Re: response times
Brion L. VIBBER wrote:
> Have you tried running your tests on, say,
> http://eo.wikipedia.com/wiki.cgi?action=rc&days=3 ?

I did a few samples now, but couldn't see any difference. I'll let it
run for the next 24 hours, though.


--
Lars Aronsson (lars@aronsson.se)
Aronsson Datateknik
Teknikringen 1e, SE-583 30 Linuxköping, Sweden
tel +46-70-7891609
http://aronsson.se/ http://elektrosmog.nu/ http://susning.nu/
response times
When I started to get extremely long response times (> 300 seconds),
I thought perhaps these were timeouts or other kinds of accidents, so
I started to log HTTP status codes and the length of the returned
document as well. But the status code is 200 (OK) and the full
document is returned, some 400-1800 seconds after the request was
sent.

Is anybody able to use the English Wikipedia now? These are the last
two hours:

min avg max (count) <2 2-5 5-15 >15 URL - response times in seconds
0.72 0.79 1.54 ( 15) 100% 0% 0% 0% http://www.wikipedia.com/wiki.png
10.02 100.24 920.39 ( 13) 0% 0% 30% 69% http://www.wikipedia.com/wiki/special:RecentChanges
17.83 281.00 1840.27 ( 12) 0% 0% 0% 100% http://www.wikipedia.com/wiki/Sweden
22.14 333.33 1599.76 ( 10) 0% 0% 0% 100% http://www.wikipedia.com/


--
Lars Aronsson
<lars@aronsson.se>
tel +46-70-7891609
http://aronsson.se/ http://elektrosmog.nu/ http://susning.nu/
response times
Today Friday, the front page of the English Wikipedia has been fast
all day.

Another page (I monitor http://www.wikipedia.com/wiki/Sweden) was slow
for one period of 30 minutes (09:30-10:00 am GMT) and another period
of two hours (11:40-13:50 GMT). Some other URLs on the international
Wikipedias were also affected at the same time. This might be due to
maintenance or work being done on the scripts.

Subtract 7 hours from GMT to get the server's local time zone
(PDT = GMT -0700).

Apart from these two limited intervals, every URL that I monitor has
been fast all day, including the recent changes pages.

I'm very happy with this, and hope Brion and Jimmy (and who else?)
will soon get the talk namespace links back without hurting
performance. (But hey, never make big fixes five minutes before you
leave for the weekend! Better just leave it as is if you have to go.)


And now for some more relaxed Friday reading, actually related to
performance problems. (The following analysis might be politically
slanted. Don't take it too seriously.) The Swedish parliament
elections are coming up in September, so the political parties are
starting up their campaigns. The problem is there are no big issues
to fight about. The four non-socialist parties have unusually boring
candidates (Dukakis style), and everybody expects the current
social-democratic government to win. The single issue that seems to
be coming up is the national sick leave insurance, which is paid by
tax money, and far over budget. This is linked to the fact that
"burn-out" is now an accepted medical diagnosis for which you are
allowed to take a long sick leave at the taxpayers' expense. You
would expect such welfare excesses to be on the social democrat
agenda, and that the non-socialists would push for tax cuts and a
balanced budget. However, the current s-d govt has been doing a
great job
balancing the budget, and they will now have to deal with cutting back
this overgenerous sick leave compensation without hurting their
voters' feelings. Tough job. The Christian-democratic party's
candidate has already hurt a lot of feelings by claiming that "some"
of those receiving compensation are "cheating the system". That might
be true, but accusing "some" (who? me?) is obviously not the way to
attract voters. This issue now has media attention and some
interesting example cases are reported.

Like this one: Attorneys in Swedish district courts have been
right-sized in the past years, as part of balancing the budget. This
means that as soon as one gets sick, the rest get too much to do,
leading to stress and burn-out, which leads to more sick leaves.

Think of the court cases as HTTP requests arriving to Wikipedia.
There are some processes/attorneys there to handle the cases, but for
some reason one process gets blocked and cannot work. This leaves
more work for the remaining workers, but they are probably waiting for
the first process to get finished and unlock the resources (database
records?) that it is using. If processes are allowed to go to sleep
waiting for each other, the work will pile up. It will never end.

So, what is the solution? Throwing more attorneys at the problem?
Maybe, but more likely the work processes should be redesigned and
simplified. That allows the available attorneys to finish up a case
and take on the next one. Some of their tasks are more important than
others, but the performance or throughput of the system depends on
cutting away or redesigning the most time-consuming tasks. The high
degree of sick leave is an indicator of system design flaws (albeit an
indirect one), and thus not altogether bad.

In the same way, a high "load average" (as reported by the "uptime" or
"top" commands) is one indicator that the Wikipedia system is flawed.
The load average in a UNIX system is the number of processes that are
ready to run, waiting for the CPU to become available. Unfortunately,
most of them are just waiting to see if their wanted resource has
become available. If this is not the case (e.g. database record still
locked), they will go back to the end of the line, waiting again. Do
you remember those bread shop waiting lines in Soviet Russia?

Training new attorneys is in itself a time-consuming task, which
should be avoided if possible. Instead of paying sick leave (for how
long?) to the already trained attorneys, a "cure" for "burn-out"
should be found that can bring them back to work, thus relieving the
overload from their colleagues and saving tax payers' money at the
same time.

I have no idea how a "cure" for burn-out can be found, but I think it
is a necessary political trick, and thus will happen. It will not
hurt voters' feelings, and it is my guess that the people who can
achieve this will work for the winners of the election.

This might be the weakest analogy in history, but I think we should
treat the Wikipedia processes with the same dignity and respect that
the Swedish voters would expect. After all, they're supposed to work
for us. The processes feel self-fulfillment when they can finish
their job on time, and get distressed when they get locked up. Any
uncalled-for delay will only result in more work piling up. That is a
flaw in the system design that has to be fixed, and we cannot go
around claiming that "some" of the workers are trying to cheat the
system. That will only lead to us losing their confidence.


--
Lars Aronsson (lars@aronsson.se)
Aronsson Datateknik
Teknikringen 1e, SE-583 30 Linuxköping, Sweden
tel +46-70-7891609
http://aronsson.se/ http://elektrosmog.nu/ http://susning.nu/
response times
I'm looking again at my log files, comparing three days:
- Thursday May 9, before Jimmy's change (old talk links)
- Friday May 10, after Jimmy's change (no talk links)
- Tuesday May 14, with the new version (new talk links)

The overall performance was best on Friday. It is now getting worse
again, albeit not as bad as Thursday. The new version is a real
improvement over the old talk links, but we aren't quite done yet.

The number of OK responses (HTTP status code 200) which take absurdly
long (longer than 60 seconds) is still very high (3-30 %), with the
Main Page of the English Wikipedia being the main exception (0 %).

Response times above some limit (say, 30, 60 or 120 seconds) can be
defined as absurdly long, because the user will have left for other
websites and is no longer waiting for the response. Instead of
spending more system resources (CPU cycles and allocated memory) on
these requests, it would be better to set a hard timeout (in PHP or
Apache) and return an error message that says "sorry for the delay".
This would free up system resources that can be better used to serve
other requests.

In <http://www.php.net/manual/en/function.set-time-limit.php>, the PHP
function set_time_limit() is said to have a default of 30 seconds,
unless the configuration file has defined max_execution_time. Will
calling this function set the time limit for the current request only,
or set a permanent value for the server? What happens when PHP
execution times out? Is the connection to the client abruptly closed?
Or is an error message returned? Does an error message appear in the
log file? I haven't seen any timeouts of this kind.
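A minimal experiment along those lines might look like this (a
sketch; 30 seconds is an arbitrary figure, and if I read the PHP
manual correctly the call caps only the current script, while on
non-Windows systems the limit counts CPU time rather than wall-clock
time, so requests blocked on the database may never trip it):

```php
<?php
// Sketch: cap the current request only. This does not change the
// server-wide max_execution_time setting for other requests.
// Caveat (per the PHP manual, if I read it right): on non-Windows
// systems the limit counts CPU time, so time spent waiting on the
// database or on I/O does not count toward it.
set_time_limit(30);

// After the call, the effective limit for this script is 30 seconds.
echo ini_get("max_execution_time"), "\n";
?>
```

That CPU-time caveat could explain why no such timeouts show up even
for requests that take minutes of wall-clock time.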

However, even if a PHP execution timeout is set, the limit will not
include time that the request spent waiting to start execution. This
wait could happen in the UNIX socket listen backlog, waiting for the
connection to be accepted, or inside Apache, waiting for a child
process to become available. Increasing the value of a parameter like
ListenBacklog (in Apache httpd.conf) is not necessarily a solution,
because this will only keep more requests in queue, increasing overall
response time. Instead, the problem should be fixed at the exit end
of the queue. The key to better performance is keeping the server
fast and queues short, getting things done.

Here are some pages on Apache performance issues:

- Hints on Running a High-Performance Web Server,
http://httpd.apache.org/docs/misc/perf.html
- Apache Performance Notes,
http://httpd.apache.org/docs/misc/perf-tuning.html
- Professional Apache, chapter 8,
http://www.devshed.com/Talk/Books/ProApache/
- Tuning Your Apache Web Server,
http://dcb.sun.com/practices/howtos/tuning_apache.jsp
- Linux HTTP Benchmarking HOWTO,
http://www.xenoclast.org/doc/benchmark/HTTP-benchmarking-HOWTO/


--
Lars Aronsson (lars@aronsson.se)
Aronsson Datateknik
Teknikringen 1e, SE-583 30 Linuxköping, Sweden
tel +46-70-7891609
http://aronsson.se/ http://elektrosmog.nu/ http://susning.nu/
Re: response times
On Wednesday 15 May 2002 09:50, Lars Aronsson wrote:
> Response times above some limit (say, 30, 60 or 120 seconds) can be
> defined as absurdly long, because the user will have left for other
> websites and is no longer waiting for the response. Instead of
> spending more system resources (CPU cycles and allocated memory) on
> these requests, it would be better to set a hard time out (in PHP or
> Apache) and return an error message that says "sorry for the delay".
> This would free up system resources that can be better used to serve
> other requests.
>
> In <http://www.php.net/manual/en/function.set-time-limit.php>, the PHP
> function set_time_limit() is said to have a default of 30 seconds,
> unless the configuration file has defined max_execution_time. Will
> calling this function set the time limit for the current request only,
> or set a permanent value for the server? What happens when PHP
> execution times out? Is the connection to the client abruptly closed?
> Or is an error message returned? Does an error message appear in the
> log file? I haven't seen any timeouts of this kind.

PHP aborts the script, outputs an error message saying it timed out, and
closes the connection.

phma
response times
These last 48 hours, I've sampled the response time of a number of
Wikipedia URLs every 20 minutes, for a total of 144 samples each.
From this data I cannot tell the duration of the slowdown periods (I
sample too seldom), but I can tell how many samples of each URL have
"absurdly" long response times (60+ seconds). All URLs are *not*
equal:

Count URL
----- --------------------------------
17 http://de.wikipedia.com/wiki.jpg
9 http://www.wikipedia.com/wiki/1458
7 http://www.wikipedia.com/wiki/Sweden
6 http://de.wikipedia.com/wiki.cgi?Letzte_%c4nderungen
6 http://de.wikipedia.com/
5 http://www.wikipedia.com/wiki/Chemistry
4 http://de.wikipedia.com/wiki.cgi?Schweden
3 http://de.wikipedia.com/wiki.cgi?action=rc&days=7
2 http://www.wikipedia.com/wiki/special:RecentChanges
2 http://www.wikipedia.com/
2 http://eo.wikipedia.com/wiki/Svedio
2 http://eo.wikipedia.com/wiki/Lastaj_Sxangxoj
1 http://eo.wikipedia.com/wiki.cgi?action=rc&days=3
1 http://eo.wikipedia.com/vikio.png
1 http://eo.wikipedia.com/

(Is the Esperanto Wikipedia running Perl again? Didn't it already use
the new PHP software?)


--
Lars Aronsson <lars@aronsson.se>
tel +46-70-7891609
http://aronsson.se/ http://elektrosmog.nu/ http://susning.nu/
Re: response times
On ven, 2002-05-17 at 08:35, Lars Aronsson wrote:
> (Is the Esperanto Wikipedia running Perl again? Didn't it already use
> the new PHP software?)

Not "again" but "still". There's a test server (test-eo.wikipedia.com),
but something didn't go right in the database conversion (so we can't
seriously test anything on it!) and nobody at Bomis has touched it
since. :(

Note that I have not been able to reproduce the conversion error (the
character set conversion which should have changed "cx" etc into proper
UTF-8 characters didn't happen); it all works fine and dandy on my
computer. We'd be ever so grateful if Jason or Jimmy could try it again
sometime...

-- brion vibber (brion @ pobox.com)
Re: response times
I'll look at it today...

Jason

Brion L. VIBBER wrote:

> We'd be ever so grateful if Jason or Jimmy could try it again
> sometime...

--
"Jason C. Richey" <jasonr@bomis.com>
Re: response times
On ven, 2002-05-17 at 11:06, Jason Richey wrote:
> I'll look at it today...

Thanks!

-- brion vibber (brion @ pobox.com)

> Jason
>
> Brion L. VIBBER wrote:
>
> > We'd be ever so grateful if Jason or Jimmy could try it again
> > sometime...
>
> --
> "Jason C. Richey" <jasonr@bomis.com>
Re: response times
I have re-imported the data from the usemod-eo wiki to the php-eo
wiki. Let me know if it looks better...

Jason

Brion L. VIBBER wrote:

> On ven, 2002-05-17 at 11:06, Jason Richey wrote:
> > I'll look at it today...
>
> Thanks!
>
> -- brion vibber (brion @ pobox.com)
>
> > Jason
> >
> > Brion L. VIBBER wrote:
> >
> > > We'd be ever so grateful if Jason or Jimmy could try it again
> > > sometime...
> >
> > --
> > "Jason C. Richey" <jasonr@bomis.com>
>
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@ross.bomis.com
> http://ross.bomis.com/mailman/listinfo/wikitech-l

--
"Jason C. Richey" <jasonr@bomis.com>
Re: response times
Sure. The files are attached.

Jason

Brion L. VIBBER wrote:

> On ven, 2002-05-17 at 15:40, Jason Richey wrote:
> > I have re-imported the data from the usemod-eo wiki to the php-eo
> > wiki. Let me know if it looks better...
>
> Nope, same problem: page titles and contents have cx, gx, ux, etc
> instead of accented characters.
>
> Can I see the wikiLocalSettings.php and convertWiki2SQL.php and make
> sure they're right? It's still working fine on my machine.
>
> -- brion vibber (brion @ pobox.com)

--
"Jason C. Richey" <jasonr@bomis.com>
Re: response times
On ven, 2002-05-17 at 15:40, Jason Richey wrote:
> I have re-imported the data from the usemod-eo wiki to the php-eo
> wiki. Let me know if it looks better...

Nope, same problem: page titles and contents have cx, gx, ux, etc
instead of accented characters.

Can I see the wikiLocalSettings.php and convertWiki2SQL.php and make
sure they're right? It's still working fine on my machine.

-- brion vibber (brion @ pobox.com)
Re: response times
Here's some (trimmed) output:

X-Powered-By: PHP/4.0.4pl1
Content-type: text/html

--Using Esperanto conversion settings--
converting: "AIM" -> "AIM"

...

converting: "Argxenta_Libro" -> "Argxenta_Libro"
converting: "Algxerio" -> "Algxerio"
converting: "Azerbajgxano" -> "Azerbajgxano"
converting: "Agxario" -> "Agxario"
converting: "Angxevo" -> "Angxevo"
converting: "Angxelo_KRIRUGXO" -> "Angxelo_KRIRUGXO"
converting: "Cxefpagxo" -> "Cxefpagxo"
converting: "Dekana_Pregxejo_De_Sankta_Maria_Magdalena" ->
"Dekana_Pregxejo_De_Sankta_Maria_Magdalena"
converting: "Esperanto-Renkontigxo" -> "Esperanto-Renkontigxo"
converting: "Figxioj" -> "Figxioj"


That's bad news... But it does appear that we are running an older
PHP...

Jason

Brion L. VIBBER wrote:

> On ven, 2002-05-17 at 16:02, Jason Richey wrote:
> > Sure. The files are attached.
>
> Hmm... Shouldn't be any problem. I tried substituting in those two
> files, everything still works.
>
> Try the attached convertWiki2SQL.php. I've added some debug messages:
> it should say "--Using Esperanto conversion settings--" at the very
> beginning, then for every piece of text that is run through the
> character set conversion it will spit out:
>
> converting: "X" -> "X'"
>
> where X is the old string and X' is the new string. Where X contains
> character sequences like "cx", "gx", "ux", etc, X' should contain UTF-8
> byte sequences; on a terminal in 8-bit Latin1 mode they will look like
> an accented capital A followed by a square or blank character.
>
> If you don't see these messages, something is very wrong and the
> conversion isn't being called.
>
> If you see the messages but the "cx", "gx", "ux" etc are still in the
> converted text, then there's something else strange going on; possibly a
> change in the behavior of PHP's str_replace function. (I'm still running
> PHP 4.0.6.) If that's it, I'll try to whip up an alternate method.
>
> -- brion vibber (brion @ pobox.com)



--
"Jason C. Richey" <jasonr@bomis.com>
Re: response times
On ven, 2002-05-17 at 16:02, Jason Richey wrote:
> Sure. The files are attached.

Hmm... Shouldn't be any problem. I tried substituting in those two
files, everything still works.

Try the attached convertWiki2SQL.php. I've added some debug messages:
it should say "--Using Esperanto conversion settings--" at the very
beginning, then for every piece of text that is run through the
character set conversion it will spit out:

converting: "X" -> "X'"

where X is the old string and X' is the new string. Where X contains
character sequences like "cx", "gx", "ux", etc, X' should contain UTF-8
byte sequences; on a terminal in 8-bit Latin1 mode they will look like
an accented capital A followed by a square or blank character.
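For reference, that substitution can be sketched like this (a
simplified sketch with a hypothetical helper name; the real mapping in
convertWiki2SQL.php is not reproduced here and handles more cases):

```php
<?php
// Sketch: replace Esperanto x-convention digraphs with UTF-8 accented
// letters. Simplified; hypothetical helper, not the real conversion code.
function eo_x_to_utf8($text) {
    $from = array("Cx", "cx", "Gx", "gx", "Hx", "hx",
                  "Jx", "jx", "Sx", "sx", "Ux", "ux");
    $to   = array("Ĉ", "ĉ", "Ĝ", "ĝ", "Ĥ", "ĥ",
                  "Ĵ", "ĵ", "Ŝ", "ŝ", "Ŭ", "ŭ");
    // Note: array arguments to str_replace were only added in PHP 4.0.5
    // (if I read the changelog right), which could matter on a server
    // still running 4.0.4pl1.
    return str_replace($from, $to, $text);
}

echo eo_x_to_utf8("Cxefpagxo"), "\n";
?>
```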

If you don't see these messages, something is very wrong and the
conversion isn't being called.

If you see the messages but the "cx", "gx", "ux" etc are still in the
converted text, then there's something else strange going on; possibly a
change in the behavior of PHP's str_replace function. (I'm still running
PHP 4.0.6.) If that's it, I'll try to whip up an alternate method.

-- brion vibber (brion @ pobox.com)
Re: response times
On ven, 2002-05-17 at 17:04, Jason Richey wrote:
> Here's some (trimmed) output:
...
> converting: "Cxefpagxo" -> "Cxefpagxo"
...
>
> That's bad news... But it does appear that we are running an older
> PHP...

Okay... Well, the conversion function on the live wiki input *does* seem
to work (it uses preg_replace rather than str_replace), and since we're
now including the local settings it should be readily available.

Try setting $recodeCharset to wikiRecodeInputEo instead of
recodeCharsetEo, and that should put things right. (A brief diff to do
this is attached.)

-- brion vibber (brion @ pobox.com)