
Estimating parser time
I'm testing the postit script with and without non-ASCII chars, to see
if the apparent speed difference is real.

I'm not running any other load generators. Each test is an average over
roughly 50 pages. I'll alternate between the with and without states.
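
A minimal sketch of this alternating procedure in Python. It assumes two hypothetical zero-argument callables, with_fn and without_fn, that each post one page with or without non-ASCII chars. They stand in for whatever postit actually does and are not part of it.

```python
import time
from statistics import mean
from typing import Callable, Dict, List


def alternate_timings(
    with_fn: Callable[[], object],
    without_fn: Callable[[], object],
    rounds: int = 3,
    pages_per_round: int = 50,
) -> Dict[str, List[float]]:
    """Alternate "with" and "without" rounds; return the per-page average time of each round."""
    results: Dict[str, List[float]] = {"with": [], "without": []}
    for _ in range(rounds):
        # Interleave the two states so slow drift (other load, caches) affects both.
        for label, fn in (("with", with_fn), ("without", without_fn)):
            samples = []
            for _ in range(pages_per_round):
                start = time.perf_counter()
                fn()  # hypothetical: post one page
                samples.append(time.perf_counter() - start)
            results[label].append(mean(samples))
    return results


if __name__ == "__main__":
    # Dummy workloads so the sketch runs stand-alone; swap in real page posts.
    timings = alternate_timings(lambda: sum(range(10_000)), lambda: sum(range(20_000)),
                                rounds=3, pages_per_round=50)
    for label, averages in timings.items():
        print(label, ", ".join(f"{a * 1000:.3f} ms" for a in averages))
```

Alternating the states round by round, rather than doing all "with" runs and then all "without" runs, is what keeps a gradual change in machine load from showing up as a difference between the two.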

With: 2.39 secs average
Without: 2.91 secs average
With: 2.84 secs average
Without: 2.98 secs average
With: 2.64 secs average
Without: 3.18 secs average

Well, there does seem to be a difference, but it still might be noise.

All with = 7.87 / 3 = 2.62 secs average
All without = 9.07 / 3 = 3.02 secs average
Difference = 1.2 / 3 = 0.4 seconds
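
The same arithmetic as a small Python sketch. It also computes a paired t statistic over the three rounds, which assumes round i of "with" can be paired with round i of "without". With only three pairs this comes out at about 3.1 on 2 degrees of freedom, below the usual two-sided 5% critical value of about 4.30, so these numbers alone don't rule out noise.

```python
from math import sqrt
from statistics import mean, stdev

# Per-round averages (seconds per page) from the three alternating rounds above.
with_secs = [2.39, 2.84, 2.64]
without_secs = [2.91, 2.98, 3.18]

mean_with = mean(with_secs)              # 7.87 / 3
mean_without = mean(without_secs)        # 9.07 / 3
difference = mean_without - mean_with    # 1.2 / 3

# Assumption: pair each "with" round with the "without" round that followed it.
# The per-pair differences are roughly 0.52, 0.14 and 0.54 seconds.
diffs = [wo - w for w, wo in zip(with_secs, without_secs)]
t_stat = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

print(f"mean with    = {mean_with:.3f} s")
print(f"mean without = {mean_without:.3f} s")
print(f"difference   = {difference:.3f} s")
print(f"paired t     = {t_stat:.2f} (df = {len(diffs) - 1})")
```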

I then did two long runs:

With: 2.74 (averaged over 151 pages)
Without: 3.21 (averaged over 153 pages)

Difference: 0.47 seconds

Another two long runs:

With: 2.78 (averaged over 151 pages)
Without: 3.21 (averaged over 151 pages)

Difference: 0.43 seconds
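
Pooling the two long runs for each state gives a single estimate to set against the 0.4-second figure from the short alternating rounds. A small Python sketch; weighting each run's average by its page count is a choice made here for illustration. The per-pair differences are 0.47 and 0.43 seconds, and the pooled difference comes out at about 0.45 seconds.

```python
from typing import List, Tuple

# (average seconds per page, pages averaged) for each long run above.
Run = Tuple[float, int]
with_runs: List[Run] = [(2.74, 151), (2.78, 151)]
without_runs: List[Run] = [(3.21, 153), (3.21, 151)]


def page_weighted_mean(runs: List[Run]) -> float:
    """Combine per-run averages into one mean, weighting each run by its page count."""
    total_pages = sum(pages for _, pages in runs)
    return sum(avg * pages for avg, pages in runs) / total_pages


# Difference for the first pair of long runs, then for the second pair.
per_pair = [wo - w for (w, _), (wo, _) in zip(with_runs, without_runs)]
pooled_diff = page_weighted_mean(without_runs) - page_weighted_mean(with_runs)

print("per-pair differences:", ", ".join(f"{d:.2f} s" for d in per_pair))
print(f"pooled difference: {pooled_diff:.2f} s")
```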

It looks like the difference is probably real. My guess is that the pages
without non-ASCII chars are more likely to contain things that make the
regexp engine look ahead and backtrack to check whether they are links.

This is _not_ a big performance issue at the moment, as the Wikipedia
load is read-mostly. However, it's worth noting as a place to look in
the future.

By then, we'll probably be running a three-tier architecture, with the
DB on a separate machine from the PHP scripts, so even then this may
remain a low-priority issue.

Neil