Mailing List Archive

Slow parser
I just ran "ab -n 10" for an article (with cache turned off) and deactivated
some functions to see where the slow parts are.

Full rendering : 4.99 sec
removeHTMLtags turned off : 3.319 sec

It seems removeHTMLtags is responsible for 1/3 of the *total* runtime, which
includes apache, php calling, and a thousand other things that can't be
avoided.

So, if these HTML tags are *never* used anyway, why can't we replace them
with &lt; and &gt; just prior to saving an edited article?
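For illustration, here is a rough sketch of what save-time escaping of disallowed tags could look like. It is in Python rather than the PHP under discussion, and the whitelist and function name are invented; the real allowed-tag list lives in removeHTMLtags().

```python
import re

# Hypothetical whitelist for illustration only; the actual list of
# permitted tags is defined in removeHTMLtags() in the PHP source.
ALLOWED_TAGS = {"b", "i", "u", "pre", "table", "tr", "td", "th"}

def escape_disallowed_tags(text):
    """Rewrite any <tag> not on the whitelist to &lt;tag&gt; at save time."""
    def repl(match):
        name = match.group(1).lower()
        if name in ALLOWED_TAGS:
            return match.group(0)  # allowed tag: keep as-is
        # strip the surrounding angle brackets and re-wrap in entities
        return "&lt;" + match.group(0)[1:-1] + "&gt;"
    return re.sub(r"</?\s*([a-zA-Z][a-zA-Z0-9]*)[^<>]*>", repl, text)
```

Allowed tags survive untouched; anything else is neutralized once, at save time, instead of on every page view.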

I'll be gone tomorrow until Saturday, and I doubt I can hack it today, so
it's up to you...

Magnus
Re: Slow parser
From: "Magnus Manske" <Magnus.Manske@epost.de>
>
> I'll be gone tomorrow until Saturday, and I doubt I can hack it today, so
> it's up to you...

I wouldn't dare, that's Brion's turf. :-) But my remarks for parsing also
apply here. You guys seem to be really fond of the explode operator, but
it's really not a very good idea.

-- Jan Hidders

PS. At the risk of committing social suicide, let me admit that, yes, at one
point in my life I have been teaching compiler theory. But it is certainly
not my real area of expertise.
Re: Slow parser
On Wed, 2002-02-13 at 07:36, Magnus Manske wrote:
> I just ran "ab -n 10" for an article (with cache turned off) and deactivated
> some functions to see where the slow parts are.
>
> Full rendering : 4.99 sec
> removeHTMLtags turned off : 3.319 sec

How much time does parseContents() take?

> It seems removeHTMLtags is responsible for 1/3 of the *total* runtime, which
> includes apache, php calling, and a thousand other things that can't be
> avoided.

Well, it can be made much more efficient... As Jan has hinted, explode()
is a killer, and I can take that out.
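To illustrate Jan's point: a split in the style of explode() copies every fragment of the article into a new string, while an index-based scan finds the same positions in place. A Python sketch with made-up helper names (the real code is PHP):

```python
def tag_positions_split(text):
    """Find the offsets of '<' by splitting, explode-style."""
    positions, offset = [], 0
    # split() (like PHP's explode) allocates a new string per fragment
    for part in text.split("<")[:-1]:
        offset += len(part)
        positions.append(offset)
        offset += 1  # skip the '<' that split() consumed
    return positions

def tag_positions_scan(text):
    """Find the same offsets with a single pass and no copies."""
    positions, i = [], text.find("<")
    while i != -1:
        positions.append(i)
        i = text.find("<", i + 1)
    return positions
```

Both return the same offsets; the difference is that the split version allocates substrings proportional to the size of the article on every call.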

> So, if these HTML tags are *never* used anyway, why can't we replace them
> with &lt; and &gt; just prior to saving an edited article?

I just have two objections:

First, it violates the principle of least surprise; the user doesn't get
back the same text on a re-edit that they saved during the last edit. This
is particularly annoying for people who are putting complicated tables
into articles (cf. [[Beryllium]] and [[Periodic Table]]) -- if they do
one thing wrong, POOF! Half their table <tags> suddenly turn into
&lt;tags&gt; and instead of fixing one tiny mistake, they fix one tiny
mistake AND change a lot of &lt;&gt;s back into <>s.
Conclusion: bad for users.
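A minimal sketch of that round-trip problem (Python for brevity; this deliberately blunt helper escapes everything, whereas the proposal would only touch disallowed tags, but the user-visible effect on those tags is the same):

```python
def escape_tags(text):
    """Naive save-time rewrite of angle brackets into entities."""
    return text.replace("<", "&lt;").replace(">", "&gt;")

typed = "<font color=red>careful</font>"
stored = escape_tags(typed)
# On re-edit the author now sees the entity-encoded form instead of the
# tags they typed, and must change every &lt; and &gt; back by hand.
```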

Second, enforcing the limited HTML subset is just one part of the wiki
parsing. Doing that on save is fine, but is basically doing half the
parsing job and caching that, then doing the other half when we display
the page. Why stop there, when we could just parse the wiki-specific
code while we're at it and save the final result?
Conclusion: what exactly is our goal here? To save processing time on
page load? This is most effectively done by caching the completely
parsed version, both HTML and wiki -> HTML.
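A sketch of that alternative, in Python with made-up names (not the actual MediaWiki code): run the full wikitext-to-HTML conversion once at save time, keep the raw source for editing, and serve the cached HTML on view:

```python
rendered_cache = {}

def save_article(title, wikitext, parse):
    """Store the raw source and cache the fully parsed HTML."""
    # parse() stands in for the whole parser, HTML enforcement included
    rendered_cache[title] = parse(wikitext)
    return wikitext  # the raw wikitext is still what gets re-edited

def view_article(title, wikitext, parse):
    """Serve the cached HTML, falling back to parsing on a cache miss."""
    if title not in rendered_cache:
        rendered_cache[title] = parse(wikitext)
    return rendered_cache[title]
```

The edit box always shows what the author typed, and the per-view parsing cost drops to a cache lookup.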

> I'll be gone tomorrow until Saturday, and I doubt I can hack it today, so
> it's up to you...

-- brion vibber (brion @ pobox.com)
Re: Slow parser
On Wed, 2002-02-13 at 09:00, Jan Hidders wrote:
> From: "Magnus Manske" <Magnus.Manske@epost.de>
> >
> > I'll be gone tomorrow until Saturday, and I doubt I can hack it today, so
> > it's up to you...
>
> I wouldn't dare, that's Brion's turf. :-) But my remarks for parsing also
> apply here. You guys seem to be really fond of the explode operator, but
> it's really not a very good idea.

I know, I know, explode is bad. I'll replace it with something cleaner,
it was just a shortcut to what should have been a walk-through-the-
string-stopping-when-you-find "<" thing.
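One way to read that walk-through-the-string idea is a single-pass tokenizer that yields (is_tag, chunk) pairs without splitting the whole string up front. A hypothetical Python sketch, not the MediaWiki code:

```python
def tokenize(text):
    """Walk the string once, stopping at each '<' to emit a tag token."""
    pos = 0
    while pos < len(text):
        lt = text.find("<", pos)
        if lt == -1:                     # no more tags: rest is plain text
            yield (False, text[pos:])
            return
        if lt > pos:
            yield (False, text[pos:lt])  # plain text before the tag
        gt = text.find(">", lt)
        if gt == -1:                     # unclosed '<': treat rest as text
            yield (False, text[lt:])
            return
        yield (True, text[lt:gt + 1])    # the tag itself
        pos = gt + 1
```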

> PS. At the risk of committing social suicide, let me admit that, yes, at one
> point in my life I have been teaching compiler theory. But it is certainly
> not my real area of expertise.

(shudder)

-- brion vibber (brion @ pobox.com)