Mailing List Archive

about changing edited text
> > So, if these HTML tags are *never* used anyway, why can't we replace
> > them with &lt; and &gt; just prior to saving an edited article?
>
> I just have two objections:
>
> First, it violates the principle of least surprise; the user doesn't get
> the same thing upon a re-edit that they left during the last edit. This
> is particularly annoying for people who are putting complicated tables
> into articles (cf. [[Beryllium]] and [[Periodic Table]]) -- if they do
> one thing wrong, POOF! Half their table <tags> suddenly turn into
> &lt;tags&gt; and instead of fixing one tiny mistake, they fix one tiny
> mistake AND change a lot of &lt;&gt;s back into <>s.
> Conclusion: bad for users.
>
> Second, enforcing the limited HTML subset is just a part of the wiki
> parsing. Doing that on save is fine, but is basically doing half the
> parsing job and caching that, then doing the other half when we display
> the page. Why stop there, when we could just parse the wiki-specific
> code while we're at it and save the final result?
> Conclusion: what exactly is our goal here? To save processing time on
> page load? This is most effectively done by caching the completely
> parsed version, both HTML and wiki -> HTML.
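
The first objection above can be made concrete with a small sketch. It is written in Python for brevity and is not the wiki's actual removeHTMLtags() code; the whitelist contents and the names ALLOWED_TAGS and escape_disallowed_tags are assumptions made for illustration. The point it tries to show is that escaping at display time leaves the stored text exactly as the user typed it, whereas running the same function on save would put the escaped text into the next edit box.

```python
import re

# Hypothetical whitelist: the thread does not list the permitted tags.
ALLOWED_TAGS = {"b", "i", "font", "table", "tr", "td", "th"}

# Matches an opening or closing tag and captures its name.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")


def escape_disallowed_tags(text):
    """Escape the angle brackets of every tag that is not whitelisted."""
    def replace(match):
        name = match.group(2).lower()
        if name in ALLOWED_TAGS:
            return match.group(0)  # permitted tag: pass through untouched
        return match.group(0).replace("<", "&lt;").replace(">", "&gt;")
    return TAG_RE.sub(replace, text)


# What the editor typed and what is kept in the database.
stored = "<table><tr><td>Be</td></tr></table><script>alert(1)</script>"

# Escaping happens only when the page is displayed.
rendered = escape_disallowed_tags(stored)
print(rendered)
```

This is only a name-based filter, not a complete sanitizer: attributes on permitted tags pass through unchecked.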

My first thought when I read this was to have two separate versions of
the page: one for the server and one for the editor. Basically, when
someone saves a page, this creates an HTML page and saves a separate
wiki page. This way all processing would be done only when someone
saves a page rather than every time someone loads a page.

Is there a flaw in my reasoning other than the database doubling in
size? I guess it depends on which is more important: space or speed?
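
A minimal sketch of that idea, in Python for brevity: every name here (PageStore, render_wiki, the in-memory dictionary) is hypothetical, and render_wiki is a toy stand-in for the real parser that only handles '''bold''' markup. It shows the editable text and the rendered HTML being stored side by side at save time, so that viewing a page needs no parsing.

```python
import html
import re


def render_wiki(wikitext):
    """Toy stand-in for the wiki parser: escape HTML, convert '''bold'''."""
    escaped = html.escape(wikitext, quote=False)
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", escaped)


class PageStore:
    """Keeps two versions of each page: editable text and rendered HTML."""

    def __init__(self):
        self._pages = {}  # title -> (wikitext, rendered_html)

    def save(self, title, wikitext):
        # All parsing cost is paid here, once per save.
        self._pages[title] = (wikitext, render_wiki(wikitext))

    def load_for_edit(self, title):
        return self._pages[title][0]  # exactly what the user typed

    def load_for_view(self, title):
        return self._pages[title][1]  # no parsing on page load


store = PageStore()
store.save("Beryllium", "'''Beryllium''' is an element; 4 < 5.")
print(store.load_for_view("Beryllium"))
```

The cost is the one named above: each page is stored twice.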

Chuck

=====
Come to the free, open Esperanto encyclopedia
on the web! http://eo.wikipedia.com/
====
Young people! Philadelphia, USA, 15-17 February
http://unumondo.com/cgi-bin/wiki.pl?Filadelfia_JES

Re: about changing edited text
On Wed, 2002-02-13 at 17:00, Chuck Smith wrote:
> My first thought when I read this was to have two separate versions of
> the page: one for the server and one for the editor. Basically, when
> someone saves a page, this creates an HTML page and saves a separate
> wiki page. This way all processing would be done only when someone
> saves a page rather than every time someone loads a page.
>
> Is there a flaw in my reasoning other than the database doubling in
> size? I guess it depends on which is more important: space or speed?

Nothing wrong with it, except that that's pretty much what we do already
by keeping a cached copy that's been completely converted from wiki
markup to HTML.

I'm not sure what advantage would be gotten out of storing a version
that has had HTML tags worked over, but still needs the wiki code
converted into HTML every time we load it. We get more speed by caching
the completely parsed version, or more storage savings by reparsing it
every time and not storing anything but the editable text.

Storing the HTML-munged version seems like the worst of both worlds. :)
I'm willing to be convinced, though, if there's a really good reason I
haven't thought of or if I'm misunderstanding the problem.

-- brion vibber (brion @ pobox.com)
Re: about changing edited text
On Wed, 2002-02-13 at 17:24, Jimmy Wales wrote:
> Brion Vibber wrote:
> > I'm not sure what advantage would be gotten out of storing a version
> > that has had HTML tags worked over, but still needs the wiki code
> > converted into HTML every time we load it. We get more speed by caching
> > the completely parsed version, or more storage savings by reparsing it
> > every time and not storing anything but the editable text.
>
> It's worth noting that on the live server, I see no material difference when
> I turn caching on or off.

Interesting. I have to wonder whether this means caching is for some
reason not working at all... It seems to be disabled and/or broken at
the moment, unless someone sneaked in and fixed the other-languages bug
while I wasn't looking.

I ran "ab -n 10" (ApacheBench, 10 requests per run) on a couple of pages
on my test server in four configurations: caching on, caching off with no
removeHTMLtags() call, caching off with the old removeHTMLtags() code,
and caching off with my new as-yet unoptimized but more secure version
of removeHTMLtags(). The pages-per-second figures from three trials each:

Beryllium (large HTML table, various other tags)
* cached  2.06  2.06  2.16
* none    0.94  0.95  0.95
* old     0.90  0.90  0.89
* new     0.47  0.48  0.48

Esperanto-wiki mainpage (a few <b>, <i>, and <font> tags)
* cached  3.26  3.13  3.47
* none    1.84  1.83  1.76
* old     1.82  1.80  1.80
* new     1.58  1.62  1.58

> Also, space is really cheap these days. And we're not in any immediate danger
> of running out of it.

Very true.

-- brion vibber (brion @ pobox.com)
Re: about changing edited text
On Wed, 2002-02-13 at 22:20, Brion Vibber wrote:
> I ran "ab -n 10" on a couple pages running on my test server with
> various states: caching on, caching off w/ no removeHTMLtags() call,
> caching off with the old removeHTMLtags() code, and caching off with my
> new as-yet unoptimized but more secure version of removeHTMLtags(). The
> pages per second figures from three trials each:
>
> Beryllium (large HTML table, various other tags)
> * cached  2.06  2.06  2.16
> * none    0.94  0.95  0.95
> * old     0.90  0.90  0.89
> * new     0.47  0.48  0.48
>
> Esperanto-wiki mainpage (a few <b>, <i>, and <font> tags)
> * cached  3.26  3.13  3.47
> * none    1.84  1.83  1.76
> * old     1.82  1.80  1.80
> * new     1.58  1.62  1.58

I've eliminated most of the loops from removeHTMLtags(), which is now
only a couple of percent slower than the old function that let
JavaScript through. That puts it at better than 90% of the throughput of
doing no live HTML tag removal at all, but still below serving a
completely parsed cached copy.

Beryllium
* cached  2.20  2.13  2.15
* none    1.00  0.98  0.98
* newest  0.92  0.92  0.93

Esperanto-wiki mainpage
* cached  3.50  3.25  3.49
* none    1.87  1.89  1.87
* newest  1.86  1.82  1.80
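
The ">90%" figure can be checked from the trial numbers above. The short Python sketch below averages the three trials per configuration and expresses each as a fraction of the "none" baseline; the dictionary layout and the helper name relative_throughput are illustrative choices, and the numbers are copied from the tables as posted.

```python
from statistics import mean

# Pages per second, three trials each, copied from the tables above.
results = {
    "Beryllium": {
        "cached": [2.20, 2.13, 2.15],
        "none": [1.00, 0.98, 0.98],
        "newest": [0.92, 0.92, 0.93],
    },
    "Esperanto-wiki mainpage": {
        "cached": [3.50, 3.25, 3.49],
        "none": [1.87, 1.89, 1.87],
        "newest": [1.86, 1.82, 1.80],
    },
}


def relative_throughput(trials, variant, baseline="none"):
    """Mean throughput of a variant as a fraction of the baseline mean."""
    return mean(trials[variant]) / mean(trials[baseline])


for page, trials in results.items():
    newest = relative_throughput(trials, "newest")
    cached = relative_throughput(trials, "cached")
    print(f"{page}: newest/none = {newest:.2f}, cached/none = {cached:.2f}")
```

On these figures the newest removeHTMLtags() runs at roughly 94% of the no-removal throughput on Beryllium and roughly 97% on the Esperanto main page, while the fully parsed cache is about 2.2 times and 1.8 times the baseline respectively.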

Still room to optimize though,

-- brion vibber (brion @ pobox.com)
Re: about changing edited text
From: "Brion Vibber" <brion@pobox.com>
>
> Interesting. I have to wonder whether this means caching is for some
> reason not working at all... It seems to be disabled and/or broken at
> the moment, unless someone sneaked in and fixed the other-languages bug
> while I wasn't looking.

It works, otherwise we wouldn't have seen the language-link bug. I'm not in
the least surprised that caching doesn't help much. With large datasets like
Wikipedia the bottlenecks are usually the database or code that doesn't
keep an eye on the big-O time/space complexity of its behavior.

-- Jan Hidders
Re: about changing edited text
On Thu, 2002-02-14 at 01:33, Jan Hidders wrote:
> From: "Brion Vibber" <brion@pobox.com>
> > Interesting. I have to wonder whether this means caching is for some
> > reason not working at all... It seems to be disabled and/or broken at
> > the moment, unless someone sneaked in and fixed the other-languages bug
> > while I wasn't looking.
>
> It works, otherwise we wouldn't have seen the language-link bug.

Well, I know it used to work on the live server (and the bug was visible
there) and it works now on my test machine (and the bug is visible
here), but right now I go to the live server and I *don't* see the
language bug. Which means either Jimbo left it disabled, or something
mysterious crept into the code that I don't know about.

Oh hey, I just noticed another bug! Wikilinks in the article change
style (lose their underline) after an other-language link has been
parsed. That's probably my fault from when I switched things over to the
style-sheet code; I'll fix it.

> I'm not the least surprised that caching doesn't help much. With large
> datasets like Wikipedia the bottlenecks are usually the database or
> programming that doesn't keep an eye on the big O time/space complexity
> of its behavior.

Very true. Which is why I don't think removeHTMLtags() is such a big
barrier that we have to do it only on article save. :)

-- brion vibber (brion @ pobox.com)
Re: about changing edited text
On Thu, 2002-02-14 at 02:47, Brion Vibber wrote:
> Oh hey, I just noticed another bug! Wikilinks in the article change
> style (lose underline) after an other-language link has been parsed.
> That's probably my fault when I switched things over to the style-sheet
> code, I'll fix it.

Got it. Also threw in the look-for-otherlanguage-links-when-cached fix,
and put in the rest of the URL in $wikiInterwiki so that links to
meta.wikipedia.com in the form [[m:Article Title]] work instead of
directing to the mainpage there.
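
As a rough illustration of the interwiki fix, here is a Python sketch of a prefix table whose entries carry the full URL up to the point where the article title is appended. The real $wikiInterwiki contents are not shown in this thread, so the base URL, the INTERWIKI table, and the interwiki_url helper are all assumptions made for the example.

```python
from urllib.parse import quote

# Hypothetical prefix table. The path and query string are assumed;
# only the host name appears in the message above.
INTERWIKI = {
    "m": "http://meta.wikipedia.com/wiki.phtml?title=",
}


def interwiki_url(link):
    """Turn 'm:Article Title' into a full URL, or None if not interwiki."""
    prefix, sep, title = link.partition(":")
    base = INTERWIKI.get(prefix)
    if not sep or base is None or not title:
        return None
    # Spaces become underscores; anything else unsafe is percent-encoded.
    return base + quote(title.strip().replace(" ", "_"))


print(interwiki_url("m:Article Title"))
```

If the table entry held only the host name, there would be nowhere to append the title, which would match the symptom of links landing on the main page.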

-- brion vibber (brion @ pobox.com)
Re: about changing edited text
Brion Vibber wrote:
> Well, I know it used to work on the live server (and the bug was visible
> there) and it works now on my test machine (and the bug is visible
> here), but right now I go to the live server and I *don't* see the
> language bug. Which means either Jimbo left it disabled, or something
> mysterious crept into the code that I don't know about.

I left it disabled because it wasn't helping (didn't seem to be anyway) and
because of the bug it introduced, although that's pretty minor. I can turn it
back on.

But, like you, I'm wondering if maybe it wasn't _really_ on. I will
test for that today by reproducing the bug to "prove" the caching is
on.

--Jimbo