Mailing List Archive

Page caching
After trying MySQL 4.0, I also want to try implementing a
rendered page cache. Using the test suite, I investigated
the issues involved and found the following: none of the 12
combinations of skin and quickbar affects article
content (well, the Nostalgia skin eliminates the page title, but
we can define "content" around that). Likewise, other
options can be worked around. The problematic ones are auto
header numbering and question-mark links.

I think I can probably work around header numbering by
always including them, putting them in a <span> and setting
the visibility in CSS. That will have the side-effect that
old non-CSS browsers will always get numbered headings
whether they want them or not. I can live with that.
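
A minimal sketch of that approach in Python (not the actual PHP code; the
function name and the "hnum" class are made up for illustration). The
numbers are always emitted inside a <span>, so the cached HTML is the same
for every user and only the stylesheet differs:

```python
from html import escape

def number_headings(headings):
    """headings: list of (level, title) pairs, e.g. (2, "History").

    Returns one HTML string per heading, with the section number always
    present inside <span class="hnum">. The class name is hypothetical.
    """
    counters = []  # counters[i] is the running count at nesting depth i + 1
    out = []
    base = min(level for level, _ in headings) if headings else 1
    for level, title in headings:
        depth = level - base + 1
        while len(counters) < depth:   # going deeper: open new counters
            counters.append(0)
        del counters[depth:]           # going shallower: drop deeper counters
        counters[depth - 1] += 1
        number = ".".join(str(n) for n in counters)
        out.append(
            f'<h{level}><span class="hnum">{number} </span>'
            f"{escape(title)}</h{level}>"
        )
    return out

# Per-user stylesheet fragments (assumed, not taken from the real skins).
SHOW_NUMBERS_CSS = ""                         # numbers are visible by default
HIDE_NUMBERS_CSS = ".hnum { display: none }"  # users who turned numbering off
```

A browser without CSS support ignores the stylesheet and therefore always
shows the numbers, which is the side effect mentioned above.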

The "?" links are another matter: I can't think of a clean
way to offer both options; i.e., links on the whole word
with some style, or links only around the "?". How much
screaming do you think would take place if we simply
eliminated the "?" links, and just made two link style
options (one sedate and one loud)?

--
Lee Daniel Crocker <lee@piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC
Re: Page caching
On Tue, 18 Mar 2003, Lee Daniel Crocker wrote:
> The "?" links are another matter: I can't think of a clean
> way to offer both options; i.e., links on the whole word
> with some style, or links only around the "?".

We had this working on phase 2. Something like:

<span class="newlinkedge">[</span>
<a class="newlink" href="..." title="Edit 'Page title'">Page title</a>
<span class="newlinkedge">]<a href="..." title="Edit 'Page title'">?</a></span>

So for question-mark links the class "newlinkedge" was defined as visible
and "newlink" defined to inherit surrounding color/style instead of
looking like a link; while for red links the class "newlinkedge" was defined
as invisible and "newlink" as red underlined.

The only change is in the stylesheet, which is generated or selected
depending on user options (with the behavior change that the titles of
question-mark links can be clicked; a behavior whose loss was bemoaned by
some when we switched to phase 3.)

There are of course other options that affect content, particularly the
TeX-related selections.

-- brion vibber (brion @ pobox.com)
Re: Page caching
> (Brion Vibber <vibber@aludra.usc.edu>):
>
> We had this working on phase 2. Something like:
>
> <span class="newlinkedge">[.</span>
> <a class="newlink" href="..." title="Edit 'Page title'">Page title</a>
> <span class="newlinkedge">]<a href="..." title="Edit 'Page title'">?</a></span>

Ick. I wouldn't want to add that much overhead on each link; it seems
OK to do it for numbered headings, but not for links. I'm really quite
tempted to just eliminate the "?" links, and I'm really curious if they
are liked enough to justify them.

> There are of course other options that affect content, particularly the
> TeX-related selections.

Yeah, that's a tough one too, but it's only on a small number of
pages, so even if it makes those pages uncacheable, we'll probably
still benefit from the cache. And I'm not sure we can't work around
or eliminate some of those options as well.

--
Lee Daniel Crocker <lee@piclab.com> <http://www.piclab.com/lee/>
Re: Page caching
On Tue, 18 Mar 2003, Lee Daniel Crocker wrote:
> > (Brion Vibber <vibber@aludra.usc.edu>):
> > <span class="newlinkedge">[.</span>
> > <a class="newlink" href="..." title="Edit 'Page title'">Page title</a>
> > <span class="newlinkedge">]<a href="..." title="Edit 'Page title'">?</a></span>
>
> Ick. I wouldn't want to add that much overhead on each link; it seems
> OK to do it for numbered headings, but not for links. I'm really quite
> tempted to just eliminate the "?" links, and I'm really curious if they
> are liked enough to justify them.

The Dutch folks explicitly requested that question links be retained as
the default there. There may be some resistance.

> > There are of course other options that affect content, particularly the
> > TeX-related selections.
>
> Yeah, that's a tough one too, but it's only on a small number of
> pages, so even if it makes those pages uncacheable, we'll probably
> still benefit from the cache. And I'm not sure we can't work around
> or eliminate some of those options as well.

Pages with TeX are a small minority, and it would not be a significant
problem to simply mark any page with TeX as uncacheable. It might be nicer
to allow them to be cached, but only for users using the default
TeX options... Inclusive special cases can be worried about later, though;
it's just the exclusive cases we should start with -- things to not cache
because they're trouble.

Another odd option is "show hoverbox over wiki links". IIRC, unchecking
this will remove the 'title' attributes from the <a href>s. I don't know
if that's something that can be done by CSS... I also don't know if anyone
uses or wants the option.

There's also the stub detector. Difficult to CSSize that, as the threshold
is user-selected. Simplest to disable caching for the rare users who use
it.
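
A rough sketch of the "exclusive cases first" rule in Python (the field
names are invented for illustration and are not the real preference
names):

```python
from dataclasses import dataclass

@dataclass
class Prefs:
    # Hypothetical preference fields; the real option names may differ.
    stub_threshold: int = 0   # 0 means the stub detector is switched off

def can_use_cache(page_has_tex: bool, prefs: Prefs) -> bool:
    """Return True only when serving a shared cached copy is safe."""
    if page_has_tex:
        # Simplest rule to start with: any page containing TeX is uncacheable.
        return False
    if prefs.stub_threshold > 0:
        # Link markup depends on a per-user size threshold, so skip the cache.
        return False
    return True
```

Inclusive special cases, such as caching TeX pages for users on the
default TeX options, could be added later as extra branches.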

-- brion vibber (brion @ pobox.com)
Re: Page caching
On Tue, 2003-03-18 at 20:32, Brion Vibber wrote:
> On Tue, 18 Mar 2003, Lee Daniel Crocker wrote:
> > > (Brion Vibber <vibber@aludra.usc.edu>):
> > > <span class="newlinkedge">[.</span>
> > > <a class="newlink" href="..." title="Edit 'Page title'">Page title</a>
> > > <span class="newlinkedge">]<a href="..." title="Edit 'Page title'">?</a></span>
> >
> > Ick. I wouldn't want to add that much overhead on each link; it seems
> > OK to do it for numbered headings, but not for links. I'm really quite
> > tempted to just eliminate the "?" links, and I'm really curious if they
> > are liked enough to justify them.
>
> The Dutch folks explicitly requested that question links be retained as
> the default there. There may be some resistance.

I similarly much prefer the question-mark links.

I'm always suspicious of interface changes that are "necessitated" by
technical difficulties. Laziness is a virtue of the programmer, but so
is hubris. If worse comes to worst, we could always cache two versions.
Re: Page caching
> (Brion Vibber <vibber@aludra.usc.edu>):
>
> Another odd option is "show hoverbox over wiki links". IIRC, unchecking
> this will remove the 'title' attributes from the <a href>s. I don't know
> if that's something that can be done by CSS... I also don't know if anyone
> uses or wants the option.

I see no reason not to remove that option (and always have the hover box,
not always remove it).

> There's also the stub detector. Difficult to CSSize that, as the threshold
> is user-selected. Simplest to disable caching for the rare users who use
> it.

What does that do again?

--
Lee Daniel Crocker <lee@piclab.com> <http://www.piclab.com/lee/>
Re: Page caching
> (The Cunctator <cunctator@kband.com>):
>
> I similarly much prefer the question-mark links.
>
> I'm always suspicious of interface changes that are "necessitated" by
> technical difficulties. Laziness is a virtue of the programmer, but so
> is hubris. If worse comes to worst, we could always cache two versions.

Well, that makes me pretty damned virtuous :-) OK, the question
marks have to stay. Maybe we can tighten up that awful HTML code a bit.

--
Lee Daniel Crocker <lee@piclab.com> <http://www.piclab.com/lee/>
Re: Page caching
On Tue, 18 Mar 2003, The Cunctator wrote:
> I similarly much prefer the question-mark links.
>
> I'm always suspicious of interface changes that are "necessitated" by
> technical difficulties. Laziness is a virtue of the programmer, but so
> is hubris.

Hubris is a virtue now? All right! :)

> If worse comes to worst, we could always cache two versions.

I don't think it would be worth caching two versions. The vast majority of
visits are going to be from anons or users who are on the default
preferences, so that's the case to optimize for; the performance hit of
maintaining two sets of cached pages may well be more than the performance
hit of always regenerating pages for the minority option group.
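
As a sketch of that policy in Python (class and parameter names are made
up; the renderer is passed in as a plain function): keep one cached copy
per page for the default preferences, and always regenerate for anyone
else.

```python
class DefaultOnlyPageCache:
    """Caches rendered pages for default preferences only (illustrative)."""

    def __init__(self, render, default_prefs):
        self._render = render            # render(title, prefs) -> HTML string
        self._defaults = default_prefs
        self._store = {}                 # title -> HTML for default prefs

    def get(self, title, prefs):
        if prefs != self._defaults:
            # Minority case: bypass the cache and render from scratch.
            return self._render(title, prefs)
        if title not in self._store:
            self._store[title] = self._render(title, prefs)
        return self._store[title]

    def invalidate(self, title):
        # Call this when the page is edited.
        self._store.pop(title, None)
```

Anonymous visitors and users on the defaults hit the single stored copy;
there is no second cache to maintain or invalidate.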

-- brion vibber (brion @ pobox.com)
Re: Page caching
On Tue, 18 Mar 2003, Lee Daniel Crocker wrote:
> OK, the question marks have to stay. Maybe we can tighten up that
> awful HTML code a bit.

It should be possible to squeeze it all into a single <a>, something like:

<a class="newlink" href="..." title="Edit 'Page title'">
<span class="edge">[</span>
<span class="text">Page title</span>
<span class="edge">]</span>
<span class="q">?</span>
</a>

(Whitespace is for illustrative purposes only. A couple bytes could be
saved with shorter class names, of course...)

/* Redlinks */
a.newlink .edge,
a.newlink .q { display: none }
a.newlink .text { color: red }

/* ? links */
a.newlink .edge,
a.newlink .text { font-style: normal; color: inherit }

Might need some more tweaking to make the text color inherit properly,
but I think the HTML markup would be okay.

-- brion vibber (brion @ pobox.com)
Re: Page caching
On Tue, 18 Mar 2003, Lee Daniel Crocker wrote:
> > There's also the stub detector. Difficult to CSSize that, as the threshold
> > is user-selected. Simplest to disable caching for the rare users who use
> > it.
>
> What does that do again?

Links to pages sized under the threshold are specially marked (I think an
exclamation point in ?-mode, and a dark red in red links mode). Since
sizes can change more frequently than existence, and the threshold is a
user option, there's not a good way to cache it.

Hmm, maybe give each link its own id and then generate up-to-the-minute
CSS? Blecch.

-- brion vibber (brion @ pobox.com)
Re: Page caching
Lee Daniel Crocker wrote:
> The "?" links are another matter: I can't think of a clean
> way to offer both options; i.e., links on the whole word
> with some style, or links only around the "?". How much
> screaming do you think would take place if we simply
> eliminated the "?" links, and just made two link style
> options (one sedate and one loud)?

If the benefit will be substantial, and I think it will, I think this
is a cost we can live with.

I haven't seen a ? link in a long time, and I no longer miss them.

--Jimbo
Re: Page caching
Brion Vibber wrote:
> The Dutch folks explicitly requested that question links be retained as
> the default there. There may be some resistance.

Perhaps that resistance will fade quickly if they perceive it as a
tradeoff between this relatively minor style feature versus
performance.

--Jimbo
Re: Page caching
Jimmy Wales wrote:

>Lee Daniel Crocker wrote:
>
>
>>The "?" links are another matter: I can't think of a clean
>>way to offer both options; i.e., links on the whole word
>>with some style, or links only around the "?". How much
>>screaming do you think would take place if we simply
>>eliminated the "?" links, and just made two link style
>>options (one sedate and one loud)?
>>
>>
>
>If the benefit will be substantial, and I think it will, I think this
>is a cost we can live with.
>
I think we can improve the speed of rendering/parsing a page enough so
caching won't be necessary:
1. Enhanced parser, probably in C++ (working on it)
2. Caching of the "known-to-be-existing" links in a single field
(separated with "\n", probably) in the cur table. That will greatly
reduce the number of database queries for the link table(s)
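
To illustrate point 2, a small Python sketch (helper names are invented;
the real code would be PHP reading a column of the cur table). It assumes
titles never contain a newline, so "\n" is a safe separator:

```python
def pack_known_links(titles):
    """Serialize the set of existing link targets into one text field."""
    return "\n".join(sorted(set(titles)))

def unpack_known_links(field):
    """Inverse of pack_known_links; an empty field means no known links."""
    return set(field.split("\n")) if field else set()

def classify_links(linked_titles, known_field):
    """Map each linked title to True (exists) or False (new-page link)
    without issuing one database query per link."""
    known = unpack_known_links(known_field)
    return {title: title in known for title in linked_titles}
```

The stored field would have to be refreshed whenever a linked page is
created or deleted; that bookkeeping is not shown here.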

>I haven't seen a ? link in a long time, and I no longer miss them.
>
>
I don't either ;-)

Magnus
Re: Page caching
On Wed, Mar 19, 2003 at 01:16:25PM +0100, Magnus Manske wrote:
> Jimmy Wales wrote:
>
> >Lee Daniel Crocker wrote:
> >
> >
> >>The "?" links are another matter: I can't think of a clean
> >>way to offer both options; i.e., links on the whole word
> >>with some style, or links only around the "?". How much
> >>screaming do you think would take place if we simply
> >>eliminated the "?" links, and just made two link style
> >>options (one sedate and one loud)?
> >>
> >>
> >
> >If the benefit will be substantial, and I think it will, I think this
> >is a cost we can live with.
> >
> I think we can improve the speed of rendering/parsing a page enough so
> caching won't be necessary
> 1. Enhanced parser, probably in C++ (working on it)
> 2. Caching of the "known-to-be-existing" links in a single field
> (separated with "\n", probably) in the cur table. That will greatly
> reduce the number of database queries for the link table(s)
>
> >I haven't seen a ? link in a long time, and I no longer miss them.
> >
> >
> I don't either ;-)

We could use two-phase rendering:

source --(phase 1)--> cacheable version --(phase 2)--> uncacheable version

So phase 2 will do TeX and those links which were broken at phase 1,
and the rest will be done in phase 1.

It could significantly improve performance at no cost in interface.
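
A toy Python sketch of the split, handling links only (TeX would be
deferred the same way). The [[Title]] syntax, the URL shapes and the
NUL-delimited placeholder are assumptions for illustration, and the
source text is assumed to contain no NUL characters:

```python
import re
from html import escape
from urllib.parse import quote

LINK = re.compile(r"\[\[([^\]|]+)\]\]")
PLACEHOLDER = re.compile(r"\x00([^\x00]*)\x00")

def phase1(source, page_exists):
    """Cacheable pass: render everything that does not depend on user
    options and leave a placeholder for each link to a missing page."""
    out, pos = [], 0
    for m in LINK.finditer(source):
        out.append(escape(source[pos:m.start()]))
        title = m.group(1)
        if page_exists(title):
            out.append(f'<a href="/wiki/{quote(title)}">{escape(title)}</a>')
        else:
            out.append(f"\x00{title}\x00")
        pos = m.end()
    out.append(escape(source[pos:]))
    return "".join(out)

def phase2(cached, question_marks):
    """Per-user pass: expand the placeholders according to preferences."""
    def expand(m):
        title = m.group(1)
        href = f"/edit/{quote(title)}"
        if question_marks:
            return f'{escape(title)}<a class="new" href="{href}">?</a>'
        return f'<a class="new" href="{href}">{escape(title)}</a>'
    return PLACEHOLDER.sub(expand, cached)
```

Only the phase 1 output would be stored; phase 2 is a cheap substitution
run on every request.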
Re: Page caching
On Wed, 2003-03-19 at 03:42, Jimmy Wales wrote:
> Brion Vibber wrote:
> > The Dutch folks explicitly requested that question links be retained as
> > the default there. There may be some resistance.
>
> Perhaps that resistance will fade quickly if they perceive it as a
> tradeoff between this relatively minor style feature versus
> performance.

It's just a few bytes, which we could resoundingly make up for by
dropping the full hostname from the URLs.

-- brion vibber (brion @ pobox.com)
Re: Page caching
> (Magnus Manske <magnus.manske@web.de>):
>
> I think we can improve the speed of rendering/parsing a page enough
> so caching won't be necessary

I agree, but it's something that's easy to test with the new
suite, so if it works well, we can use it.

> 1. Enhanced parser, probably in C++ (working on it)

Waste of time. Parsing is clearly not a significant factor in
the present CPU load, and even one written in C would spend most
of its time looking up links in the database just as the PHP
code does. And the overhead of linking the two would wipe out
any savings.

> 2. Caching of the "known-to-be-existing" links in a single field
> (separated with "\n", probably) in the cur table. That will greatly
> reduce the number of database queries for the link table(s)

Caching known links in groups is indeed something I will try, but
in their own table to reduce lock contention. Another thing I want
to look into is caching link lookups in a separate process outside
the page rendering, so the cache can be persistent across pages.
But first on the list is MySQL 4.0.12. I've already got the testing
setup working now--one machine on my LAN running the software and
another running the test suite--so I can start timing changes like
that. I still want to add another few days worth of code to the
tests though.

--
Lee Daniel Crocker <lee@piclab.com> <http://www.piclab.com/lee/>
Re: Page caching
Lee Daniel Crocker wrote:
>>(Magnus Manske <magnus.manske@web.de>):
>>1. Enhanced parser, probably in C++ (working on it)
>
>
> Waste of time. Parsing is clearly not a significant factor in
> the present CPU load, and even one written in C would spend most
> of its time looking up links in the database just as the PHP
> code does. And the overhead of linking the two would wipe out
> any savings.

Most of the parser is already running in my offline reader, and could be
adapted for "real" output. I'd like (some day) to write a "Phase IV" in
C++, if I can figure out a nice way to link it with the web server
(e.g., how to read <form> content). Is there a tutorial or something on
the web?

> Caching known links in groups is indeed something I will try, but
> in their own table to reduce lock contention.

OTOH, we're reading the article data from the database anyway;
transmitting another field with it won't do much harm, I think.

> Another thing I want
> to look into is caching link lookups in a separate process outside
> the page rendering, so the cache can be persistent across pages.

Sounds like trouble to me :-)

Magnus
Re: Page caching
On Thu, Mar 20, 2003 at 07:28:53PM +0100, Magnus Manske wrote:
> Lee Daniel Crocker wrote:
> >>(Magnus Manske <magnus.manske@web.de>):
> >>1. Enhanced parser, probably in C++ (working on it)
> >
> >
> >Waste of time. Parsing is clearly not a significant factor in
> >the present CPU load, and even one written in C would spend most
> >of its time looking up links in the database just as the PHP
> >code does. And the overhead of linking the two would wipe out
> >any savings.

I tentatively agree: the parsing of the data itself is not a bottleneck.
Something like the Zend Optimizer or PHPA (as suggested on meta) could
certainly be worth a shot, especially since they are free.

> Most of the parser is already running in my offline reader, and could be
> adapted for "real" output. I'd like (some day) to write a "Phase IV" in
> C++, if I can figure out a nice way to link it with the web server
> (e.g., how to read <form> content). Is there a tutorial or something on
> the web?

This is generally done on a web server by web server basis.
You can read about:
Apache API (http://httpd.apache.org/docs/misc/API.html)
ISAPI (http://www.microsoft.com/iis)
NSAPI (http://www.le-berre.com/nsapi/nsapi.htm)

Enjoy...

> >Caching known links in groups is indeed something I will try, but
> >in their own table to reduce lock contention.
>
> OTOH, we're reading the article data from the database anyway;
> transmitting another field with it won't do much harm, I think.
>
> >Another thing I want
> >to look into is caching link lookups in a separate process outside
> >the page rendering, so the cache can be persistent across pages.
>
> Sounds like trouble to me :-)
Re: Page caching
"Lee Daniel Crocker" wrote:

> I think I can probably work around header numbering by
> always including them, putting them in a <span> and setting
> the visibility in CSS. That will have the side-effect that
> old non-CSS browsers will always get numbered headings
> whether they want them or not. I can live with that.

What about generating the numbers purely in CSS?

One could use counter-increment/counter-reset and
counter(..) to show them.
I found an example here:
http://jendryschik.de/wsdev/einfuehrung/beispiele/counter.html
(The explanation, in German, is here:
http://jendryschik.de/wsdev/einfuehrung/css/generierter-content.html#zaehler)


Paul