Re: [Wikitext-l] Parsoid template transclusion behavior
[Resending since I forgot to copy all the lists; please excuse the
duplicate response on wikitext-l.]

Our primary goal with Parsoid today is to ensure maximum compatibility
with the current default parser -- without that, it would be impossible
to switch over to Parsoid for all page rendering use cases.

At its core, though, Parsoid's design has always pursued a processing
model in which content (fragment) generators are decoupled from the
page they are embedded in. Those generators include templates,
extensions, parser functions, and, in the future, wiki functions or
other page components. Decoupling lets us process each generator
independently and then incorporate the generated fragments into the page
efficiently. Parsoid already uses this model for extensions. It has not
held up for templates as they are implemented today, however, because
of how templates are used and what they generate: snippets of text that
can be full or partial attributes, a mix of attributes and content, or
parts of tables. The table use cases are the most egregious of these.
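As a rough illustration (not Parsoid's code), the sketch below checks one
condition a template's output would have to meet before it could be
processed as an independent fragment: it must not open a table it does
not close, close a table it did not open, or emit row/cell markup outside
any table. The template names and bodies are hypothetical, and real
wikitext has many other ways of not being self-contained (partial
attributes, for instance).

```python
# Hypothetical template bodies, for illustration only.
TEMPLATES = {
    "greeting": "Hello, ''world''!",
    "table-start": '{| class="wikitable"',  # opens a table, never closes it
    "table-row": "|-\n| cell",              # a row with no enclosing table
    "full-table": "{|\n|-\n| a\n|}",        # a complete, balanced table
}


def is_self_contained(fragment: str) -> bool:
    """Return True if the fragment's table markup is balanced on its own.

    This checks only one of the conditions a fragment would need to meet
    to be processed independently of the page that embeds it.
    """
    depth = 0  # current table nesting depth within the fragment
    for line in fragment.splitlines():
        s = line.lstrip()
        if s.startswith("{|"):
            depth += 1
        elif s.startswith("|}"):
            depth -= 1
            if depth < 0:
                return False  # closes a table opened elsewhere on the page
        elif depth == 0 and s.startswith(("|", "!")):
            return False  # row/cell markup with no enclosing table
    return depth == 0  # every table it opens must also be closed


for name, body in TEMPLATES.items():
    mode = "decoupled" if is_self_contained(body) else "needs textual expansion"
    print(f"{name}: {mode}")
```

Only "greeting" and "full-table" pass; the other two only make sense once
their text has been spliced into the surrounding page, which is exactly
what forces textual expansion.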

So, given these practical realities, the simplest way for us to handle
templates today is to fully expand them as text strings and do the
additional processing within Parsoid. Even so, Parsoid can still clearly
demarcate page content that comes from templates (and other content
generators), even where that content combines with page-level content in
complex ways. Some of those combinations are caused by table markup
errors that trigger content fostering (the HTML5 tree builder moving
misplaced content out of the table), which is a source of unnecessary
complexity and headaches for us.
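A minimal sketch of "expand textually, but remember where each expansion
landed", under simplifying assumptions: only parameterless, non-nested
`{{name}}` calls are handled, a missing template expands to an empty
string, and `expand_with_provenance` and `Provenance` are hypothetical
names. Parsoid's real mechanism is different and far more thorough (in its
HTML output, transcluded content is marked with `typeof="mw:Transclusion"`),
but the idea of tracking output ranges is the same.

```python
import re
from dataclasses import dataclass

# Matches a parameterless template call such as {{table-start}}.
TEMPLATE_CALL = re.compile(r"\{\{\s*([^{}|]+?)\s*\}\}")


@dataclass
class Provenance:
    template: str  # which template produced this text
    start: int     # offset of the expansion in the expanded text
    end: int       # exclusive end offset


def expand_with_provenance(text, templates):
    """Expand template calls as plain strings, recording output spans."""
    out, spans = [], []
    pos = 0     # read position in the source text
    length = 0  # length of the expanded output so far
    for m in TEMPLATE_CALL.finditer(text):
        literal = text[pos:m.start()]  # page text before this call
        out.append(literal)
        length += len(literal)
        name = m.group(1)
        body = templates.get(name, "")  # assumption: missing -> empty
        spans.append(Provenance(name, length, length + len(body)))
        out.append(body)
        length += len(body)
        pos = m.end()
    out.append(text[pos:])
    return "".join(out), spans


# The template supplies only the table opener; the page supplies the rest.
source = "{{table-start}}\n|-\n| page cell\n|}"
expanded, spans = expand_with_provenance(
    source, {"table-start": '{| class="wikitable"'})
for span in spans:
    print(span.template, repr(expanded[span.start:span.end]))
```

Even though the template's text ends up inside a table that the page
completes, the recorded span still identifies exactly which characters
came from the template.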

Our goal is to start moving towards the original decoupled processing
model for templates as well, but only after we are able to switch over
to Parsoid more fully, and that is looking closer than ever at this
point. That will be a gradual evolution. We have considered various
proposals here in the past, but typing is probably the overarching
concept that ties them all together.

Hope that answers your primary question. Some additional tangential
details below while I am at it.

<tangent>All that said, I wouldn't invest too much time analyzing the
contents of that page and the notions of single-pass vs. multi-pass or
PEG vs. non-PEG, etc. Those are largely immaterial implementation
details. I am not sure I would describe Parsoid as a single-pass model
today. It is single-pass only insofar as it processes the input string
in one pass. Otherwise, the generated tokens are processed multiple
times as they are transformed, and the DOM that is built up is also
processed multiple times ... so, if anything, Parsoid has a lot more
(20+) passes. Separately, given that we cannot really process the
wikitext stream into a fully processed semantic tree (because of the
nature of wikitext), we could have used other ways of generating tokens,
along with corresponding token transformers, to get the same end result.
Since that is mostly water under the bridge now, we haven't really
investigated what this might have looked like with traditional LALR
techniques (keeping in mind that the output of such a grammar would just
be a different set of tokens, not a conventional AST). I am mostly
mentioning this tangent to emphasize that our goal here is not to arrive
at a formal (implementation) grammar in the traditional
programming-language sense, but rather to transition to a different
(decoupled / typed) processing model while preserving compatibility in
the interim and giving us feasible migration paths to that model.</tangent>
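A toy sketch of the distinction drawn in the tangent: the input string is
scanned exactly once by the tokenizer, and every later stage re-walks the
token stream. The pass names and the quote handling are simplified
assumptions (real wikitext quote handling, e.g. `'''''` and unbalanced
quotes, is considerably more involved), and DOM-level passes are not
modeled here.

```python
import re
from typing import Callable, List

Tokens = List[str]
Pass = Callable[[Tokens], Tokens]


def tokenize(text: str) -> Tokens:
    """The only pass over the raw string: split out quote runs."""
    return re.findall(r"'''|''|'|[^']+", text)


def is_tag(token: str) -> bool:
    return re.fullmatch(r"</?[a-z]+>", token) is not None


def quote_pass(tokens: Tokens) -> Tokens:
    """Token pass (simplified): toggle '' into <i>/</i>, ''' into <b>/</b>."""
    names = {"''": "i", "'''": "b"}
    is_open = {"''": False, "'''": False}
    out = []
    for t in tokens:
        if t in names:
            tag = names[t]
            out.append(f"</{tag}>" if is_open[t] else f"<{tag}>")
            is_open[t] = not is_open[t]
        else:
            out.append(t)
    return out


def merge_text_pass(tokens: Tokens) -> Tokens:
    """Token pass: merge adjacent plain-text tokens, e.g. "it" "'" "s"."""
    out: Tokens = []
    for t in tokens:
        if out and not is_tag(t) and not is_tag(out[-1]):
            out[-1] += t
        else:
            out.append(t)
    return out


PASSES: List[Pass] = [quote_pass, merge_text_pass]


def run(text: str, passes: List[Pass]) -> str:
    tokens = tokenize(text)  # one pass over the string ...
    for p in passes:         # ... then one more pass per transform
        tokens = p(tokens)
    return "".join(tokens)


print(run("it's '''bold''' text", PASSES))
```

Adding a stage means appending another function to `PASSES`, which is why
"single-pass tokenizer" says little about how many passes the whole
pipeline makes.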

Subbu.

On 2/16/24 23:10, psnbaotg wrote:
> Hello,
>
> I have recently been researching Parsoid's design as MW migrates to
> Parsoid. I found that, due to its single-pass tokenizing design,
> templates are not handled textually as they are in the legacy parser.
>
> This is good, as the HTML now carries information about which template
> each piece of content is transcluded from. However,
> https://www.mediawiki.org/wiki/Parsoid/limitations says "We have since
> decided to use the PHP preprocessor for template expansions, which
> side-steps these issues by reverting to the traditional textual
> preprocessor pass". Is this still true now?
>
> Best regards,
> Diskdance