Mailing List Archive

RFC: Parsoid & Extensions
[.---- Long mail - but only relevant to extension developers ----]

Greetings!

As some of you might know, on the Parsing Team [0], we are aspiring to
replace the core wikitext parser with Parsoid [1] on Wikimedia wikis late
next year and start to put to rest the two-parser ghost that has haunted us
for many years. In recent years, we achieved two major milestones along
the way: replacing HTML4 Tidy with HTML5 Remex [2], and porting Parsoid
from JavaScript to PHP [3].

Given that context, if you (help) maintain an extension that:

* uses a "parser hook" and/or
* uses the "parser API" (i.e. uses public properties / methods in
  Parser.php, ParserOutput.php, ParserOptions.php, etc.)

please read on. If you don't fit that description, you can stop reading
now!

Parsoid models and processes wikitext quite differently from the
core parser. All that Parsoid guarantees is that the rendering is largely
identical, not that the process of generating the rendering is the same.
This means that extensions that extend the behavior of the core parser will
need to adapt to work with Parsoid in order to provide similar
functionality. With that in mind, we have been working to specify more
clearly how extensions need to adapt to the Parsoid regime.

PARSOID & EXTENSIONS:

At a high level, here are the questions we needed to answer, along with
some highly simplified answers:

1. How do extensions "hook" into Parsoid?
A. Extensions need to think in terms of transformations (convert this
   to that) instead of parser pipeline events (at this point in the
   pipeline, call this listener). An additional detail here is that
   extensions cannot rely on global ordered state within extension code,
   because Parsoid doesn't guarantee that handlers will be invoked in the
   order in which their content appears in the page source. See the
   wiki [4] for more details.

   As for the mechanics of registration, Parsoid uses existing mechanisms
   based on the extension.json file.

2. When the registered hook listeners are invoked by Parsoid, how do they
   process any wikitext they need to process?
A. Parsoid provides all registered listeners with an API object to interact
   with it. Direct use of Parsoid's internal code is strongly discouraged,
   and this restriction will be enforced in various ways, including via
   code review.

3. How is the extension's output assimilated into the page output?
A. The output is treated as a "fully-processed" page/DOM fragment (with
   some caveats which will be clarified on the wiki). It is appropriately
   decorated with additional markup and slotted into place in the page.
   Extensions need not make any special efforts (such as using strip
   state) to protect it from the parsing pipeline.
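
To make the point about ordered state in answer 1 concrete, here is a toy
Python model. It is not Parsoid code and does not use its Extension API;
all names in it (render_note, number_notes, render_page) are hypothetical.
It assumes a handler is a pure function of its own input, and that any
numbering is assigned in a later pass over the assembled page, where
document order is known:

```python
import re


def render_note(content):
    # Hypothetical handler: a pure transformation of its own input.
    # It keeps no counter, because it cannot know its position on the page.
    return "<sup class='note' data-text='%s'></sup>" % content


def number_notes(page_html):
    # Ordered state is assigned in one pass over the assembled page,
    # so the numbers follow document order, not handler invocation order.
    counter = 0

    def repl(match):
        nonlocal counter
        counter += 1
        return "<sup class='note' data-text='%s'>[%d]</sup>" % (
            match.group(1), counter)

    return re.sub(r"<sup class='note' data-text='([^']*)'></sup>",
                  repl, page_html)


def render_page(contents, order):
    # Invoke the handlers in an arbitrary order, then slot each result
    # back into the position where its source appeared.
    fragments = [None] * len(contents)
    for i in order:
        fragments[i] = render_note(contents[i])
    return number_notes(" ".join(fragments))
```

In this sketch, render_page(["a", "b", "c"], [2, 0, 1]) and
render_page(["a", "b", "c"], [0, 1, 2]) produce the same page, because no
handler depends on when it was called.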

Slides 8-12 of the August 12, 2020 Tech Talk [7] go over the differences.
Check the wiki [4] for more details of Parsoid's Extension API. It also
maps core parser hooks to Parsoid's extension functionality.

CURRENT STATUS:

We consider the current proposal to be in a late draft stage. That said, as
we discover unsupported functionality, we will augment the set of hooks and
the Parsoid Extension API as needed.

While there is a wide variety of extensions in the MediaWiki universe
with varied use cases, our initial goal for the next year is just Wikimedia
wikis, and hence the extensions that are deployed on them.
Once we are done with that, we will turn our attention to supporting
extension use cases in the wider MediaWiki universe. But now is a
good time for all extension developers to study and review this API
and give us feedback.

Since the beginning of this year, we've refactored all of the extensions
we've written Parsoid versions of (Cite, Gallery, Poem, Pre, JSON) to
now strictly use the Parsoid Extension API without cheating by virtue
of being in the Parsoid codebase. So, this proposal is actually backed
by an implementation that is in production for Wikimedia wikis.

FEEDBACK:

Here is where you come in.

* If you maintain / develop an extension, please review the document
  to see if your extension's use case is covered.

  Ideally, leave your feedback on the Parsoid Extension API talk page [5]
  since it helps keep it all in one place. Alternatively, you can also
  leave questions / concerns / other feedback on the Phabricator task
  we've filed for TechCom's RFC process [6].

* If you feel bold, start the process of updating your extensions *now*.
  Note that your extension will need to operate with both the existing
  core parser and Parsoid until we deprecate and stop using the core
  parser.

  There are known functionality gaps related to exposing the ParserOutput
  object and providing setFunctionHook functionality. If your extension
  needs those, you should probably wait for us to fill those gaps.

DOCS / MORE INFO / CONTACT:

* Check the wiki page [4] for docs and discuss on the talk page [5]
* Check the August 12, 2020 Tech Talk [7]
* Look at Parsoid code for extensions [8]
* Look at Parsoid docs for the Ext/ namespace [9]
* Talk to us on IRC in the #mediawiki-parsoid channel
* Email us at parsing-team@wikimedia.org

Thanks!
Subbu (on behalf of the Parsing Team).

-------------------------------------------------------------------------

0. https://www.mediawiki.org/wiki/Parsing
1. https://www.mediawiki.org/wiki/Parsing/Parser_Unification
2. https://blog.wikimedia.org/2018/07/09/tidy-html5-replacement/
3. https://techblog.wikimedia.org/2020/02/12/parsoid-in-php-or-there-and-back-again/
4. https://www.mediawiki.org/wiki/Parsoid/Extension_API
5. https://www.mediawiki.org/wiki/Parsoid/Talk:Extension_API
6. https://phabricator.wikimedia.org/T260714
7. Slides: https://commons.wikimedia.org/wiki/File:Parsoid_%26_Extensions_August_2020_Tech_Talk.pdf
   Video: https://www.youtube.com/watch?v=lS1xPkERWCM
8. https://github.com/wikimedia/parsoid/tree/master/src/Ext
9. https://doc.wikimedia.org/Parsoid-PHP/master/
_______________________________________________
MediaWiki-l mailing list
To unsubscribe, go to:
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
Re: RFC: Parsoid & Extensions
Hey Subbu,
Is there an easy way to determine whether or not my extensions are using
parser hooks? For example, a canonical list of hooks I can grep for in my
code?

On Mon, Sep 14, 2020 at 1:17 PM Subramanya Sastry <ssastry@wikimedia.org>
wrote:

> [.---- Long mail - but only relevant to extension developers ----]
Re: RFC: Parsoid & Extensions
https://www.mediawiki.org/wiki/Manual:Hooks has a list of all hooks,
although I am not sure that it covers everything.

In any case, all parser hooks have the string *Parse* in their names. So,
if you are using a hook whose name contains that substring, it is
definitely a parser hook.
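
As a rough illustration of that grep-style check, here is a small Python
sketch. The function names (find_parse_names, scan_extension) are made up
for this example, and the choice to scan only .php and .json files is an
assumption. It flags every identifier containing "Parse", so it will also
report things like Parser class references that are not hooks; treat the
output as candidates to check against Manual:Hooks.

```python
import re
from pathlib import Path

# Matches identifier-like tokens containing the substring "Parse",
# for example ParserFirstCallInit or InternalParseBeforeLinks.
PARSE_NAME = re.compile(r"\b\w*Parse\w*\b")


def find_parse_names(text):
    # Return the distinct matching identifiers, sorted for stable output.
    return sorted(set(PARSE_NAME.findall(text)))


def scan_extension(root):
    # Walk an extension directory and map each file path to the
    # candidate names found in that file. Files with no match are omitted.
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in {".php", ".json"}:
            text = path.read_text(encoding="utf-8", errors="ignore")
            names = find_parse_names(text)
            if names:
                results[str(path)] = names
    return results
```

For example, calling scan_extension on the root directory of an extension
returns a dictionary keyed by file path, which you can then review by hand.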

Subbu.

On 9/14/20 1:14 PM, Ryan Kaldari wrote:
> Hey Subbu,
> Is there an easy way to determine whether or not my extensions are using
> parser hooks? For example, a canonical list of hooks I can grep for in my
> code?