Mailing List Archive

Score, Kubernetes and switching to Shellbox
Hi everyone,

tl;dr: External shellouts are now run via Shellbox. Any deployed code
needs to use Shellbox/BoxedCommand, and documentation is available to
help migrate.

To safely re-enable Score (LilyPond) on Wikimedia wikis, we developed
Shellbox, a way to run shell commands in a remote, isolated container.
This is (hopefully) a stronger level of isolation than we previously had
with firejail, since it's relying on Linux containers and Kubernetes to
do the isolation. At the same time, this helps us in moving towards
running MediaWiki on Kubernetes, as we don't want to include all these
external commands inside the MediaWiki container. For the most part, any
new shelling out to external commands needs to be done via Shellbox.

A lot of the design and rationale behind Shellbox is captured in the
RfC: <https://phabricator.wikimedia.org/T260330>.

In Wikimedia production, so far Score, Timeline, SyntaxHighlight and
Wikidata constraint regex checking are all using Shellbox. Details about
that and links to dashboards are available at
<https://wikitech.wikimedia.org/wiki/Shellbox>. The main things left to
migrate are the media-handling code that extracts metadata (DjVu,
PdfHandler and PagedTiffHandler), tracked at
<https://phabricator.wikimedia.org/T289228>, and video scaling
(TimedMediaHandler).

Some work has to be done in MediaWiki to make code compatible with
Shellbox, specifically switching to "BoxedCommand", which now has its
own documentation page:
<https://www.mediawiki.org/wiki/Manual:BoxedCommand>. BoxedCommand works
transparently whether you have a separate Shellbox service set up or
not. This is the preferred way to write new shellouts going forward,
though Shell::command() isn't officially deprecated yet. So far all
shellouts that are used in Wikimedia production have already been
converted except for TimedMediaHandler.
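
To make the "works transparently" idea concrete, here is a minimal
sketch in Python. It is not MediaWiki's PHP API: the class name, method
names and the service_url parameter are all hypothetical, and the
remote path is deliberately left unimplemented. It only shows the
shape of the pattern, assuming a command is described by its arguments
plus named input and output files, so the same description can be run
locally or handed to a remote service.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class BoxedCommandSketch:
    """Hypothetical illustration only; not MediaWiki's BoxedCommand."""

    def __init__(self, service_url=None):
        # Assumption: None means no separate service is configured.
        self.service_url = service_url
        self.argv = []
        self.input_files = {}
        self.output_names = []

    def params(self, *argv):
        self.argv = list(argv)
        return self

    def input_file_from_string(self, name, content):
        self.input_files[name] = content
        return self

    def output_file_to_string(self, name):
        self.output_names.append(name)
        return self

    def execute(self):
        if self.service_url is not None:
            # A real client would ship argv and the input files to the
            # service here; that protocol is out of scope for this sketch.
            raise NotImplementedError("remote execution is not sketched")
        return self._run_locally()

    def _run_locally(self):
        # Run in a throwaway directory so only declared files are exchanged.
        with tempfile.TemporaryDirectory() as workdir:
            for name, content in self.input_files.items():
                Path(workdir, name).write_text(content)
            proc = subprocess.run(
                self.argv, cwd=workdir, capture_output=True, text=True
            )
            outputs = {}
            for name in self.output_names:
                path = Path(workdir, name)
                if path.exists():
                    outputs[name] = path.read_text()
            return proc.returncode, proc.stdout, outputs


# Usage: the child process upper-cases in.txt into out.txt.
child_code = "open('out.txt', 'w').write(open('in.txt').read().upper())"
exit_code, stdout, files = (
    BoxedCommandSketch()
    .params(sys.executable, "-c", child_code)
    .input_file_from_string("in.txt", "hello")
    .output_file_to_string("out.txt")
    .execute()
)
```

The caller never touches the working directory or decides where the
command runs, which is what lets the backend be swapped without
changing calling code. For the actual method names and options, see the
Manual:BoxedCommand page linked above.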

Looking forward, I think this also gives us a lot of flexibility in
using more external commands in the future. First, we're less tied to
whatever OS version MediaWiki is running on: as long as a command can be
built/shipped in a container, we can use it. Second, it's probably
OK if external commands aren't super well behaved (e.g. use too much
memory), since they no longer share resources with an
appserver (this shouldn't be interpreted as a free pass for super
inefficient stuff, of course).

I tried to keep this summary short, and am intending to write a longer
blog post that explains some more history in detail. But if you have any
questions or something isn't clear, please ask!

-- Kunal
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Re: Score, Kubernetes and switching to Shellbox
Nice work, all!

On Tue, Oct 5, 2021 at 5:47 PM Kunal Mehta <legoktm@debian.org> wrote:

> [...]
Re: Score, Kubernetes and switching to Shellbox
Sounds good for things like OCR scans and book generation, the latter
being pushed to external WMF cloud resources.

Thanks for your work in this space. Sounds as if this will give
extensions a lot more scope for interesting things.

-- billinghurst

------ Original Message ------
From: "Kunal Mehta" <legoktm@debian.org>
To: "wikitech-l" <wikitech-l@lists.wikimedia.org>
Sent: 6/10/2021 9:46:13 AM
Subject: [Wikitech-l] Score, Kubernetes and switching to Shellbox

>[...]