Is preprocessDump.php still useful?
This script was created in 2011. It takes an offline XML dump file
containing page wikitext and feeds each entry through the
Preprocessor, without actually importing any content into the wiki.
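
In effect, the script amounts to something like the sketch below (my
paraphrase, not the actual implementation; the entry points shown are
assumptions based on the 1.35 APIs, and as far as I can tell from the
history the real script builds on maintenance/dumpIterator.php rather
than raw XMLReader):

    <?php
    // Sketch only: run inside a MediaWiki maintenance context.
    use MediaWiki\MediaWikiServices;

    function preprocessDumpSketch( string $dumpPath ): void {
        $parser = MediaWikiServices::getInstance()->getParser();
        $options = ParserOptions::newFromAnon();

        $reader = new XMLReader();
        $reader->open( $dumpPath );
        while ( $reader->read() ) {
            // Each <text> element in the dump holds one revision's wikitext.
            if ( $reader->nodeType === XMLReader::ELEMENT && $reader->name === 'text' ) {
                $wikitext = $reader->readString();
                // Expand the wikitext to a parse tree. The result is
                // discarded; the only lasting effect is that the
                // Preprocessor caches the tree (see below).
                $parser->preprocess( $wikitext, Title::newMainPage(), $options );
            }
        }
        $reader->close();
    }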

The documented purpose of the script is to "get statistics" or "fill the
cache". I was unable to find any statistics being emitted. I did find that
the method it calls does fill "preprocess-hash" keys in the object cache
(e.g. Memcached), with a TTL of 24 hours.
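
Concretely, the cache write in question looks roughly like this
(paraphrased from includes/parser/Preprocessor.php; the cache backend,
the placeholder values, and the exact key parts are my assumptions and
may differ from the real code):

    <?php
    // Sketch of the "fill the cache" side effect; placeholders throughout.
    $wikitext = '{{SomeTemplate|param=value}}'; // placeholder input
    $flags = 0;                                 // placeholder preprocessor flags
    $serializedTree = '<root>...</root>';       // placeholder parse tree

    $cache = ObjectCache::getLocalClusterInstance(); // e.g. Memcached
    $key = $cache->makeKey( 'preprocess-hash', md5( $wikitext ), $flags );
    $cache->set( $key, $serializedTree, 86400 ); // TTL: 24 hours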

I could not think of a use case for this and am wondering if anyone
remembers its original purpose and/or knows of a current need for it.

-- Timo

[1] First commit: http://mediawiki.org/wiki/Special:Code/MediaWiki/80466
[2] Commit history:
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+log/1.35.0/maintenance/preprocessDump.php

PS: This came up because there is a minor refactor proposed for the
script, and I was wondering how to test it, and whether it makes sense
to continue maintaining and supporting it.
https://gerrit.wikimedia.org/r/641323

Re: Is preprocessDump.php still useful?
As far as Analytics / Statistics are concerned, this is just an interesting
artifact. The problems we have to solve while reading and processing XML
dumps are quite different (custom file formats to handle files bigger than
a single HDFS block, etc.). Safe to delete, in my opinion; the less code,
the better.

On Wed, Jan 20, 2021 at 8:40 PM Krinkle <krinklemail@gmail.com> wrote:

> [...]

Re: Is preprocessDump.php still useful?
Drafted removal at:
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/659133

-- Timo


On Thu, Jan 21, 2021 at 1:39 AM Krinkle <krinklemail@gmail.com> wrote:

> [...]