PTS2023 - Multiple Indices Discussion
(A writeup of a discussion we held at PTS2023)

## The Problem To Be Solved

**Primary usecase**: Authors who want to use new perl syntax/features
in new versions of their modules, without breaking "older perl".
Ideally, allow older perl versions to install prior versions of said
module.

*For example*: Example::Module version 2.5 works on all perl versions.
Author wants to use some new features that appear in 5.36. So, author
bumps version number up to 3.000, sets required perl version to 5.036,
uploads to PAUSE. Ideally, users on older perl versions will still see
version 2.5 if they try to install it.

**Secondary usecase**: Authors who, having uploaded a newer version of a
module that now has a later minimum perl version requirement, wish to
still release bugfixes for older perl versions. These could be
described as *non-monotonic* releases; releases whose version number is
not a monotonic increase on the previous version.

*For example*: The above author now wishes to fix a bug in
Example::Module version 2.5, so uploads version 2.6. This becomes
available for the older perl versions, while still leaving 3.0
available for those on perl 5.36 or later.

## A Proposed Solution

The overall solution we arrived at is not perfect. It has shortcomings
and limitations. We simply provide it as a "reasonable effort" that
makes things easier for authors to update modules, while also making it
easier for users of older perl versions to still have access to working
modules. We don't claim it solves all problems for all people; it just
has to be better than the current state of affairs.

To support this, we'd need some way to maintain a packages index per
perl version, rather than having just one. A CPAN client can then, by
some mechanism, use the index appropriate to the perl version it is
running on.

It is considered infeasible to get PAUSE alone to solve this. We can't
just generate multiple 02packages-VER.txt files, for example.

HAARG is already working on a feature in metacpan that could help with this.
The idea is that individual users can publish named "packages
overlays"; a set of changes to make to the actual package index. Each
overlay creates a virtual CPAN mirror that in effect has a modified
index. This would allow an index per perl version to be created, by
means of overlays. These could all live under some new user account
created for this purpose. It doesn't have to be one "official" place.

I (Paul) am unsure exactly what the end-user client process would be
when installing modules through this method, but I imagine in some way
or other a user would configure their CPAN client to use some
alternative mirror/URL/something. Already this feature would be useful
for a bunch of other use-cases besides this one, so it seems a
worthwhile thing to create anyway.

This now reduces the problem to a question of how to create those
alternative indexes, now that the above process will solve how to
distribute them to users.

We imagine "some process" (yet to be determined where this runs, by
whom, who maintains it, etc...) that either regularly scans the
"primary" index or by some other means is informed of new distribution
uploads. For each new upload it looks at the required perl version
declared by that module's META.yml. It then enters information about
that distribution into the overlay of that perl version and all later
versions, leaving untouched the prior entries in overlays relating to
older perls.

Thus, over time, the overlay associated with each perl version will
only refer to package distributions that declare via their own metadata
that they work with that version of perl.

Accepting that "we know this isn't perfect", the following observations
can be made. These primarily come from the fact that the index overlays
are a set of changes on top of the official package index.

* Because in the above description the per-perl-version overlays are
generated by some external process, there will be a delay between a
new module appearing in the official index and the corresponding
changes appearing in the per-version overlays. There is therefore a
small window of time after a package upload in which the overlays
might not yet be correct.

* Package overlays, and the process described above to create them,
cannot "hide" modules from the index. There is no way as written to
make a module disappear from an older-perl index. This means that
new modules that appear in new versions of existing distributions
would appear in older perl indexes. This doesn't really impact the
primary use-case for this solution, but it might cause user
confusion. If it is thought necessary to fix this, a more extensible
format for package overlays that allows modules to be removed could
be designed.
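
To make the second limitation concrete, here is a minimal sketch. It
assumes, purely for illustration, that an overlay is a flat mapping from
package name to distribution path laid on top of the official index; the
real overlay format is not specified in this thread, and `apply_overlay`
and `Example::Module::New` are hypothetical names.

```python
def apply_overlay(base_index, overlay):
    """Return the effective index: base entries, replaced where the overlay says so."""
    effective = dict(base_index)
    effective.update(overlay)  # an overlay can replace or add entries, not remove them
    return effective


# Official index after Example-Module-3.000 (which added a new package) was uploaded.
base_index = {
    "Example::Module": "A/AU/AUTHOR/Example-Module-3.000.tar.gz",
    "Example::Module::New": "A/AU/AUTHOR/Example-Module-3.000.tar.gz",
}
# Overlay for an older perl: it pins the one package that existed in 2.5.
old_perl_overlay = {"Example::Module": "A/AU/AUTHOR/Example-Module-2.5.tar.gz"}

effective = apply_overlay(base_index, old_perl_overlay)
```

In this sketch the older-perl view still lists `Example::Module::New`,
pointing at the 3.000 distribution, because nothing in the overlay can
remove it.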

One notable advantage of this arrangement is that the majority of
module authors do not have to do anything special. We can construct
indexes out of existing data and maintain them going forward, entirely
by inspecting the minimum perl version declaration that modules already
provide.

## Supporting Non-Monotonic Releases

The above process does not yet solve the secondary use-case; that of
allowing non-monotonic bugfix releases. We discussed a way to solve
this one that requires more opt-in from the module author, who would
provide extra data for the per-version indexers to use.

This part of the discussion was much less precisely specified, and
details can be filled in later. In summary: The suggestion was that the
"latest" module meta for a package can (directly or indirectly) provide
a mapping from perl versions to versions of its own package, that says
which other version numbers to apply to the per-perl index.

While it *could* be stored directly, that means that in the secondary
use-case described above, the module author would upload an
Example-Module-2.6.tar.gz module release, but then would have to update
the metadata in the module by also releasing an
Example-Module-3.001.tar.gz; a release that might otherwise contain no
changes except to that metadata.

Better, then, for the module to provide this mapping indirectly, perhaps
by naming another file, in some format yet to be described, that would
live in that module author's directory on PAUSE. Additionally, PAUSE
would have to allow that file to be replaced directly.
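
Since the mapping format is still undefined, the following is only a
sketch of what a lookup might look like. The list-of-thresholds layout,
the `VERSION_MAP` contents, and the `release_for` helper are all
assumptions made for illustration, with perl versions written as
(major, minor) tuples.

```python
# Hypothetical mapping an author might publish: "from this perl onwards, use this release".
VERSION_MAP = [
    ((5, 8), "2.6"),     # bugfix branch for older perls
    ((5, 36), "3.000"),  # current branch
]


def release_for(perl, version_map):
    """Pick the release whose perl threshold is the highest one not exceeding `perl`."""
    eligible = [(floor, release) for floor, release in version_map if floor <= perl]
    if not eligible:
        return None  # no release in the mapping supports this perl
    return max(eligible)[1]


release_on_530 = release_for((5, 30), VERSION_MAP)
release_on_536 = release_for((5, 36), VERSION_MAP)
```

A per-version indexer could consult such a mapping to decide which
release to list for each perl, without the author having to re-release
the newest version just to update metadata.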

## Summary of Next Actions

* Create the concept of per-user named "package overlays" on
(meta)cpan, allowing users to publish alternate indexes

* Build a process by which a specific (virtual) user can publish
alternate indexes based on older perl versions

* Reconstruct some initial overlay indexes, containing "what these
overlays would have been" had this process always existed

* *Optionally*: Update `CPAN.pm`, `cpanminus`, et al. to automatically
use the older-perl alternate index when appropriate. Or at least,
offer guidance to users on how they can easily use this.

* Extension: Define the format and process by which authors can
declare non-monotonic version releases for older perls

+ This may require changes to PAUSE, to permit users to replace
certain files of some name pattern with newer content at the same
name

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: PTS2023 - Multiple Indices Discussion
On Sun, Apr 30, 2023 at 04:04:43PM +0100, Paul "LeoNerd" Evans wrote:
> (A writeup of a discussion we held at PTS2023)
>
> ## The Problem To Be Solved
>
> **Primary usecase**: Authors who want to use new perl syntax/features
> in new versions of their modules, without breaking "older perl".
> Ideally, allow older perl versions to install prior versions of said
> module.
>
> *For example*: Example::Module version 2.5 works on all perl versions.
> Author wants to use some new features that appear in 5.36. So, author
> bumps version number up to 3.000, sets required perl version to 5.036,
> uploads to PAUSE. Ideally, users on older perl versions will still see
> version 2.5 if they try to install it.
>
> Secondary usecase: Authors who, having uploaded a newer version of a
> module that now has a later minimum perl version requirement, wish to
> still release bugfixes for older perl versions. These could be
> described as *non-monotonic* releases; releases whose version number is
> not a monotonic increase on the previous version.
>
> *For example*: The above author now wishes to fix a bug in
> Example::Module version 2.5, so uploads version 2.6. This becomes
> available for the older perl versions, while still leaving 3.0
> available for those on perl 5.36 or later.

I've been in the same discussion, and have started to work on part of
a solution. I believe the first part is less complex than made out in
Paul's email, and it can be used to build the second part.

## The PAUSE index

The PAUSE index is keyed by module/package, and points to the CPAN
distribution that provides it. An optional version of the module is
listed in the data, but each package only shows up once per index.

What makes the PAUSE index special is that *it reflects the state of
permissions at the moment of the upload of the distribution*.

When using CPAN.pm to install a *package*, the index is used to map the
requested module/package to a distribution. CPAN.pm installs the
corresponding *distribution*, installing all other packages in it as a
side-effect.

If a package is listed in the index, the author of the distribution had
the proper permissions on it at the time of the distribution upload.

PAUSE *does not* index non-monotonic releases. The distribution file is
available on CPAN, but the PAUSE indexer ignores it completely. Anything
that uses the PAUSE index will never know it was ever uploaded.
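
As an illustration of that rule (not PAUSE's actual code), here is a
minimal sketch. It assumes plain decimal version strings, treats "not
strictly higher" as the rejection test, and leaves out the permission
check entirely; `maybe_index` is a hypothetical name.

```python
from decimal import Decimal


def maybe_index(index, package, version, dist_path):
    """Index the package only if its version is higher than what is already indexed."""
    current = index.get(package)
    if current is not None and Decimal(version) <= Decimal(current[0]):
        return False  # non-monotonic: the file stays on CPAN, the index is unchanged
    index[package] = (version, dist_path)
    return True


index = {}
first = maybe_index(index, "Example::Module", "2.5", "A/AU/AUTHOR/Example-Module-2.5.tar.gz")
second = maybe_index(index, "Example::Module", "3.000", "A/AU/AUTHOR/Example-Module-3.000.tar.gz")
third = maybe_index(index, "Example::Module", "2.6", "A/AU/AUTHOR/Example-Module-2.6.tar.gz")
```

Here the 2.6 upload leaves the index pointing at 3.000, which is why
anything reading only the index never learns that 2.6 exists.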


## Building an index for older Perl versions

Since the PAUSE indexer ignores non-monotonic releases, I'm going to
ignore them in the first part of this email.

> It is considered infeasible to get PAUSE alone to solve this. We can't
> just generate multiple 02packages-VER.txt files, for example.

Let's assume PAUSE had been generating a 02packages-VER.txt for each
published Perl version (at the time of distribution upload) for years. How
would that work?

I think it would go like this for every upload:

1. get a minimum supported Perl version (MIN) from the
distribution's META file (set MIN = 0 if there is no data)
2. add a line for each authorized package in the distribution (i.e. the
distribution author has permissions over that package) to all indices
where VER >= MIN

With that simple hypothetical model, we would have 02packages-VER.txt
files nowadays.
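
The two steps can be sketched as follows. This is an illustration under
stated assumptions, not PAUSE code: perl versions are (major, minor)
tuples, `PERL_VERSIONS` is an arbitrary subset, permissions are a simple
package-to-authors mapping, and `index_upload` is a hypothetical name.

```python
PERL_VERSIONS = [(5, 8), (5, 16), (5, 30), (5, 36)]  # illustrative subset


def index_upload(indices, author, dist_path, packages, min_perl, permissions):
    """Apply one upload to every per-perl index it is eligible for.

    indices:     dict mapping perl version -> {package: (version, dist_path)}
    packages:    dict mapping package name -> version, as found in the distribution
    min_perl:    tuple from the META file, or None when nothing is declared
    permissions: dict mapping package name -> set of authors allowed to index it
    """
    floor = min_perl if min_perl is not None else (0, 0)  # step 1: MIN = 0 if no data
    authorized = {
        name: version
        for name, version in packages.items()
        if author in permissions.get(name, set())
    }
    for perl in PERL_VERSIONS:  # step 2: every index where VER >= MIN
        if perl >= floor:
            for name, version in authorized.items():
                indices.setdefault(perl, {})[name] = (version, dist_path)


permissions = {"Example::Module": {"AUTHOR"}}
indices = {}
index_upload(indices, "AUTHOR", "A/AU/AUTHOR/Example-Module-2.5.tar.gz",
             {"Example::Module": "2.5"}, None, permissions)
index_upload(indices, "AUTHOR", "A/AU/AUTHOR/Example-Module-3.000.tar.gz",
             {"Example::Module": "3.000", "Other::Module": "0.01"}, (5, 36), permissions)
```

After both uploads, the indices for perls older than 5.36 still list
version 2.5, and `Other::Module` is listed nowhere because AUTHOR has no
permission on it.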


During the Perl Toolchain Summit, I started working on building
this (a packages index per perl version), using the historical
02packages.details.txt information which is published in a git repository
going back to 2012.

The repository is available at https://github.com/batchpause/PAUSE-git
(cloning it takes a long time...)

The idea is to seed each per-version index with the first index in that
repository and, as new releases are indexed by PAUSE and show up in the
historical data repository, only add them to the per version index when
the distribution's metadata says they work with that version of Perl.
(as described above)

This should give us what we want for monotonic releases. Basically,
the index for a given version of Perl will stop getting updates for a
distribution when the metadata says the newer versions don't support
it anymore.
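
A rough sketch of that replay, under some simplifying assumptions: each
historical snapshot is reduced to a package-to-distribution mapping,
`min_perl_of` is a hypothetical lookup that returns the minimum perl
declared by a distribution (or None), and packages that disappear from
later snapshots are not handled.

```python
def replay(snapshots, min_perl_of, perl):
    """Rebuild the index for one perl version from successive index snapshots.

    snapshots:   list of {package: dist_path} dicts, oldest first
    min_perl_of: hypothetical lookup, dist_path -> minimum perl tuple or None
    perl:        the perl version this index is for, e.g. (5, 30)
    """
    index = dict(snapshots[0])  # seed with the first snapshot as-is
    previous = snapshots[0]
    for snapshot in snapshots[1:]:
        for package, dist_path in snapshot.items():
            if previous.get(package) == dist_path:
                continue  # unchanged since the last snapshot
            floor = min_perl_of(dist_path) or (0, 0)
            if perl >= floor:
                index[package] = dist_path  # newly indexed and supported here
        previous = snapshot
    return index


snapshots = [
    {"Example::Module": "A/AU/AUTHOR/Example-Module-2.5.tar.gz"},
    {"Example::Module": "A/AU/AUTHOR/Example-Module-3.000.tar.gz"},
]
declared = {"A/AU/AUTHOR/Example-Module-3.000.tar.gz": (5, 36)}

index_530 = replay(snapshots, declared.get, (5, 30))
index_536 = replay(snapshots, declared.get, (5, 36))
```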

The git repository is updated every half hour, so the per-version
indices will always lag behind the PAUSE index by at least that
much time.

I think this covers the first part of Paul's email.

We can rebuild a per-version set of indices, just from the historical
02packages.details.txt data, without the need for overlays.

> ## Supporting Non-Monotonic Releases

I think for the second part, we can patch the indices described above.

The main issue is making sure that the changes are authoritative, per
the PAUSE model. The META file points to a mapping file, and by virtue
of having been indexed by PAUSE, that mapping file can be trusted
for the modules in the distribution that the author has permission on.
(and only those)

--
Philippe Bruhat (BooK)

There is no solution to a problem of sheer greed.
(Moral from Groo The Wanderer #94 (Epic))
Re: PTS2023 - Multiple Indices Discussion
On 30/04/2023 16:04, Paul "LeoNerd" Evans wrote:

> It is considered infeasible to get PAUSE alone to solve this. We can't
> just generate multiple 02packages-VER.txt files, for example.

My apologies for the late reply.

That is exactly what I did with cpXXXan. That used a copy of the
CPAN-testers database to see what was the latest version of a
distribution which had passing tests for a given operating system and
version of perl, created an 02packages file to suit, and created a CPAN
"mirror" populated with files from a BackPAN mirror.

It used to be available at e.g. CP5.6.2AN.barnyard.co.uk, or
CP5.8.4-solarisAN.barnyard.co.uk and so on.
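
The selection step can be sketched roughly like this. It is not the
cpXXXan code: the report tuple layout, the "PASS" grade string, and the
use of plain decimal version strings are assumptions for illustration,
and `latest_passing` is a hypothetical name.

```python
from decimal import Decimal


def latest_passing(reports, perl, osname):
    """Map each distribution to its highest version with a PASS on this perl and OS."""
    best = {}
    for dist, version, report_perl, report_os, grade in reports:
        if (report_perl, report_os, grade) != (perl, osname, "PASS"):
            continue  # wrong platform, or not a passing report
        if dist not in best or Decimal(version) > Decimal(best[dist]):
            best[dist] = version
    return best


reports = [
    ("Example-Module", "2.5", "5.8.4", "solaris", "PASS"),
    ("Example-Module", "3.000", "5.8.4", "solaris", "FAIL"),
    ("Example-Module", "2.6", "5.8.4", "solaris", "PASS"),
    ("Example-Module", "3.000", "5.36.0", "linux", "PASS"),
]
solaris_584 = latest_passing(reports, "5.8.4", "solaris")
```

The resulting mapping is what would then be written out as an
02packages file for that perl and operating system.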

> I (Paul) am unsure exactly what the end-user client process would be
> when installing modules through this method, but I imagine in some way
> or other a user would configure their CPAN client to use some
> alternative mirror/URL/something.

Yes, that's exactly what users of cpXXXan did.

It's no longer running, and the person to whom I handed over maintenance
and ownership of the GitHub repo hasn't taken it anywhere, but the crappy
code is still available to anyone who wants to resurrect it:
https://github.com/DrHyde/cpXXXan

--
David Cantrell
Re: PTS2023 - Multiple Indices Discussion
On Sun, 30 Apr 2023 at 17:05, Paul "LeoNerd" Evans <leonerd@leonerd.org.uk>
wrote:

> (A writeup of a discussion we held at PTS2023)
>
> ## The Problem To Be Solved
>
> **Primary usecase**: Authors who want to use new perl syntax/features
> in new versions of their modules, without breaking "older perl".
> Ideally, allow older perl versions to install prior versions of said
> module.
>
>
What about an alternative approach: making each perl version available
as a library (and accepting the possible performance impact of using
newer syntax with an old perl), reusing the pluggable keyword token in
the grammar to switch between bison parsers (using api.prefix)?

I am currently trying to refactor toke.c so that this approach becomes
possible (from the parser's point of view as well as for reusability).

Best regards,
Brano
Re: PTS2023 - Multiple Indices Discussion
On 2023-04-30 8:04 a.m., Paul "LeoNerd" Evans wrote:
> **Primary usecase**: Authors who want to use new perl syntax/features
> in new versions of their modules, without breaking "older perl".
> Ideally, allow older perl versions to install prior versions of said
> module.
>
> *For example*: Example::Module version 2.5 works on all perl versions.
> Author wants to use some new features that appear in 5.36. So, author
> bumps version number up to 3.000, sets required perl version to 5.036,
> uploads to PAUSE. Ideally, users on older perl versions will still see
> version 2.5 if they try to install it.
>
> Secondary usecase: Authors who, having uploaded a newer version of a
> module that now has a later minimum perl version requirement, wish to
> still release bugfixes for older perl versions. These could be
> described as *non-monotonic* releases; releases whose version number is
> not a monotonic increase on the previous version.
>
> *For example*: The above author now wishes to fix a bug in
> Example::Module version 2.5, so uploads version 2.6. This becomes
> available for the older perl versions, while still leaving 3.0
> available for those on perl 5.36 or later.
<snip>

Thank you, what you are proposing sounds incredibly useful.

-- Darren Duncan
Re: PTS2023 - Multiple Indices Discussion
Hi there,

On Mon, 5 Jun 2023, Darren Duncan wrote:
> On 2023-04-30 8:04 a.m., Paul "LeoNerd" Evans wrote:
>> **Primary usecase**: Authors who want to use new perl syntax/features
>> in new versions of their modules, without breaking "older perl".
>> Ideally, allow older perl versions to install prior versions of said
>> module.
>> [snip]
>
> Thank you, what you are proposing sounds incredibly useful.

+1

--

73,
Ged.