Mailing List Archive

Declare the type of source
Hello all,

long story short:

when there is a major change (new gcc, new libc, and so on), tinderbox takes a
lot of time to test the entire tree.

Let's do a practical example:
A new version of sys-devel/gcc is added to the tree.

There is no way to know how many packages compile C/C++ code, so the easiest
way is to compile the entire tree.

Instead, imagine that each ebuild declares a variable called SOURCETYPE (or
something similar, or an entry in metadata.xml if you prefer), so that with a
tool like equery/eix we can get the list of all packages that compile C code.

The same applies to other languages like python, ruby, go and so on:
compiling the dev-$language category covers a lot of packages, but there
will always be other ebuilds in other categories that use $language.
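
A minimal sketch of the query side of this idea, in Python. SOURCETYPE is the hypothetical variable from this proposal; the package names and values below are invented for illustration, and no existing equery/eix option is implied. Treating an undeclared package as affected is an assumption of the sketch.

```python
# Hypothetical data: SOURCETYPE values as they might be read from ebuild metadata.
# The variable does not exist in any EAPI; names and values are illustrative only.
SOURCETYPE = {
    "app-misc/foo": {"c"},
    "dev-python/bar": {"python"},
    "dev-python/baz": {"python", "c"},   # Python package with a C extension
    "net-misc/qux": {"go"},
    "app-misc/legacy": None,             # ebuild that does not declare SOURCETYPE
}


def packages_to_test(language, metadata):
    """Return the sorted packages to rebuild when `language` tooling changes.

    Packages without a declaration are included, since nothing is known
    about what they compile.
    """
    return sorted(
        pkg for pkg, langs in metadata.items()
        if langs is None or language in langs
    )


print(packages_to_test("c", SOURCETYPE))
```

With data like this, a gcc bump would only need to rebuild the packages that declare C plus those that declare nothing, instead of the entire tree.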

What do you think?

Agostino
Re: Declare the type of source
On Mon, 2021-06-28 at 15:00 +0200, Agostino Sarubbo wrote:
> Hello all,
>
> long story short:
>
> when there is a major change (new gcc, new libc, and so on), tinderbox takes a
> lot of time to test the entire tree.
>
> Let's do a practical example:
> A new version of sys-devel/gcc is added to the tree.
>
> There is no way to know how much packages compiles C/C++ code, so the easiest
> way is compile the entire tree.
>
> Instead, imagine that each ebuild declares a variable called SOURCETYPE ( or
> similar, or in metadata.xml if you prefer ) and with a tool like equery/eix we
> are able to get the list of all packages that compiles C code.
>
> The same thing applies to other languages like python, ruby, go and so on
> where compile the dev-$language category covers a lot of packages, but there
> will be always other ebuilds that uses $language in other categories.
>
> What do you think?
>

It's a worthwhile goal but it's practically impossible to get it right.
For example, right now we've had quite a few cases of Python ebuilds
wrongly declaring <stabilize-allarches/>, i.e. people missing use of C
besides Python.

That is, unless you can figure out a way for Portage to reliably detect
SOURCETYPE and tell people what to set.

--
Best regards,
Michał Górny
Re: Declare the type of source
Hi

28.06.2021 14:00, Agostino Sarubbo wrote:
> Hello all,
>
> long story short:
>
> when there is a major change (new gcc, new libc, and so on), tinderbox takes a
> lot of time to test the entire tree.
>
> Let's do a practical example:
> A new version of sys-devel/gcc is added to the tree.
>
> There is no way to know how much packages compiles C/C++ code, so the easiest
> way is compile the entire tree.
>
> Instead, imagine that each ebuild declares a variable called SOURCETYPE ( or
> similar, or in metadata.xml if you prefer ) and with a tool like equery/eix we
> are able to get the list of all packages that compiles C code.
>
> The same thing applies to other languages like python, ruby, go and so on
> where compile the dev-$language category covers a lot of packages, but there
> will be always other ebuilds that uses $language in other categories.
>
> What do you think?
>
> Agostino
>
>

I can easily imagine a scenario where some pure perl package fails its
tests because one of its indirect dependencies was compiled with clang
instead of gcc.

--
Best regards,
Alexey "DarthGandalf" Sokolov
Re: Declare the type of source
On Mon, 2021-06-28 at 15:00 +0200, Agostino Sarubbo wrote:
>
> Instead, imagine that each ebuild declares a variable called SOURCETYPE ( or
> similar, or in metadata.xml if you prefer ) and with a tool like equery/eix we
> are able to get the list of all packages that compiles C code.
>

I think all you are really asking for is that we stop omitting a random
subset of @system from *DEPEND.

This is long overdue, for many reasons, but in particular it would
force us to declare a dependency on a C compiler if one is needed and
allow you to re-test only those packages that use a C compiler.
Re: Declare the type of source
On Mon, Jun 28, 2021 at 9:46 AM Michael Orlitzky <mjo@gentoo.org> wrote:
>
> On Mon, 2021-06-28 at 15:00 +0200, Agostino Sarubbo wrote:
> >
> > Instead, imagine that each ebuild declares a variable called SOURCETYPE ( or
> > similar, or in metadata.xml if you prefer ) and with a tool like equery/eix we
> > are able to get the list of all packages that compiles C code.
> >
>
> I think all you are really asking for is that we stop omitting a random
> subset of @system from *DEPEND.
>
> This is long overdue, for many reasons, but in particular it would
> force us to declare a dependency on a C compiler if one is needed and
> allow you to re-test only those packages that use a C compiler.

++ - this would also support parallel building of @system.

Obviously we'll still need a core set of packages needed for
bootstrapping/etc, but there is no reason @system couldn't just be
another virtual.

You could also have convenience virtuals for things like the C
toolchain and so on. This will both support alternate implementations
and avoid having to have laundry lists of deps in every ebuild.

A simple way to transition would be to create a system virtual and add
it to all ebuilds, but ask that this be removed in future updates in
favor of more specific dependencies. Over time then the tree would
move to specified true deps. Catalyst could still use a virtual as a
target for bootstrapping stages.

Another tool that would be useful is what some other distros do - use
mount namespaces/etc to allow build systems to only see parts of the
filesystem (down to the file level) that are specified in
dependencies. This would basically eliminate unspecified or automagic
dependencies, since anything not specified basically doesn't exist at
build time. If you didn't want to use mount namespaces then our
sandbox already allows limiting read access to only specified files -
we just configure it to allow read-only to everything for every
package.
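
A rough sketch, in Python, of the check such a restricted build would amount to: a read is permitted only if the file belongs to a declared dependency. The package names, file lists and function names are made up for illustration; this is not how sandbox or Portage is actually configured.

```python
# Hypothetical ownership data: files installed by each package.
INSTALLED_FILES = {
    "sys-devel/gcc": {"/usr/bin/gcc", "/usr/bin/cpp"},
    "sys-libs/zlib": {"/usr/include/zlib.h", "/usr/lib64/libz.so"},
    "dev-libs/openssl": {"/usr/include/openssl/ssl.h"},
}


def allowed_reads(declared_deps):
    """Union of all files owned by the declared dependencies."""
    allowed = set()
    for dep in declared_deps:
        allowed |= INSTALLED_FILES.get(dep, set())
    return allowed


def undeclared_accesses(declared_deps, accessed_paths):
    """Paths the build touched that no declared dependency provides."""
    return sorted(set(accessed_paths) - allowed_reads(declared_deps))


# A build that declares gcc and zlib but also probes for openssl headers.
print(undeclared_accesses(
    ["sys-devel/gcc", "sys-libs/zlib"],
    ["/usr/bin/gcc", "/usr/include/zlib.h", "/usr/include/openssl/ssl.h"],
))
```

In a real restricted build the openssl header would simply not be visible, so the automagic check would fail instead of silently succeeding.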

--
Rich
Re: Declare the type of source
On Monday, 28 June 2021, 16:13:41 CEST, Rich Freeman wrote:
> On Mon, Jun 28, 2021 at 9:46 AM Michael Orlitzky <mjo@gentoo.org> wrote:
> >
> > On Mon, 2021-06-28 at 15:00 +0200, Agostino Sarubbo wrote:
> > >
> > > Instead, imagine that each ebuild declares a variable called SOURCETYPE ( or
> > > similar, or in metadata.xml if you prefer ) and with a tool like equery/eix we
> > > are able to get the list of all packages that compiles C code.
> > >
> >
> > I think all you are really asking for is that we stop omitting a random
> > subset of @system from *DEPEND.
> >
> > This is long overdue, for many reasons, but in particular it would
> > force us to declare a dependency on a C compiler if one is needed and
> > allow you to re-test only those packages that use a C compiler.
>
> ++ - this would also support parallel building of @system.
>
> Obviously we'll still need a core set of packages needed for
> bootstrapping/etc, but there is no reason @system couldn't just be
> another virtual.
>
> You could also have convenience virtuals for things like the C
> toolchain and so on. This will both support alternate implementations
> and avoid having to have laundry lists of deps in every ebuild.
>
> A simple way to transition would be to create a system virtual and add
> it to all ebuilds, but ask that this be removed in future updates in
> favor of more specific dependencies. Over time then the tree would
> move to specified true deps. Catalyst could still use a virtual as a
> target for bootstrapping stages.
>
> Another tool that would be useful is what some other distros do - use
> mount namespaces/etc to allow build systems to only see parts of the
> filesystem (down to the file level) that are specified in
> dependencies. This would basically eliminate unspecified or automagic
> dependencies, since anything not specified basically doesn't exist at
> build time. If you didn't want to use mount namespaces then our
> sandbox already allows limiting read access to only specified files -
> we just configure it to allow read-only to everything for every
> package.
>
>

Hello,

I was already writing an answer, which describes basically the same idea,
when Rich's mail arrived. I want to post my mail anyway. Maybe it provides
some additional information:

Wouldn't the right place be in BDEPEND, maybe hidden by some eclass magic?

Some time ago, I looked into Nix. They try to get reliable inputs by path
manipulation and can therefore depend on the compiler (with a specific
version). If I understand their build system correctly, it only builds a
piece of software if all of its dependencies have been installed beforehand
in a specific, input-specific folder.

I wonder whether this is also possible in a Gentoo-compatible way with
Linux namespaces:
1. Create a new namespace for / (consisting of no files).
2. Bindmount/Link every file of each dependency into it at the exact same place.
3. Link some socket(?) to communicate with the outer portage.
4. Trigger the build process.
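
Step 2 could be sketched roughly as follows, in Python: walk the dependency graph of the package and collect the files to expose inside the otherwise empty root. The graph, the file lists and the function names are hypothetical and not read from a real system.

```python
# Hypothetical dependency graph and file ownership; not read from a real system.
DEPENDS = {
    "app-misc/mytool": ["sys-devel/gcc", "sys-libs/zlib"],
    "sys-devel/gcc": ["sys-libs/glibc"],
    "sys-libs/zlib": ["sys-libs/glibc"],
    "sys-libs/glibc": [],
}
FILES = {
    "sys-devel/gcc": ["/usr/bin/gcc"],
    "sys-libs/zlib": ["/usr/include/zlib.h"],
    "sys-libs/glibc": ["/lib64/libc.so.6"],
}


def dependency_closure(package, graph):
    """All direct and indirect dependencies of `package` (excluding itself)."""
    seen = set()
    stack = list(graph.get(package, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


def bind_paths(package, graph, files):
    """Sorted list of files to expose inside the otherwise empty build root."""
    paths = set()
    for dep in dependency_closure(package, graph):
        paths.update(files.get(dep, []))
    return sorted(paths)


print(bind_paths("app-misc/mytool", DEPENDS, FILES))
```

Anything outside the returned list would not exist from the point of view of the build.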

I imagine something like the TemporaryFileSystem feature of systemd
together with BindPaths [1]. This uses Linux namespaces internally, too.

In a pseudo service file syntax:
```
[Service]
ExecStart=ebuild mytool-1.0.2 compile
TemporaryFileSystem=/:ro
BindPaths=$(equery files $(equery depgraph =mytool-1.0.2))
```

This should only build, if _all_ build dependencies are present
(including every compiler and base system tool). Of course, it needs a
bigger rework of the portage build process.

Gerion

[1] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#BindPaths=
Re: Declare the type of source
On Mon, 2021-06-28 at 09:46 -0400, Michael Orlitzky wrote:
> On Mon, 2021-06-28 at 15:00 +0200, Agostino Sarubbo wrote:
> >
> > Instead, imagine that each ebuild declares a variable called SOURCETYPE ( or
> > similar, or in metadata.xml if you prefer ) and with a tool like equery/eix we
> > are able to get the list of all packages that compiles C code.
> >
>
> I think all you are really asking for is that we stop omitting a random
> subset of @system from *DEPEND.
>
> This is long overdue, for many reasons, but in particular it would
> force us to declare a dependency on a C compiler if one is needed and
> allow you to re-test only those packages that use a C compiler.
>

Which C compiler? The one I have in CC for the package in question, or
the one selected via gcc-config, or any random C compiler that might not
be used at all? Does that mean that if clang is pulled via depgraph,
Portage will insist on depcleaning gcc by default?

--
Best regards,
Michał Górny
Re: Declare the type of source
On Mon, 2021-06-28 at 16:31 +0200, Gerion Entrup wrote:
>
> This should only build, if _all_ build dependencies are present
> (including every compiler and base system tool). Of course, it needs a
> bigger rework of the portage build process.

We had a GSoC project that aimed to do something like this back in
2011:

https://gitweb.gentoo.org/proj/autodep.git/

At this point, we'd be starting over from scratch, but in short: it's a
good idea. Filling out the list of dependencies should be boring and
fool-proof.
Re: Declare the type of source
On Mon, 2021-06-28 at 16:52 +0200, Michał Górny wrote:
> >
> > This is long overdue, for many reasons, but in particular it would
> > force us to declare a dependency on a C compiler if one is needed and
> > allow you to re-test only those packages that use a C compiler.
> >
>
> Which C compiler? The one I have in CC for the package in question, or
> the one selected via gcc-config, or any random C compiler that might not
> be used at all? Does that mean that if clang is pulled via depgraph,
> Portage will insist on depcleaning gcc by default?
>

If the package declares a dependency on e.g. virtual/c-compiler, ago
would want to re-test it whenever a new version of any compiler is
released that satisfies virtual/c-compiler. Conversely, if the package
doesn't require virtual/c-compiler, we may assume that it doesn't
compile C code, and is not affected by the new version.

I know switching toolchains is a mess at the moment, but independently
adding (correct) dependency information should make many things easier
without harming anything else. The precise nature of the dependencies
would of course need some thought.
Re: Declare the type of source
On Monday, 28 June 2021, 15:03:59 CEST, Michał Górny wrote:
> On Mon, 2021-06-28 at 15:00 +0200, Agostino Sarubbo wrote:
> > Hello all,
> >
> > long story short:
> >
> > when there is a major change (new gcc, new libc, and so on), tinderbox
> > takes a lot of time to test the entire tree.
> >
> > Let's do a practical example:
> > A new version of sys-devel/gcc is added to the tree.
> >
> > There is no way to know how much packages compiles C/C++ code, so the
> > easiest way is compile the entire tree.
> >
> > Instead, imagine that each ebuild declares a variable called SOURCETYPE (
> > or similar, or in metadata.xml if you prefer ) and with a tool like
> > equery/eix we are able to get the list of all packages that compiles C
> > code.
> >
> > The same thing applies to other languages like python, ruby, go and so on
> > where compile the dev-$language category covers a lot of packages, but
> > there will be always other ebuilds that uses $language in other
> > categories.
> >
> > What do you think?
>
> It's a worthwhile goal but it's practically impossible to get it right.
> For example, right now we've had quite a few cases of Python ebuilds
> wrongly declaring <stabilize-allarches/>, i.e. people missing use of C
> besides Python.
>
> That is, unless you can figure out a way for Portage to reliably detect
> SOURCETYPE and tell people what to set.

When I read this, my initial idea was the following:

- If SOURCETYPE is not given, it is assumed to be *, i.e. the package is
  treated as depending on every language when deciding what to rebuild,
  much as it is now.
- If SOURCETYPE is given, then tc-getCC and friends would start returning
  /bin/false for every language compiler not enabled, or better, just call
  die.
- Similarly, CC, CXX and friends are exported as /bin/false, which should
  catch a lot of ebuilds that currently use the compilers without going
  through tc-get*.
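
The masking rule above can be sketched as follows, in Python. The mapping from SOURCETYPE values to toolchain variables and the function name are assumptions made for illustration; nothing like this exists in Portage or in any eclass today.

```python
# Hypothetical mapping from SOURCETYPE values to the toolchain variables
# they would permit. The language names are invented for this sketch.
LANGUAGE_VARS = {
    "c": ["CC", "CPP"],
    "c++": ["CXX"],
    "fortran": ["FC", "F77"],
}


def build_environment(sourcetype, toolchain):
    """Return the compiler variables a build would see.

    sourcetype: set of declared languages, or None if the ebuild declares
                nothing (treated as "*": every compiler stays available).
    toolchain:  the normally configured values, e.g. {"CC": "gcc", ...}.
    """
    env = {}
    for language, variables in LANGUAGE_VARS.items():
        enabled = sourcetype is None or language in sourcetype
        for var in variables:
            # Compilers for undeclared languages are replaced by /bin/false,
            # so any accidental use fails the build immediately.
            env[var] = toolchain[var] if enabled else "/bin/false"
    return env


TOOLCHAIN = {"CC": "gcc", "CPP": "cpp", "CXX": "g++",
             "FC": "gfortran", "F77": "gfortran"}
print(build_environment({"c"}, TOOLCHAIN))
```

An ebuild declaring only C would keep CC and CPP but see /bin/false for CXX, FC and F77, so a forgotten C++ source file would fail loudly instead of going unnoticed.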

Sounds like EAPI=9 ;)

For now one could stuff this into an eclass to get the ebuilds right until
portage actually makes use of the variable to simplify the build tree.

Eike
Re: Declare the type of source
On Monday, 28 June 2021, 17:07:57 CEST, Michael Orlitzky wrote:
> If the package declares a dependency on e.g. virtual/c-compiler, ago
> would want to re-test it whenever a new version of any compiler is
> released that satisfies virtual/c-compiler. Conversely, if the package
> doesn't require virtual/c-compiler, we may assume that it doesn't
> compile C code, and is not affected by the new version.

I have to admit that your solution is the simplest, because there is nothing
to implement.

We can create a new category (like virtual) called tinderbox, then for example
we could have:
tinderbox/c
tinderbox/c++
tinderbox/go

and so on.

Those tinderbox 'packages', added to DEPEND, must not pull in a default
compiler or anything like it; in fact they will not pull in anything at all.

They exist only so that something like the following gives useful output:
equery depends tinderbox/c


Agostino
Re: Declare the type of source
On Mon, Jun 28, 2021 at 11:58 AM Agostino Sarubbo <ago@gentoo.org> wrote:
>
> On Monday, 28 June 2021, 17:07:57 CEST, Michael Orlitzky wrote:
> > If the package declares a dependency on e.g. virtual/c-compiler, ago
> > would want to re-test it whenever a new version of any compiler is
> > released that satisfies virtual/c-compiler. Conversely, if the package
> > doesn't require virtual/c-compiler, we may assume that it doesn't
> > compile C code, and is not affected by the new version.
>
> I need to admit that your solution is more simplest because there is nothing
> to implement.
>
> We can create a new category (like virtual) called tinderbox, then for example
> we could have:
> tinderbox/c
> tinderbox/c++
> tinderbox/go
>
> and so on.
>
> Those tinderbox 'packages' added as DEPEND must not pull a default compiler or
> so, instead they will not pull anything.

This seems unnecessarily complex. Why not make them REAL virtuals?
If your package depends on any C compiler then have it pull in
virtual/c-compiler (or whatever we want to call it). If your package
depends on gcc explicitly, then just depend on gcc. With the system
of empty virtuals you propose, you would then need logic for what the
virtuals "really" mean, since they don't actually pull in anything, and
as a bonus they're also useless for actual dependency resolution.

> They are there with the purpose of show the output of something like:
> equery depends tinderbox/c

The problem with this is that if something works with gcc and not with
clang, and clang changes, you test it anyway, because you have these
hard-coded definitions of what "c" is. If you use real virtuals, then
you just find all the reverse deps of clang and that is what you need
to test, because they're just normal dependencies.
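
A small sketch of that difference, in Python. The virtual, the package names and the function are hypothetical; virtual/c-compiler does not exist in the Gentoo repository, and this is not how equery resolves reverse dependencies.

```python
# Hypothetical tree: virtual/c-compiler is satisfied by either gcc or clang.
PROVIDERS = {"virtual/c-compiler": {"sys-devel/gcc", "sys-devel/clang"}}
DEPENDS = {
    "app-misc/portable": {"virtual/c-compiler"},   # builds with any C compiler
    "app-misc/gcc-only": {"sys-devel/gcc"},        # needs gcc specifically
    "dev-perl/pure": set(),                        # compiles no C code
}


def reverse_deps(changed, depends, providers):
    """Packages whose declared dependencies can be satisfied by `changed`."""
    affected = set()
    for pkg, deps in depends.items():
        for dep in deps:
            # Match either a direct dependency or a virtual that `changed` provides.
            if dep == changed or changed in providers.get(dep, set()):
                affected.add(pkg)
    return sorted(affected)


print(reverse_deps("sys-devel/clang", DEPENDS, PROVIDERS))
print(reverse_deps("sys-devel/gcc", DEPENDS, PROVIDERS))
```

With real dependencies, a clang bump leaves the gcc-only package alone, whereas a fixed tinderbox/c marker would have flagged it as well.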

You don't actually have to implement the full removal of the @system
special logic to do this. You can specify things that are in @system
as dependencies even if our policies don't currently require it.
Listing two paths to the same dependency doesn't hurt anything as far
as I'm aware.

--
Rich