Mailing List Archive

Proposal to undeprecate EGO_SUM
Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
where some voices were in agreement that EGO_SUM has its raison d'être,
while there were no arguments in favor of eventually removing EGO_SUM,
I hereby propose to undeprecate EGO_SUM.

1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
> Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
> where some voices were in agreement that EGO_SUM has its raison d'être,
> while there were no arguments in favor of eventually removing EGO_SUM,
> I hereby propose to undeprecate EGO_SUM.
>
> 1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
>

"We've been rehashing the discussion until all opposition got tired
and stopped replying, then we claim everyone agrees".

--
Best regards,
Michał Górny
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
>>>>> On Mon, 13 Jun 2022, Michał Górny wrote:

> On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
>> Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
>> where some voices were in agreement that EGO_SUM has its raison d'être,
>> while there were no arguments in favor of eventually removing EGO_SUM,
>> I hereby propose to undeprecate EGO_SUM.
>>
>> 1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa

Can this be done without requesting changes to package managers?
Previous examples are unexporting variables because their size exceeds
the limit of the Linux kernel [2], or introduction of additional phase
functions that bypass Manifest validation [3].

> "We've been rehashing the discussion until all opposition got tired
> and stopped replying, then we claim everyone agrees".

[2] https://bugs.gentoo.org/721088
[3] https://bugs.gentoo.org/833567
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On 13/06/2022 10.29, Michał Górny wrote:
> On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
>> Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
>> where some voices were in agreement that EGO_SUM has its raison d'être,
>> while there were no arguments in favor of eventually removing EGO_SUM,
>> I hereby propose to undeprecate EGO_SUM.
>>
>> 1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
>>
>
> "We've been rehashing the discussion until all opposition got tired
> and stopped replying, then we claim everyone agrees".

I take this comment to mean that there was already a discussion about
deprecating and removing EGO_SUM. I usually try to follow what's going
on in Gentoo and I remember the discussion about introducing dependency
tarballs. But I apparently have missed the part where EGO_SUM was slated
for removal. And it appears I am not the only one, at least Ionen also
wrote "Missed bits and pieces but was never quite sure why this went
toward full deprecation, just discouraged may have been fair enough, …".

In any case, I am sorry for bringing this discussion up again. But since
I started rehashing this, no arguments why EGO_SUM should be removed
have been provided. And so far, I failed to find the old discussions
where I'd hope to find some rationale behind the deprecation of EGO_SUM. :/

- Flow
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On 13/06/2022 10.49, Ulrich Mueller wrote:
>>>>>> On Mon, 13 Jun 2022, Michał Górny wrote:
>
>> On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
>>> Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
>>> where some voices were in agreement that EGO_SUM has its raison d'être,
>>> while there were no arguments in favor of eventually removing EGO_SUM,
>>> I hereby propose to undeprecate EGO_SUM.
>>>
>>> 1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
>
> Can this be done without requesting changes to package managers?

What is 'this' here? The patchset does not make changes to any package
manager, just the go-module eclass.

Note that this is not about finding an alternative to dependency
tarballs. It is just about re-allowing EGO_SUM in addition to dependency
tarballs for packaging Go software in Gentoo.

- Flow
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
>>>>> On Mon, 13 Jun 2022, Florian Schmaus wrote:

>>>> Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
>>>> where some voices were in agreement that EGO_SUM has its raison d'être,
>>>> while there were no arguments in favor of eventually removing EGO_SUM,
>>>> I hereby propose to undeprecate EGO_SUM.
>>>>
>>>> 1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa

>> Can this be done without requesting changes to package managers?

> What is 'this' here?

Undeprecating EGO_SUM.

> The patchset does not make changes to any package manager, just the
> go-module eclass.

> Note that this is not about finding an alternative to dependency
> tarballs. It is just about re-allowing EGO_SUM in addition to
> dependency tarballs for packaging Go software in Gentoo.

OK. Thanks for the clarification.

Ulrich
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On Mon, 2022-06-13 at 11:30 +0200, Florian Schmaus wrote:
> On 13/06/2022 10.29, Michał Górny wrote:
> > On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
> > > Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
> > > where some voices were in agreement that EGO_SUM has its raison d'être,
> > > while there were no arguments in favor of eventually removing EGO_SUM,
> > > I hereby propose to undeprecate EGO_SUM.
> > >
> > > 1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
> > >
> >
> > "We've been rehashing the discussion until all opposition got tired
> > and stopped replying, then we claim everyone agrees".
>
> I take this comment to mean that there was already a discussion about
> deprecating and removing EGO_SUM. I usually try to follow what's going
> on in Gentoo and I remember the discussion about introducing dependency
> tarballs. But I apparently have missed the part where EGO_SUM was slated
> for removal. And it appears I am not the only one, at least Ionen also
> wrote "Missed bits and pieces but was never quite sure why this went
> toward full deprecation, just discouraged may have been fair enough, …".
>
> In any case, I am sorry for bringing this discussion up again. But since
> I started rehashing this, no arguments why EGO_SUM should be removed
> have been provided. And so far, I failed to find the old discussions
> where I'd hope to find some rationale behind the deprecation of EGO_SUM. :/
>

I disagree. Robin has made a pretty complete summary in his mail, with
numbers that prove how bad EGO_SUM is/was [1]. While he may have
disagreed with dependency tarballs, he brought pretty clear arguments
how EGO_SUM is even worse. Multiplied by all the Gentoo systems that
won't ever install 95% of Go packages, yet all have to carry their
overhead.

[1]
https://archives.gentoo.org/gentoo-dev/message/8e2a4002bfc6258d65dcf725db347cb9

--
Best regards,
Michał Górny
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On Mon, 2022-06-13 at 10:29 +0200, Michał Górny wrote:
> On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
> > Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
> > where some voices were in agreement that EGO_SUM has its raison d'être,
> > while there were no arguments in favor of eventually removing EGO_SUM,
> > I hereby propose to undeprecate EGO_SUM.
> >
> > 1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
> >
>
> "We've been rehashing the discussion until all opposition got tired
> and stopped replying, then we claim everyone agrees".

First of all, I am sorry for my tone.

I have been thinking about it and I was wrong to oppose this change.
I have been conflating two problems: EGO_SUM and Manifest sizes.
However, while EGO_SUM might be an important factor contributing to
the latter, I think we shouldn't single it out and instead focus
on addressing the actual problem.

That said, I believe it's within the maintainer's right to decide what API
to deprecate and what API to support. So I'd suggest getting William's
approval for this rather than changing the supported API of that eclass
via drive-by commits.

--
Best regards,
Michał Górny
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On 14/06/2022 11.37, Michał Górny wrote:
> On Mon, 2022-06-13 at 10:29 +0200, Michał Górny wrote:
>> On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
>>> Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
>>> where some voices were in agreement that EGO_SUM has its raison d'être,
>>> while there were no arguments in favor of eventually removing EGO_SUM,
>>> I hereby propose to undeprecate EGO_SUM.
>>>
>>> 1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
>>>
>>
>> "We've been rehashing the discussion until all opposition got tired
>> and stopped replying, then we claim everyone agrees".
>
> First of all, I am sorry for my tone.

No worries and no offense taken. I can easily see how this could be
considered rehashing an old discussion, but the truth is simply that the
deprecation of EGO_SUM caught me by surprise.


> I have been thinking about it and I was wrong to oppose this change.
> I have been conflating two problems: EGO_SUM and Manifest sizes.
> However, while EGO_SUM might be an important factor contributing to
> the latter, I think we shouldn't single it out and instead focus
> on addressing the actual problem.

Exactly my line of thought. Especially since it is not unlikely that we
will run into this problem with other programming language ecosystems
too (where the "dependency tarball" solution may not be easily viable).


> That said, I believe it's within the maintainer's right to decide what API
> to deprecate and what API to support. So I'd suggest getting William's
> approval for this rather than changing the supported API of that eclass
> via drive-by commits.

That was never my intention, hence the subject starts with "Proposal to"
and I explicitly put William in CC. I believed that one week after the
discussion around my initial gentoo-dev@ post, which gave me the
impression that un-deprecating EGO_SUM has some supporters and no
opposer, it was time to post a concrete proposal in form of a suggested
code change.

Looking forward to William's take on this. :)

- Flow
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
(I hope this makes it to the -dev list)

Hello everyone -

I'm not an official dev but frequently report bugs, fixes and also
maintain a few go-based ebuilds in my private overlay. I also hate
golang with the force of a thousand suns, but that's not important
right now.

Since I recently converted all my ebuilds from EGO_SUM to the
tarball way of doing things I'd like to chime in.
Also I'm not going to rehash everything that has been said, except
maybe that the space usage of the tarballs is nothing short of *insane*.

OTOH having to paste a weird list of dependencies into the
ebuild is also insane, even though get-ego-vendor makes this
palatable. With an eye towards fixing *that* with a bit more
automation, let's look at the pieces of the puzzle.

The candidate on the table: the ebuild for restic, a popular
and pretty clever backup program.

The restic ebuild by itself is ~40k:
$ cd /var/db/repos/gentoo/app-backup/restic
$ ls -al restic-0.13.1.ebuild
-rw-r--r-- 1 root root 40699 Apr 23 13:11 restic-0.13.1.ebuild

If we separate the ebuild from the EGO_SUM blurb, we get:
$ ls -al restic-0.13.1*
-rw-r--r-- 1 holger users 39668 Jun 14 17:50 restic-0.13.1-EGO_SUM
-rw-r--r-- 1 holger users 1030 Jun 14 17:51 restic-0.13.1.ebuild

Nothing new here. But how large is the EGO_SUM really?
$ ls -al restic-0.13.1-EGO_SUM.bz2
-rw-r--r-- 1 holger users 7902 Jun 14 17:50 restic-0.13.1-EGO_SUM.bz2

Much smaller obviously, but probably still too large for including in
$FILESDIR. So my idea here is: instead of chucking EGO_SUM (automatically
generated declarative dependency management) out the window, can we not
separate the two and instead of uploading the tarball upload the
dependency set instead? This does not fix the mentioned trust problem
since a dev space can still be hijacked, but that is already the case.
Anyway.

The only new requirement here would be to load/parse the EGO_SUM.bz2 into
the ebuild, but I'm sure that can be solved. Note that only the SHA of
the EGO_SUM.bz2 would be verified as dependency, not all the
contents - same as with the tarball.
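
A minimal sketch of that load/parse step, assuming the compressed file
simply holds one EGO_SUM entry per line ("module version" or
"module version/go.mod", as in the ebuild today). The function names,
the example entries and the choice of SHA-512 as the single digest are
illustrative assumptions, not part of any existing eclass or tool.

```python
import bz2
import hashlib


def pack_ego_sum(entries):
    """Compress a list of EGO_SUM entries, one entry per line."""
    text = "\n".join(entries) + "\n"
    return bz2.compress(text.encode("utf-8"))


def load_ego_sum(blob, expected_sha512):
    """Verify the digest of the compressed blob, then return its entries.

    Only the blob as a whole is checked, mirroring how a dependency
    tarball is covered by a single Manifest entry.
    """
    actual = hashlib.sha512(blob).hexdigest()
    if actual != expected_sha512:
        raise ValueError("digest mismatch for compressed EGO_SUM")
    lines = bz2.decompress(blob).decode("utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Hypothetical entries, for illustration only.
entries = [
    "github.com/example/mod v1.2.3",
    "github.com/example/mod v1.2.3/go.mod",
]
blob = pack_ego_sum(entries)
digest = hashlib.sha512(blob).hexdigest()
print(load_ego_sum(blob, digest))
```

The individual dependency archives would still need their own
verification path; this sketch covers only the list itself.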

This would eliminate the space bloat/bandwidth amplification problem,
distfile caching across ebuilds could again work as expected (even though
go successfully makes that almost futile), and with slightly better
tooling in get-ego-vendor could reduce toil when bumping an ebuild.

I'm looking forward to hearing why this idea is terrible. :)

Thank you all for Gentoo.

cheers
Holger
Re: Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On 14.06.22 18:33, Holger Hoffstätte wrote:
> So my idea here is: instead of chucking EGO_SUM (automatically
> generated declarative dependency management) out the window, can we not
> separate the two and instead of uploading the tarball upload the
> dependency set instead?
I think that this idea has been pitched already (see for example
Robin's post [1]), although in a broader non-Go-specific sense and it is
one obvious way to move forward.

One obstacle, and probably the largest, is that this cannot be
implemented in an eclass alone. Due to the sandboxing during the build
process, fetching distfiles, which is what we are talking about, is the
package manager's job and hence, I believe, this would require adjustments
to the package manager and the Package Manager Specification (PMS).

The basic idea, at least to my understanding (or how I would propose
it), is to have a new top-level ebuild variable

SRC_URI_FILE="https://example.org/manifests/restic-0.13.1.files"

where restic-0.13.1.files contains lines like

<SRC_URI> <SIZE> <HASH> [<TARGET_FILENAME>]

which is, as you nicely demonstrated on the restic ebuild, where the
bytes contributing to the ebuild size bloat originate from.

Those bytes are now outsourced from ::gentoo, can be fetched on-demand,
allowing the package manager to download the individual distfiles into
DISTDIR, where, e.g., the go eclass can process them further within
the constraints of the security sandbox.
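
A minimal sketch of how a package manager might read such a file,
assuming whitespace-separated fields exactly as in the line format
above. The treatment of blank lines and "#" comments, the default
target filename (the last path component of the URI) and all names in
the code are assumptions made for illustration; nothing here is
specified by PMS.

```python
import posixpath
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class DistfileEntry:
    uri: str
    size: int
    digest: str
    filename: str


def parse_files_list(text):
    """Parse '<SRC_URI> <SIZE> <HASH> [<TARGET_FILENAME>]' lines."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Assumed convention: skip blank lines and '#' comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) not in (3, 4):
            raise ValueError(f"line {lineno}: expected 3 or 4 fields")
        uri, size, digest = fields[:3]
        if len(fields) == 4:
            filename = fields[3]
        else:
            # Assumed default: last path component of the URI.
            filename = posixpath.basename(urlparse(uri).path)
        entries.append(DistfileEntry(uri, int(size), digest, filename))
    return entries


# Hypothetical content of restic-0.13.1.files.
sample = """\
https://example.org/mod/a.zip 1024 abc123 example-a.zip
https://example.org/mod/b.zip 2048 def456
"""
for entry in parse_files_list(sample):
    print(entry.filename, entry.size)
```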

- Flow


1:
https://archives.gentoo.org/gentoo-dev/message/8e2a4002bfc6258d65dcf725db347cb9
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
(replying to the first post here as I believe this post is relevant to
most, if not all, subthreads)

I've prepared a PoC of an automated solution for vendoring[1] a while
back (around the start of this whole discussion) that would place trust
on the infrastructure (though potentially verifiable).

My concept provides two solutions:
1) go mod vendor - not verifiable by users (as vendor tars don't include
enough information for checksumming - see also [2])
2) modcache - significantly larger but verifiable on the client (against
existing go.sum). These archives really go up to gigabytes in size as
opposed to a few megs of vendored tarballs.
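
The client-side check in option 2 relies on the "h1:" hashes recorded
in go.sum. A minimal sketch of how such a hash is computed, as far as I
understand Go's dirhash scheme (a SHA-256 over a sorted listing of
per-file SHA-256 digests, base64-encoded); treat the details as an
assumption and compare against the Go sources before relying on them.
The module name and file contents below are made up.

```python
import base64
import hashlib


def h1_hash(files):
    """Compute an 'h1:'-style hash.

    files maps names such as 'module@version/path' to their bytes.
    """
    summary = hashlib.sha256()
    for name in sorted(files):
        if "\n" in name:
            raise ValueError("file names with newlines are not supported")
        file_digest = hashlib.sha256(files[name]).hexdigest()
        # One line per file: hex digest, two spaces, file name.
        summary.update(f"{file_digest}  {name}\n".encode("utf-8"))
    return "h1:" + base64.b64encode(summary.digest()).decode("ascii")


# Hypothetical module tree, for illustration only.
tree = {
    "example.org/mod@v1.0.0/go.mod": b"module example.org/mod\n",
    "example.org/mod@v1.0.0/mod.go": b"package mod\n",
}
print(h1_hash(tree))
```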

Please note that [1] is on a small server, possibly broken, pretty slow,
and not fit for production yet. Ping me on IRC if you encounter issues
so that I can "unjam" it.

Also note that this thing doesn't attempt much to figure out how to
convert a ${PV} or any other format versions, and essentially leaves
that up to the GOPROXY (with very little extra work, see: [3]).

The proposed solution here is that the developer passes something like
https://go.gentoo.org/vendor/...@${PV} -> vendor.tar into $SRC_URI,
which would get initiated with a call to ``pkgdev manifest'' or such
(possibly authenticated via IP or keys or something, to prevent abuse),
and be done with it.

The biggest downside I've seen so far (excluding further developing the
solution) is that some Go programs don't respect the restrictions of the
Go module system, and thus fail to fetch.

[1]: https://vengor.aarsen.me/
[2]: https://github.com/golang/go/issues/27348
[3]: https://git.sr.ht/~arsen/vengor/tree/ab1ae7b275ab492d4806de88cfbf67e7b97c1ade/item/vengor/__init__.py#L101-127

--
Arsen Arsenović
Re: Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On Tue, 2022-06-14 at 19:03 +0200, Florian Schmaus wrote:
> On 14.06.22 18:33, Holger Hoffstätte wrote:
> > So my idea here is: instead of chucking EGO_SUM (automatically
> > generated declarative dependency management) out the window, can we not
> > separate the two and instead of uploading the tarball upload the
> > dependency set instead?
> I think that this idea has been pitched already (see for example
> Robin's post [1]), although in a broader non-Go-specific sense and it is
> one obvious way to move forward.
>
> One obstacle, and probably the largest, is that this cannot be
> implemented in an eclass alone. Due to the sandboxing during the build
> process, fetching distfiles, which is what we are talking about, is the
> package manager's job and hence, I believe, this would require adjustments
> to the package manager and the Package Manager Specification (PMS).
>
> The basic idea, at least to my understanding (or how I would propose
> it), is to have a new top-level ebuild variable
>
> SRC_URI_FILE="https://example.org/manifests/restic-0.13.1.files"
>
> where restic-0.13.1.files contains lines like
>
> <SRC_URI> <SIZE> <HASH> [<TARGET_FILENAME>]
>
> which is, as you nicely demonstrated on the restic ebuild, where the
> bytes contributing to the ebuild size bloat originate from.
>
> Those bytes are now outsourced from ::gentoo, can be fetched on-demand,
> allowing the package manager to download the individual distfiles into
> DISTDIR, where, e.g., the go eclass can process them further within
> the constraints of the security sandbox.
>

Anything that involves breaking the Portage plan-depgraph / fetch&build
separation would require major architectural changes, so can be rejected
immediately as "not going to be implemented in our lifetimes".

--
Best regards,
Michał Górny
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On Mon, Jun 13, 2022 at 12:26:43PM +0200, Ulrich Mueller wrote:
> >>>>> On Mon, 13 Jun 2022, Florian Schmaus wrote:
>
> >>>> Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
> >>>> where some voices were in agreement that EGO_SUM has its raison d'être,
> >>>> while there were no arguments in favor of eventually removing EGO_SUM,
> >>>> I hereby propose to undeprecate EGO_SUM.
> >>>>
> >>>> 1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
>
> >> Can this be done without requesting changes to package managers?
>
> > What is 'this' here?
>
> Undeprecating EGO_SUM.
>
> > The patchset does not make changes to any package manager, just the
> > go-module eclass.
>
> > Note that this is not about finding an alternative to dependency
> > tarballs. It is just about re-allowing EGO_SUM in addition to
> > dependency tarballs for packaging Go software in Gentoo.

Like I said in my earlier reply, there have been packages that break
using EGO_SUM. Also, Robin's proposal will not be happening for some
time, if at all, since it will require an EAPI bump and doesn't have a
working implementation.

The most pressing concern about EGO_SUM is that it can make portage
crash because of the size of SRC_URI, so it definitely should not be
preferred over dependency tarballs.
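
A rough sketch of why a long EGO_SUM can run into this, assuming the
Linux limit on a single argument or environment string of 32 pages
(131072 bytes with 4 KiB pages) and a made-up URI of roughly 100 bytes
per dependency. The real eclass expands entries differently in detail,
so the numbers are only indicative.

```python
# Assumed per-string limit: 32 pages of 4 KiB each.
MAX_ARG_STRLEN = 32 * 4096


def env_string_size(name, value):
    """Bytes needed for one NAME=value string, including the NUL."""
    return len(f"{name}={value}".encode("utf-8")) + 1


def fits_in_environment(name, value, limit=MAX_ARG_STRLEN):
    """True if the exported variable stays within the assumed limit."""
    return env_string_size(name, value) <= limit


# Hypothetical SRC_URI element contributed by a single dependency.
uri = (
    "mirror://goproxy/github.com/example/some-module/@v/v1.2.3.zip"
    " -> example-some-module-v1.2.3.zip"
)
for count in (100, 1000, 2000):
    src_uri = " ".join([uri] * count)
    size = env_string_size("SRC_URI", src_uri)
    print(count, size, fits_in_environment("SRC_URI", src_uri))
```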

If you want to chat more about this on the list we can, but for now,
let's not undeprecate EGO_SUM in the eclass.

William
Re: Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On Wed, 2022-06-15 at 07:53 +0200, Michał Górny wrote:
> On Tue, 2022-06-14 at 19:03 +0200, Florian Schmaus wrote:
> > On 14.06.22 18:33, Holger Hoffstätte wrote:
> > > So my idea here is: instead of chucking EGO_SUM (automatically
> > > generated declarative dependency management) out the window, can we not
> > > separate the two and instead of uploading the tarball upload the
> > > dependency set instead?
> > I think that this idea has been pitched already (see for example
> > Robin's post [1]), although in a broader non-Go-specific sense and it is
> > one obvious way to move forward.
> >
> > One obstacle, and probably the largest, is that this cannot be
> > implemented in an eclass alone. Due to the sandboxing during the build
> > process, fetching distfiles, which is what we are talking about, is the
> > package manager's job and hence, I believe, this would require adjustments
> > to the package manager and the Package Manager Specification (PMS).
> >
> > The basic idea, at least to my understanding (or how I would propose
> > it), is to have a new top-level ebuild variable
> >
> > SRC_URI_FILE="https://example.org/manifests/restic-0.13.1.files"
> >
> > where restic-0.13.1.files contains lines like
> >
> > <SRC_URI> <SIZE> <HASH> [<TARGET_FILENAME>]
> >
> > which is, as you nicely demonstrated on the restic ebuild, where the
> > bytes contributing to the ebuild size bloat originate from.
> >
> > Those bytes are now outsourced from ::gentoo, can be fetched on-demand,
> > allowing the package manager to download the individual distfiles into
> > DISTDIR, where, e.g., the go eclass can process them further within
> > the constraints of the security sandbox.
> >
>
> Anything that involves breaking the Portage plan-depgraph / fetch&build
> separation would require major architectural changes, so can be rejected
> immediately as "not going to be implemented in our lifetimes".
>

Just to be clear, I'm not against this proposal. In fact, I think it's
probably the best solution that's been proposed so far. What I wanted
to point out is that we probably don't have anyone who would actually
implement that.

--
Best regards,
Michał Górny
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
Hi,

I was working on adding a Go-based ebuild to Gentoo yesterday and I
got this warning from portage saying that EGO_SUM is deprecated and
should be avoided. Since I remember there was an intense discussion
about this on the ML I went back and have re-read the threads before
writing this piece. I'd like to provide my perspective as user, a
proxied maintainer, and overlay owner. I also run a private mirror on my
LAN to serve my hosts in order to reduce load on external mirrors.

Before diving in I think it's worth reading mgorny's blog post "The
modern packager’s security nightmare"[1] as it's relevant to the
discussion, and something I deeply agree with.

With all that being said, I feel that the tarball idea is a bad one for
many reasons.

From a security point of view, I understand that we still have to trust
maintainers not to do funky stuff, but I think this issue goes beyond
that.

First of all one of the advantages of Gentoo is that it gets its source
code from upstream (yes, I'm aware of mirrors acting as a cache layer),
which means that poisoning source code needs to be done at upstream
level (effectively means hacking GitHub, PyPI, or some standalone
project's Gitea/cgit/gitlab/etc. instance or similar), sources which
either have more scrutiny or have a limited blast radius.

Additionally if an upstream dependency has a security issue it's easier
to scan all EGO_SUM content and find packages that potentially depend on
a broken dependency and force a re-pinning and rebuild. The tarball
magic hides this completely and makes searching very expensive.
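
A minimal sketch of such a scan, assuming ebuilds declare EGO_SUM as a
bash array of double-quoted "module version" and
"module version/go.mod" entries. The package names, module path and
helper names are hypothetical, and a real scan would have to cope with
eclass-generated or otherwise unusual declarations.

```python
import re

# Matches the body of a bash array assignment: EGO_SUM=( ... )
EGO_SUM_RE = re.compile(r"EGO_SUM=\((.*?)\)", re.DOTALL)
ENTRY_RE = re.compile(r'"([^"]+)"')
GO_MOD_SUFFIX = "/go.mod"


def ego_sum_modules(ebuild_text):
    """Return the set of (module, version) pairs listed in EGO_SUM."""
    match = EGO_SUM_RE.search(ebuild_text)
    if match is None:
        return set()
    modules = set()
    for entry in ENTRY_RE.findall(match.group(1)):
        module, _, version = entry.partition(" ")
        # "v1.2.3/go.mod" entries refer to the same module version.
        if version.endswith(GO_MOD_SUFFIX):
            version = version[: -len(GO_MOD_SUFFIX)]
        modules.add((module, version))
    return modules


def affected_ebuilds(ebuilds, module, bad_versions):
    """ebuilds maps an ebuild name to its text; return affected names."""
    return sorted(
        name
        for name, text in ebuilds.items()
        if any(m == module and v in bad_versions
               for m, v in ego_sum_modules(text))
    )


# Hypothetical ebuild contents, for illustration only.
ebuilds = {
    "app-misc/foo-1.0": 'EGO_SUM=(\n\t"example.org/lib v1.0.0"\n'
                        '\t"example.org/lib v1.0.0/go.mod"\n)\n',
    "app-misc/bar-2.0": 'EGO_SUM=(\n\t"example.org/lib v1.1.0"\n'
                        '\t"example.org/lib v1.1.0/go.mod"\n)\n',
    "app-misc/baz-3.0": 'SRC_URI="https://example.org/baz-deps.tar.xz"\n',
}
print(affected_ebuilds(ebuilds, "example.org/lib", {"v1.0.0"}))
```

The third ebuild shows the point made above: with a dependency
tarball the module list is not visible in the ebuild at all, so a text
scan finds nothing.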

In fact using these vendor tarballs is the equivalent of "static
linking" in the packaging space. Why are we introducing the same issue
in the repository space? This kills the reusability of already
downloaded dependencies and bloats storage requirements. This is
especially bad on laptops, where SSD free space might be limited, in
case the user does not nuke their distfiles after each upgrade.

Considering that BTRFS (and possibly other filesystems) support on the
fly compression the physical cost of a few inflated ebuilds and
Manifests is actually way smaller than the logical size would indicate.
Compare that to the huge incompressible tarballs that now we need to
store.

As a proxied maintainer or overlay owner, hosting these huge tarballs
also becomes a problem (i.e. we need some public space with potentially
gigabytes of free space and enough bandwidth to push that to users).
Pushing toward vendor tarballs creates an extra expense on every level
(Gentoo infra, mirrors, proxy maintainers, overlay owners, users).

If bloating portage is a big issue and we frown upon go stuff anyway (or
only a few users need these packages), why not consider moving all go
packages into an officially supported go packages only overlay? I
understand that this would not solve the kernel buffer issue where we
run out of environment variable space, but it would debloat the main
portage tree.

It also breaks reproducibility. With EGO_SUM I can check out an older
version of portage tree (well to some extent) and rebuild packages since
dependency upstream is very likely to host old versions of their source.
With the tarballs this breaks since as soon as an ebuild is dropped from
mainline portage the vendor tarballs follow them too. There is no way
for the user to roll back a package a few weeks back (e.g. if new
version has bugs), unlike with EGO_SUM.

In fact I feel this goes against the spirit of portage too, since
instead of "just describing" how to obtain sources and build them, it
now depends on essentially ephemeral blobs, which happen to be
externalized from the portage tree itself. I'm aware that we have
ebuilds that pull in patches and other stuff from dev space already, but
we shouldn't make this even worse.

Finally with EGO_SUM we had a nice tool get-ego-vendor which produced
the EGO_SUM for maintainers which has made maintenance easier. However I
haven't found any new guidance yet on how to maintain go packages with
the new tarball method (e.g. what needs to go into the vendor tarball,
what changes are needed in ebuilds). Overall this further complicates
ebuild development and verification of PRs.

In summary, IMHO the EGO_SUM way of handling of go packages has more
benefits than drawbacks compared to the vendor tarballs.

Cheers,
Zoltan

[1]
https://blogs.gentoo.org/mgorny/2021/02/19/the-modern-packagers-security-nightmare/
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On Mon, Jun 27, 2022 at 01:43:19 +0200, Zoltan Puskas wrote:
> Hi,
>
> I was working on adding a Go-based ebuild to Gentoo yesterday and I
> got this warning from portage saying that EGO_SUM is deprecated and
> should be avoided. Since I remember there was an intense discussion
> about this on the ML I went back and have re-read the threads before
> writing this piece. I'd like to provide my perspective as user, a
> proxied maintainer, and overlay owner. I also run a private mirror on my
> LAN to serve my hosts in order to reduce load on external mirrors.
>
> Before diving in I think it's worth reading mgorny's blog post "The
> modern packager’s security nightmare"[1] as it's relevant to the
> discussion, and something I deeply agree with.
>
> With all that being said, I feel that the tarball idea is a bad one for
> many reasons.
>
> From a security point of view, I understand that we still have to trust
> maintainers not to do funky stuff, but I think this issue goes beyond
> that.
>
> First of all one of the advantages of Gentoo is that it gets its source
> code from upstream (yes, I'm aware of mirrors acting as a cache layer),
> which means that poisoning source code needs to be done at upstream
> level (effectively means hacking GitHub, PyPI, or some standalone
> project's Gitea/cgit/gitlab/etc. instance or similar), sources which
> either have more scrutiny or have a limited blast radius.
>
> Additionally if an upstream dependency has a security issue it's easier
> to scan all EGO_SUM content and find packages that potentially depend on
> a broken dependency and force a re-pinning and rebuild. The tarball
> magic hides this completely and makes searching very expensive.
>
> In fact using these vendor tarballs is the equivalent of "static
> linking" in the packaging space. Why are we introducing the same issue
> in the repository space? This kills the reusability of already
> downloaded dependencies and bloats storage requirements. This is
> especially bad on laptops, where SSD free space might be limited, in
> case the user does not nuke their distfiles after each upgrade.
>
> Considering that BTRFS (and possibly other filesystems) support on the
> fly compression the physical cost of a few inflated ebuilds and
> Manifests is actually way smaller than the logical size would indicate.
> Compare that to the huge incompressible tarballs that now we need to
> store.
>
> As a proxied maintainer or overlay owner, hosting these huge tarballs
> also becomes a problem (i.e. we need some public space with potentially
> gigabytes of free space and enough bandwidth to push that to users).
> Pushing toward vendor tarballs creates an extra expense on every level
> (Gentoo infra, mirrors, proxy maintainers, overlay owners, users).
>
> If bloating portage is a big issue and we frown upon go stuff anyway (or
> only a few users need these packages), why not consider moving all go
> packages into an officially supported go packages only overlay? I
> understand that this would not solve the kernel buffer issue where we
> run out of environment variable space, but it would debloat the main
> portage tree.
>

Rephrasing this just to ensure I'm understanding it correctly: you're
suggesting to move _everything_ that uses Go into its own overlay. Let's
call it gentoo-go for the sake of the example.

If the above is accurate, then I hard disagree.

The biggest package that I have that uses Go is docker (and accompanying
tools). Personal distaste of docker aside, it's a very popular piece of
software, and I don't think it's fair to require all the people who want
to use it to first enable and sync gentoo-go before they can install it.

And what about transitive dependencies? Suppose app-misc/cool-package is
written in some language that isn't Go, but it has a dependency on
sys-apps/cool-util which has a dependency on something written in Go.
Should a user wanting to install cool-package have to enable the
gentoo-go overlay now too? Even though app-misc/cool-package would look
like it doesn't need the overlay unless you dig into the deps.
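
The transitive case can be sketched as a walk over the dependency
graph, assuming a simplified model in which each package lists its
direct dependencies and the overlay is just a set of package names.
dev-go/some-tool and the function name are made up for the example;
real dependency resolution in Portage is far more involved.

```python
def needs_overlay(package, deps, overlay_packages):
    """True if package or any transitive dependency is in the overlay."""
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in overlay_packages:
            return True
        # Follow direct dependencies; unknown packages have none.
        stack.extend(deps.get(current, ()))
    return False


# Hypothetical dependency graph matching the example above.
deps = {
    "app-misc/cool-package": ["sys-apps/cool-util"],
    "sys-apps/cool-util": ["dev-go/some-tool"],
    "app-misc/plain-package": ["sys-libs/zlib"],
}
gentoo_go = {"dev-go/some-tool"}

print(needs_overlay("app-misc/cool-package", deps, gentoo_go))
print(needs_overlay("app-misc/plain-package", deps, gentoo_go))
```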

Not a dev, just a user who really likes Gentoo :)

- Oskari

> It also breaks reproducibility. With EGO_SUM I can check out an older
> version of portage tree (well to some extent) and rebuild packages since
> dependency upstream is very likely to host old versions of their source.
> With the tarballs this breaks since as soon as an ebuild is dropped from
> mainline portage the vendor tarballs follow them too. There is no way
> for the user to roll back a package a few weeks back (e.g. if new
> version has bugs), unlike with EGO_SUM.
>
> In fact I feel this goes against the spirit of portage too, since
> instead of "just describing" how to obtain sources and build them, it
> now depends on essentially ephemeral blobs, which happen to be
> externalized from the portage tree itself. I'm aware that we have
> ebuilds that pull in patches and other stuff from dev space already, but
> we shouldn't make this even worse.
>
> Finally with EGO_SUM we had a nice tool get-ego-vendor which produced
> the EGO_SUM for maintainers which has made maintenance easier. However I
> haven't found any new guidance yet on how to maintain go packages with
> the new tarball method (e.g. what needs to go into the vendor tarball,
> what changes are needed in ebuilds). Overall this further complicates
> ebuild development and verification of PRs.
>
> In summary, IMHO the EGO_SUM way of handling of go packages has more
> benefits than drawbacks compared to the vendor tarballs.
>
> Cheers,
> Zoltan
>
> [1]
> https://blogs.gentoo.org/mgorny/2021/02/19/the-modern-packagers-security-nightmare/
>
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
Hey,

>
> Rephrasing this just to ensure I'm understanding it correctly: you're
> suggesting to move _everything_ that uses Go into its own overlay. Let's
> call it gentoo-go for the sake of the example.
>
> If the above is accurate, then I hard disagree.

Yes, that was the suggestion, you understood it correctly.

>
> The biggest package that I have that uses Go is docker (and accompanying
> tools). Personal distaste of docker aside, it's a very popular piece of
> software, and I don't think it's fair to require all the people who want
> to use it to first enable and sync gentoo-go before they can install it.

It could be enabled by default for everyone, and people would have the choice to
disable it or mask everything except what they are using in that case, so the
extra user toil could be avoided by a careful rollout. I'm not saying it would
be an elegant solution though.

>
> And what about transitive dependencies? Suppose app-misc/cool-package is
> written in some language that isn't Go, but it has a dependency on
> sys-apps/cool-util which has a dependency on something written in Go.
> Should a user wanting to install cool-package have to enable the
> gentoo-go overlay now too? Even though app-misc/cool-package would look
> like it doesn't need the overlay unless you dig into the deps.

This is however a valid point, something I did not consider.

Any reverse dependencies (i.e. packages in the main portage tree depending on
gentoo-go) would be antithetical to the overlay philosophy (the other direction
of dependencies is okay though). This invalidates my separate overlay
suggestion; consider it withdrawn.

However I think that my other points still stand, until someone convinces
me otherwise.

>
> Not a dev, just a user who really likes Gentoo :)

Thanks for your perspective, it was a valuable observation. :)

>
> - Oskari
>

Cheers,
Zoltan
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On Mon, Jun 27, 2022 at 01:43:19AM +0200, Zoltan Puskas wrote:

*snip*

> First of all, one of the advantages of Gentoo is that it gets its source
> code from upstream (yes, I'm aware of mirrors acting as a cache layer),
> which means that poisoning source code needs to be done at upstream
> level (effectively means hacking GitHub, PyPI, or some standalone
> project's Gitea/cgit/gitlab/etc. instance or similar), sources which
> either have more scrutiny or have a limited blast radius.

I don't quite follow what you mean.
Upstream for go modules is actually proxy.golang.org, or some other
similar proxy, which the go tooling knows how to access [1].

> Additionally if an upstream dependency has a security issue it's easier
> to scan all EGO_SUM content and find packages that potentially depend on
> a broken dependency and force a re-pinning and rebuild. The tarball
> magic hides this completely and makes searching very expensive.

I'm not comfortable at all with us changing the dependencies like this
downstream for the same reason the Debian folks ultimately were against
it for kubernetes. If you make these kinds of changes you are effectively
creating a fork, and that would mean we would be building packages with
untested libraries [2].

*snip*

> Considering that BTRFS (and possibly other filesystems) support on the
> fly compression the physical cost of a few inflated ebuilds and

The problem here is the size of SRC_URI when you add the EGO_SUM_SRC_URI
to it. SRC_URI gets exported to the environment, so it can crash portage
if it is too big.
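
To put a rough number on "too big": on Linux, a single environment
string ("NAME=value" plus its terminating NUL) passed to execve() is
capped at MAX_ARG_STRLEN, which is 32 pages, i.e. 131072 bytes with
4 KiB pages. A minimal sketch of such a check, assuming that limit; the
helper name env_string_fits is made up for illustration and is not part
of any eclass:

```shell
# Assumption: Linux with 4 KiB pages, where MAX_ARG_STRLEN = 32 * 4096.
MAX_ARG_STRLEN=131072

# env_string_fits NAME VALUE (hypothetical helper)
# Succeeds if "NAME=VALUE" plus its trailing NUL byte fits into a single
# environment string.
env_string_fits() {
	local LC_ALL=C   # make ${#var} count bytes rather than characters
	local name=$1 value=$2
	# name + "=" + value + NUL terminator
	local total=$(( ${#name} + 1 + ${#value} + 1 ))
	[ "${total}" -le "${MAX_ARG_STRLEN}" ]
}

# Example use:
#   env_string_fits SRC_URI "${SRC_URI}" || echo "SRC_URI is too large to export"
```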

> Manifests is actually way smaller than the logical size would indicate.
> Compare that to the huge incompressible tarballs that now we need to
> store.
>
> As a proxied maintainer or overlay owner hosting these huge tarballs
> also becomes a problem (i.e. we need some public space with potentially
> gigabytes of free space and enough bandwidth to push that to users).
> Pushing toward vendor tarballs creates an extra expense on every level
> (Gentoo infra, mirrors, proxy maintainers, overlay owners, users).

I agree that creating the dependency tarballs is not ideal. We asked for
another option [3], but as you can see from the bug this was refused by
the PMS team. That refusal is the only reason we have to worry about
dependency tarballs.

> It also breaks reproducibility. With EGO_SUM I can check out an older
> version of portage tree (well to some extent) and rebuild packages since
> dependency upstream is very likely to host old versions of their source.
> With the tarballs this breaks since as soon as an ebuild is dropped from
> mainline portage the vendor tarballs follow them too. There is no way
> for the user to roll back a package a few weeks back (e.g. if new
> version has bugs), unlike with EGO_SUM.

The contents of a dependency tarball are created using "go mod download",
which is controlled by the go.mod/go.sum files in the package. So, it is
possible to recreate the dependency tarball at any time.
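
As a rough sketch of what recreating the tarball looks like, following
the steps described in the eclass documentation as far as I recall them
(please check the current eclass before relying on this). The helper
name make_dep_tarball and its arguments are hypothetical:

```shell
# make_dep_tarball PROJECT_DIR OUTPUT_TARBALL (hypothetical helper)
# Downloads the modules pinned by go.mod/go.sum into PROJECT_DIR/go-mod
# and packs that directory. The directory has to be named go-mod to
# match what the eclass expects. OUTPUT_TARBALL should be an absolute
# path, because the work happens inside PROJECT_DIR.
make_dep_tarball() {
	local project_dir=$1 output=$2
	(
		cd "${project_dir}" || exit 1
		GOMODCACHE="${PWD}/go-mod" go mod download -modcacherw || exit 1
		# -a picks the compressor from the suffix (.tar.xz, .tar.gz, ...)
		tar -acf "${output}" go-mod
	)
}

# Example use:
#   make_dep_tarball /path/to/project /tmp/project-1.0-deps.tar.xz
```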

I do not see any advantage EGO_SUM offers over the dependency tarballs
in this space.

> Finally with EGO_SUM we had a nice tool get-ego-vendor which produced
> the EGO_SUM for maintainers which has made maintenance easier. However I
> haven't found any new guidance yet on how to maintain go packages with
> the new tarball method (e.g. what needs to go into the vendor tarball,
> what changes are needed in ebuilds). Overall this further complicates
> ebuild development and verification of PRs.

The documentation for how to build dependency tarballs is in the eclass.
The GOMODCACHE environment variable is used in the eclass to point to
the location where the dependency tarball is unpacked, and that location
is read by the normal go tooling.
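
To illustrate the idea (this is a sketch, not the eclass's actual code):
once the tarball is unpacked, pointing the go tooling at it comes down
to two environment variables. The helper name use_unpacked_deps is
hypothetical, and the go-mod directory name is assumed to match the
tarball layout:

```shell
# use_unpacked_deps WORKDIR (hypothetical helper)
use_unpacked_deps() {
	# Assumption: the dependency tarball was unpacked to WORKDIR/go-mod.
	export GOMODCACHE="$1/go-mod"
	# GOPROXY=off forbids module downloads, so go can only resolve
	# modules from the local cache.
	export GOPROXY=off
}

# Example use:
#   use_unpacked_deps "${WORKDIR}" && go build ./...
```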

> In summary, IMHO the EGO_SUM way of handling of go packages has more
> benefits than drawbacks compared to the vendor tarballs.

EGO_SUM can cause portage to break; that is the primary reason support
is going away.

We attempted another solution that was refused, so the only option we
have currently is to build the dependency tarballs.

>
> Cheers,
> Zoltan
>
> [1]
> https://blogs.gentoo.org/mgorny/2021/02/19/the-modern-packagers-security-nightmare/
>

[1] https://go.dev/ref/mod
[2] https://lwn.net/Articles/835599/
[3] https://bugs.gentoo.org/833567
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On 15/07/2022 23.34, William Hubbs wrote:
> On Mon, Jun 27, 2022 at 01:43:19AM +0200, Zoltan Puskas wrote:
>> In summary, IMHO the EGO_SUM way of handling of go packages has more
>> benefits than drawbacks compared to the vendor tarballs.
>
> EGO_SUM can cause portage to break; that is the primary reason support
> is going away.
>
> We attempted another solution that was refused, so the only option we
> have currently is to build the dependency tarballs.

That reads as if you wrote it under the assumption that we can only
either use dependency tarballs or use EGO_SUM. At the same time, I have
not seen an argument why we can not simply do *both*.

EGO_SUM has numerous advantages over dependency tarballs, but can not be
used if the size of the EGO_SUM value crosses a threshold. So why not
mandate dependency tarballs if a point is crossed and otherwise allow
EGO_SUM? That way, we could have the best of both worlds.
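
A minimal sketch of what such a rule could look like, using the number
of EGO_SUM entries as a stand-in for its size. The cutoff value and the
helper name pick_go_dep_method are invented for illustration; no
concrete threshold has been proposed here:

```shell
# Hypothetical cutoff; the actual value would need to be agreed upon.
EGO_SUM_MAX_ENTRIES=200

# pick_go_dep_method ENTRY... (hypothetical helper)
# Prints "ego-sum" when the number of entries is within the cutoff and
# "dep-tarball" otherwise.
pick_go_dep_method() {
	if [ "$#" -le "${EGO_SUM_MAX_ENTRIES}" ]; then
		echo ego-sum
	else
		echo dep-tarball
	fi
}

# Example use in bash, where EGO_SUM is an array:
#   pick_go_dep_method "${EGO_SUM[@]}"
```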

- Flow
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On 16.7.2022 14.24, Florian Schmaus wrote:
>
> That reads as if you wrote it under the assumption that we can only
> either use dependency tarballs or use EGO_SUM. At the same time, I have
> not seen an argument why we can not simply do *both*.
>
> EGO_SUM has numerous advantages over dependency tarballs, but can not be
> used if the size of the EGO_SUM value crosses a threshold. So why not
> mandate dependency tarballs if a point is crossed and otherwise allow
> EGO_SUM? That way, we could have the best of both worlds.
>
> - Flow
>

++ this sounds most sensible. This is also how I've understood your
proposal.

-- juippis
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On Sat, Jul 16, 2022 at 02:58:04PM +0300, Joonas Niilola wrote:
> On 16.7.2022 14.24, Florian Schmaus wrote:
> >
> > That reads as if you wrote it under the assumption that we can only
> > either use dependency tarballs or use EGO_SUM. At the same time, I have
> > not seen an argument why we can not simply do *both*.
> >
> > EGO_SUM has numerous advantages over dependency tarballs, but can not be
> > used if the size of the EGO_SUM value crosses a threshold. So why not
> > mandate dependency tarballs if a point is crossed and otherwise allow
> > EGO_SUM? That way, we could have the best of both worlds.
> >
> > - Flow
> >
>
> ++ this sounds most sensible. This is also how I've understood your
> proposal.

Remember that with EGO_SUM all of the bloated manifests and ebuilds are
on every user's system.

I added mgorny as a cc to this message because he made it pretty clear
at some point in the previous discussion that the size of these ebuilds
and manifests is unacceptable.

William
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On 16/07/2022 20.51, William Hubbs wrote:
> On Sat, Jul 16, 2022 at 02:58:04PM +0300, Joonas Niilola wrote:
>> On 16.7.2022 14.24, Florian Schmaus wrote:
>>>
>>
>> ++ this sounds most sensible. This is also how I've understood your
>> proposal.
>
> Remember that with EGO_SUM all of the bloated manifests and ebuilds are
> on every user's system.
>
> I added mgorny as a cc to this message because he made it pretty clear
> at some point in the previous discussion that the size of these ebuilds
> and manifests is unacceptable.
>
> William

I want to give another option. Both ways are allowed by eclass, but by
QA policy (or some other decision), it is prohibited to use EGO_SUM in
main ::gentoo tree.

As a result, overlays and ::guru can use the EGO_SUM or dist distfile
(remember, they don't have access to hosting on dev.g.o).

--
Arthur Zamarin
arthurzam@gentoo.org
Gentoo Linux developer (Python, Arch Teams, pkgcore stack, GURU)
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On Sat, Jul 16, 2022 at 09:31:35PM +0300, Arthur Zamarin wrote:
> I want to give another option. Both ways are allowed by eclass, but by
> QA policy (or some other decision), it is prohibited to use EGO_SUM in
> main ::gentoo tree.
>
> As a result, overlays and ::guru can use the EGO_SUM or dist distfile
> (remember, they don't have access to hosting on dev.g.o).
Yes; this is the option I was trying to propose as an intermediate step
until we have indirect Manifests that provide the best of both worlds
(not bloating the tree, and not requiring creation of dep tarballs).


--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
Re: Proposal to undeprecate EGO_SUM [ In reply to ]
On Sat, Jul 16, 2022 at 06:46:40PM +0000, Robin H. Johnson wrote:
> On Sat, Jul 16, 2022 at 09:31:35PM +0300, Arthur Zamarin wrote:
> > I want to give another option. Both ways are allowed by eclass, but by
> > QA policy (or some other decision), it is prohibited to use EGO_SUM in
> > main ::gentoo tree.
> >
> > As a result, overlays and ::guru can use the EGO_SUM or dist distfile
> > (remember, they don't have access to hosting on dev.g.o).
> Yes; this is the option I was trying to propose as an intermediate step
> until we have indirect Manifests that provide the best of both worlds
> (not bloating the tree, and not requiring creation of dep tarballs).

I could force this in the eclass with the following flow if I know how
to tell if the ebuild inheriting it is in the main tree or not:

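# in_main_tree is a placeholder for a test that checks whether the
# ebuild inheriting this eclass is in the main tree. It has to be run
# as a command outside of [[ ]]; inside the brackets it would be
# treated as a non-empty string and always be true.
if [[ -n ${EGO_SUM} ]] && in_main_tree; then
	eqawarn "EGO_SUM is not allowed in the main tree"
	eqawarn "This will become a fatal error in the future"
fi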

The only question is, is there a way to reliably tell whether or not
we are in the main tree?
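
One possible approach, sketched under assumptions that would need
checking: EBUILD holds the full path of the running ebuild (a Portage
variable, not something PMS guarantees), the repository layout is
<repo>/<category>/<package>/<file>.ebuild, and the main tree's
profiles/repo_name contains "gentoo". This is only an illustration, not
a tested eclass change:

```shell
# in_main_tree (hypothetical implementation)
# Walks three directories up from ${EBUILD} to find the repository root
# and compares profiles/repo_name against "gentoo".
in_main_tree() {
	local repo_root repo_name
	repo_root=$(dirname "$(dirname "$(dirname "${EBUILD}")")")
	[ -r "${repo_root}/profiles/repo_name" ] || return 1
	# read fails on a file without a trailing newline but still fills
	# the variable, so only bail out if nothing was read at all.
	IFS= read -r repo_name < "${repo_root}/profiles/repo_name" \
		|| [ -n "${repo_name}" ] || return 1
	[ "${repo_name}" = "gentoo" ]
}
```

Whether relying on EBUILD is acceptable in an eclass is a separate
question, since it ties the check to package manager behaviour.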

William
