Hi,
I was working on adding a Go-based ebuild to Gentoo yesterday and I
got a warning from Portage saying that EGO_SUM is deprecated and
should be avoided. Since I remembered there was an intense discussion
about this on the ML, I went back and re-read the threads before
writing this piece. I'd like to provide my perspective as a user, a
proxied maintainer, and an overlay owner. I also run a private mirror on
my LAN to serve my hosts in order to reduce load on external mirrors.
Before diving in I think it's worth reading mgorny's blog post "The
modern packager’s security nightmare"[1] as it's relevant to the
discussion, and something I deeply agree with.
With all that being said, I feel that the tarball idea is a bad one for
many reasons.
From a security point of view, I understand that we still have to trust
maintainers not to do funky stuff, but I think this issue goes beyond
that.
First of all, one of the advantages of Gentoo is that it gets its source
code from upstream (yes, I'm aware of mirrors acting as a cache layer),
which means that poisoning source code needs to be done at the upstream
level (which effectively means hacking GitHub, PyPI, or some standalone
project's Gitea/cgit/GitLab/etc. instance or similar), sources which
either have more scrutiny or have a limited blast radius.
Additionally, if an upstream dependency has a security issue, it's easy
to scan all EGO_SUM content, find the packages that potentially depend
on the broken dependency, and force a re-pinning and rebuild. The
tarball magic hides this completely and makes searching very expensive.
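To illustrate how cheap such a scan can be, here is a rough sketch in
Python. It assumes the usual layout of one quoted "module version" or
"module version/go.mod" entry per line inside EGO_SUM=( ... ); the
function names and the module names in the usage note are made up for
the example, and it is not meant as a complete ebuild parser.

```python
import re
from pathlib import Path

# Matches the body of an EGO_SUM=( ... ) array (assumes no ')' inside it).
EGO_SUM_RE = re.compile(r'EGO_SUM=\((.*?)\)', re.DOTALL)
# Matches one quoted entry such as "github.com/foo/bar v1.2.3/go.mod".
ENTRY_RE = re.compile(r'"(\S+)\s+(\S+?)(?:/go\.mod)?"')


def pinned_versions(ebuild_text, module):
    """Return the sorted versions of `module` pinned in an ebuild's EGO_SUM."""
    versions = set()
    for body in EGO_SUM_RE.findall(ebuild_text):
        for name, version in ENTRY_RE.findall(body):
            if name == module:
                versions.add(version)
    return sorted(versions)


def scan_tree(root, module):
    """Map ebuild path -> pinned versions of `module` for ebuilds under root."""
    hits = {}
    for path in Path(root).rglob("*.ebuild"):
        versions = pinned_versions(path.read_text(errors="replace"), module)
        if versions:
            hits[str(path)] = versions
    return hits
```

Calling scan_tree() on a repository checkout with a (hypothetical)
module name like "github.com/foo/bar" would list every ebuild pinning
it, together with the pinned versions. With vendor tarballs the same
question requires downloading and unpacking every tarball first.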
In fact using these vendor tarballs is the equivalent of "static
linking" in the packaging space. Why are we introducing the same issue
in the repository space? This kills the reusability of already
downloaded dependencies and bloats storage requirements. This is
especially bad on laptops, where SSD free space might be limited, in
case the user does not nuke their distfiles after each upgrade.
Considering that BTRFS (and possibly other filesystems) support
on-the-fly compression, the physical cost of a few inflated ebuilds and
Manifests is actually way smaller than the logical size would indicate.
Compare that to the huge incompressible tarballs that we now need to
store.
As a proxied maintainer or overlay owner, hosting these huge tarballs
also becomes a problem (i.e. we need some public space with potentially
gigabytes of free space and enough bandwidth to push that to users).
Pushing toward vendor tarballs creates an extra expense on every level
(Gentoo infra, mirrors, proxy maintainers, overlay owners, users).
If bloating Portage is a big issue and we frown upon Go stuff anyway (or
only a few users need these packages), why not consider moving all Go
packages into an officially supported, Go-packages-only overlay? I
understand that this would not solve the kernel buffer issue where we
run out of environment variable space, but it would debloat the main
Portage tree.
The tarball approach also breaks reproducibility. With EGO_SUM I can
check out an older version of the Portage tree (well, to some extent)
and rebuild packages, since the dependencies' upstreams are very likely
to still host old versions of their source. With the tarballs this
breaks, since as soon as an ebuild is dropped from the main Portage tree
its vendor tarball goes with it. There is no way for the user to roll a
package back a few weeks (e.g. if the new version has bugs), unlike with
EGO_SUM.
In fact I feel this goes against the spirit of Portage too, since
instead of "just describing" how to obtain sources and build them, an
ebuild now depends on essentially ephemeral blobs, which happen to be
externalized from the Portage tree itself. I'm aware that we have
ebuilds that pull in patches and other stuff from dev space already, but
we shouldn't make this even worse.
Finally, with EGO_SUM we had a nice tool, get-ego-vendor, which produced
the EGO_SUM for maintainers and made maintenance easier. However, I
haven't found any new guidance yet on how to maintain Go packages with
the new tarball method (e.g. what needs to go into the vendor tarball,
what changes are needed in ebuilds). Overall this further complicates
ebuild development and verification of PRs.
In summary, IMHO the EGO_SUM way of handling Go packages has more
benefits than drawbacks compared to the vendor tarballs.
Cheers,
Zoltan
[1]
https://blogs.gentoo.org/mgorny/2021/02/19/the-modern-packagers-security-nightmare/