[GLEP78] Updating specification
Hi,

We are updating the draft of GLEP 78, "Gentoo Binary Package Container
Format", to address some security issues, and have included some new
designs for this purpose.

The new draft and a diff against the old one are attached.

Please feel free to send any comments and suggestions.

Thanks,
Sheng Yu
Re: [GLEP78] Updating specification
>>>>> On Mon, 13 Sep 2021, Sheng Yu wrote:

> -The archive contains a number of files, stored in a single directory
> -whose name should match the basename of the package file. However,
> -the implementation must be able to process an archive where
> -the directory name is mismatched. There should be no explicit archive
> -member entry for the directory.
> +The archive contains a number of files. All package-related files
> +should be stored in a single directory whose name matches the CPV of
> +the package file. However, the implementation must be able to process
> +an archive where the directory name is mismatched. There should be no
> +explicit archive member entry for the directory.

I wonder about CPV here. That's ${CATEGORY}/${P} and contains a slash,
so it cannot be the name of a directory. Also, what about the package
revision?

> +6. The package manifest data file ``Manifest`` (required).
> +
> +7. A signature for the package Manifest file ``Manifest.sig``
> + (optional).

Given that the outer archive is uncompressed tar, every file will be
zero-padded to a full block which adds some amount of bloat. So, could
the signature be inlined in the Manifest file? That's also what GLEP 74
specifies.
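
To put a rough number on the padding, here is a minimal sketch. It assumes a plain ustar member with a single 512-byte header block and no extended headers; the function name and the 300-byte signature size are made up for illustration.

```python
BLOCK = 512  # tar block size in bytes


def member_overhead(payload_size: int) -> int:
    """Bytes a tar member occupies beyond its payload: header plus zero padding."""
    padded = -(-payload_size // BLOCK) * BLOCK  # round payload up to a full block
    return BLOCK + padded - payload_size


# Hypothetical 300-byte detached signature stored as its own archive member.
print(member_overhead(300))
```

Under these assumptions a 300-byte signature costs 724 extra bytes as a separate member, whereas appended to an existing Manifest it would only grow that member's payload.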

Also, IIRC one of the goals of the format was to allow partial download
of metadata. That will only work if the Manifest file will be the first
file in the archive (or at least appear before the image archive).
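
The ordering concern can be illustrated with a small sketch that builds an in-memory archive and reports where each member's header starts. The member names and payload sizes are placeholders chosen for illustration, not the names the GLEP mandates.

```python
import io
import tarfile


def build_archive(members):
    """Write (name, payload) pairs into an uncompressed ustar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def member_offsets(archive):
    """Map each member name to the byte offset of its header block."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        return {member.name: member.offset for member in tar}


# Placeholder layout: small metadata, large image, Manifest written last.
archive = build_archive([
    ("pkg/metadata.tar", b"m" * 1000),
    ("pkg/image.tar", b"i" * 100000),
    ("pkg/Manifest", b"x" * 400),
])
for name, offset in member_offsets(archive).items():
    print(f"{offset:>8}  {name}")
```

With this layout the Manifest header only starts after the whole image, so a client reading sequentially has to fetch or skip the image before it can see the Manifest.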

> +The implementation follows the Manifest specifications in GLEP 74
> +[#GLEP74]_ and uses the DATA tag for files within the archive.

AFAICS, GLEP 74 specifies an OpenPGP cleartext signature in the file
itself, not a detached signature.

Ulrich
Re: [GLEP78] Updating specification
On Mon, 2021-09-13 at 12:08 +0200, Ulrich Mueller wrote:
> > > > > > On Mon, 13 Sep 2021, Sheng Yu wrote:
>
> > -The archive contains a number of files, stored in a single
> > directory
> > -whose name should match the basename of the package file. However,
> > -the implementation must be able to process an archive where
> > -the directory name is mismatched. There should be no explicit
> > archive
> > -member entry for the directory.
> > +The archive contains a number of files. All package-related files
> > +should be stored in a single directory whose name matches the CPV
> > of
> > +the package file. However, the implementation must be able to
> > process
> > +an archive where the directory name is mismatched. There should be
> > no
> > +explicit archive member entry for the directory.
>
> I wonder about CPV here. That's ${CATEGORY}/${P} and contains a slash,
> so it cannot be the name of a directory. Also, what about the package
> revision?

Please restore the previous wording. The GLEP deliberately did not
enforce a specific filename because it's about internal format.

>
> > +6. The package manifest data file ``Manifest`` (required).
> > +
> > +7. A signature for the package Manifest file ``Manifest.sig``
> > + (optional).
>
> Given that the outer archive is uncompressed tar, every file will be
> zero-padded to a full block which adds some amount of bloat. So, could
> the signature be inlined in the Manifest file? That's also what GLEP
> 74
> specifies.

Using inline signature in Manifest makes sense.

>
> Also, IIRC one of the goals of the format was to allow partial
> download
> of metadata. That will only work if the Manifest file will be the
> first
> file in the archive (or at least appear before the image archive).

I disagree. This is solved by having detached metadata signature -- you
can do a partial fetch and verify the metadata directly.

On the other hand, putting Manifest first would make it impossible to
create the archive from data stream without using temporary files,
effectively doubling the needed free space. Well, technically you could
just reserve space and write Manifest later but that would strongly
depend on the size of PGP signature and that's not something I'd feel
comfortable relying on.
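
A minimal sketch of the streaming argument: checksums are collected while members are written to a non-seekable stream, and the Manifest is appended last. The member names are placeholders, payloads are held in memory for brevity, and the DATA line layout follows my reading of GLEP 74, so treat it as an assumption.

```python
import hashlib
import io
import tarfile


def write_streamed(out, members, manifest_name="pkg/Manifest"):
    """Stream (name, payload) members to `out`, then append a Manifest."""
    lines = []
    # "w|" is tarfile's stream mode: data is only ever appended, never rewritten.
    with tarfile.open(fileobj=out, mode="w|", format=tarfile.USTAR_FORMAT) as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
            digest = hashlib.sha512(payload).hexdigest()
            # Assumed GLEP 74 style entry: DATA <path> <size> <hash name> <hash>
            lines.append(f"DATA {name} {len(payload)} SHA512 {digest}")
        manifest = ("\n".join(lines) + "\n").encode("utf-8")
        info = tarfile.TarInfo(manifest_name)
        info.size = len(manifest)
        tar.addfile(info, io.BytesIO(manifest))


out = io.BytesIO()
write_streamed(out, [("pkg/image.tar", b"example image payload")])
```

Because the checksums are only known once every member has been written, putting the Manifest first would mean either buffering the members somewhere or reserving space for it in advance.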

--
Best regards,
Michał Górny
Re: [GLEP78] Updating specification
On Mon, Sep 13, 2021 at 5:02 PM Michał Górny <mgorny@gentoo.org> wrote:
>
> On Mon, 2021-09-13 at 12:08 +0200, Ulrich Mueller wrote:
> >
> > Also, IIRC one of the goals of the format was to allow partial
> > download
> > of metadata. That will only work if the Manifest file will be the
> > first
> > file in the archive (or at least appear before the image archive).
>
> I disagree. This is solved by having detached metadata signature -- you
> can do a partial fetch and verify the metadata directly.
>

Another option I've tossed out there in the past is having a content
hash of the metadata and putting that in the filename. That obviously
won't tell you anything about the contents of the file without reading
it, but if you're looking for a file with specific metadata you could
predict its filename. This was intended to work with having multiple
hashes for the same file using subsets of the metadata, using symbolic
links.

The thinking here is that you'd just hash a subset of metadata useful
for identifying what file you'd want to download, such as CHOST,
linked dependency versions, use flags, etc. You'd probably hash it
with/without stuff like use flags so that you could either take a shot
at getting the file exactly configured how you want, or accepting a
version with any set of flags.

Of course, this idea goes in direct opposition to your statement about
not wanting to specify the filename. I get that argument. The intent
here was to allow portage to go hunting through trusted repositories
to find packages it can use without having to sync a lot of data - if
you know the exact filename then a simple GET tells you if it is there
or not.

--
Rich
Re: [GLEP78] Updating specification
------- Original Message -------

On Monday, September 13th, 2021 at 17:02, Michał Górny <mgorny@gentoo.org> wrote:
> On Mon, 2021-09-13 at 12:08 +0200, Ulrich Mueller wrote:
> > > > > > > On Mon, 13 Sep 2021, Sheng Yu wrote:
> >
> > > -The archive contains a number of files, stored in a single
> > > directory
> > > -whose name should match the basename of the package file. However,
> > > -the implementation must be able to process an archive where
> > > -the directory name is mismatched. There should be no explicit
> > > archive
> > > -member entry for the directory.
> > > +The archive contains a number of files. All package-related files
> > > +should be stored in a single directory whose name matches the CPV
> > > of
> > > +the package file. However, the implementation must be able to
> > > process
> > > +an archive where the directory name is mismatched. There should be
> > > no
> > > +explicit archive member entry for the directory.
> >
> > I wonder about CPV here. That's ${CATEGORY}/${P} and contains a slash,
> > so it cannot be the name of a directory. Also, what about the package
> > revision?
>
> Please restore the previous wording. The GLEP deliberately did not
> enforce a specific filename because it's about internal format.

Got it, but maybe we need to add a requirement for human readability,
since users should not have to inspect the metadata to identify a package.

> >
> > > +6. The package manifest data file ``Manifest`` (required).
> > > +
> > > +7. A signature for the package Manifest file ``Manifest.sig``
> > > + (optional).
> >
> > Given that the outer archive is uncompressed tar, every file will be
> > zero-padded to a full block which adds some amount of bloat. So, could
> > the signature be inlined in the Manifest file? That's also what GLEP
> > 74
> > specifies.
>
> Using inline signature in Manifest makes sense.

This makes sense, but it leads to another problem: we allow user-defined
GPG commands, which gives us no control over exactly which format is
generated. And I do not feel that hard-coding "--clear-sign" and
"--detach-sign" is good practice.
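
One way to cope with not knowing the output format is to inspect what the signing command produced. A minimal sketch, assuming ASCII-armored output; it relies only on the standard OpenPGP armor header lines, the function name and return labels are made up, and it does not verify anything.

```python
def signature_style(data: bytes) -> str:
    """Guess which kind of OpenPGP output a signing command produced."""
    text = data.lstrip()
    if text.startswith(b"-----BEGIN PGP SIGNED MESSAGE-----"):
        return "cleartext"  # signed text with the signature inlined
    if text.startswith(b"-----BEGIN PGP SIGNATURE-----"):
        return "detached-armored"  # signature only, payload stored elsewhere
    if text.startswith(b"-----BEGIN PGP MESSAGE-----"):
        return "armored-message"  # opaque signed or encrypted message
    return "unknown-or-binary"  # e.g. binary output or not OpenPGP at all


print(signature_style(b"-----BEGIN PGP SIGNED MESSAGE-----\nHash: SHA512\n"))
```

The implementation could then accept or reject the result depending on which style the specification ends up requiring.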

> >
> > Also, IIRC one of the goals of the format was to allow partial
> > download
> > of metadata. That will only work if the Manifest file will be the
> > first
> > file in the archive (or at least appear before the image archive).
>
> I disagree. This is solved by having detached metadata signature -- you
> can do a partial fetch and verify the metadata directly.
>
> On the other hand, putting Manifest first would make it impossible to
> create the archive from data stream without using temporary files,
> effectively doubling the needed free space. Well, technically you could
> just reserve space and write Manifest later but that would strongly
> depend on the size of PGP signature and that's not something I'd feel
> comfortable relying on.
>

Reserving space also wastes extra space and needs a padding file.

Thanks,
Sheng Yu
Re: [GLEP78] Updating specification
On Monday, September 13th, 2021 at 18:04, Rich Freeman <rich0@gentoo.org> wrote:
>
> On Mon, Sep 13, 2021 at 5:02 PM Michał Górny <mgorny@gentoo.org> wrote:
> >
> > On Mon, 2021-09-13 at 12:08 +0200, Ulrich Mueller wrote:
> > >
> > > Also, IIRC one of the goals of the format was to allow partial
> > > download
> > > of metadata. That will only work if the Manifest file will be the
> > > first
> > > file in the archive (or at least appear before the image archive).
> >
> > I disagree. This is solved by having detached metadata signature -- you
> > can do a partial fetch and verify the metadata directly.
> >
>
> Another option I've tossed out there in the past is having a content
> hash of the metadata and putting that in the filename. That obviously
> won't tell you anything about the contents of the file without reading
> it, but if you're looking for a file with specific metadata you could
> predict its filename. This was intended to work with having multiple
> hashes for the same file using subsets of the metadata, using symbolic
> links.
>
> The thinking here is that you'd just hash a subset of metadata useful
> for identifying what file you'd want to download, such as CHOST,
> linked dependency versions, use flags, etc. You'd probably hash it
> with/without stuff like use flags so that you could either take a shot
> at getting the file exactly configured how you want, or accepting a
> version with any set of flags.
>
> Of course, this idea goes in direct opposition to your statement about
> not wanting to specify the filename. I get that argument. The intent
> here was to allow portage to go hunting through trusted repositories
> to find packages it can use without having to sync a lot of data - if
> you know the exact filename then a simple GET tells you if it is there
> or not.

Interesting concept, although this belongs under binpkg-multi-instance:
a predictable configuration hash, rather than relying on the index to
tell the variants apart.

Something like:
bar/foo-1.0-r2-e3b0c44298fc1c149afbf4c8996fb9.gpkg.tar
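
A minimal sketch of how such a name could be derived, assuming SHA-256 over a canonical JSON dump of a chosen metadata subset. The field names, the 30-character truncation (picked only to match the length of the example above), and both helper functions are hypothetical.

```python
import hashlib
import json


def metadata_key(metadata: dict, fields: list, length: int = 30) -> str:
    """Hash a chosen subset of the metadata into a short, predictable key."""
    subset = {}
    for name in fields:
        value = metadata[name]
        # Sort list values so that flag order does not change the key.
        subset[name] = sorted(value) if isinstance(value, list) else value
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def candidate_names(cpv: str, metadata: dict) -> list:
    """Names to probe: exact configuration first, then any set of USE flags."""
    exact = metadata_key(metadata, ["CHOST", "USE"])
    any_use = metadata_key(metadata, ["CHOST"])
    return [f"{cpv}-{exact}.gpkg.tar", f"{cpv}-{any_use}.gpkg.tar"]


meta = {"CHOST": "x86_64-pc-linux-gnu", "USE": ["ssl", "zlib"]}
for name in candidate_names("bar/foo-1.0-r2", meta):
    print(name)
```

A client would request the first name and fall back to the second, while the server side could publish the second as a symbolic link to one of the concrete builds.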

Thanks,
Sheng Yu