Mailing List Archive

RFC: lzma tarball usage
Hello,

Over the course of this year, an lzma-utils build-time dependency has been
added to a few system packages, to handle .tar.lzma tarballs.
This has huge implications for the requirements of the system toolchain,
which is highly disturbing from a minimal (let's say embedded) systems
point of view: lzma-utils depends on the C++ compiler and the libstdc++
beast, while a minimal system would like to avoid this at all costs.

I do realize one would remove build-time dependencies and the toolchain
on an embedded system on deployment anyway, but this means the gcc
USE=nocxx flag is pretty much useless. It would be nice to use it to
ensure that nothing that depends on the C++ standard library sneaks in
during development, instead of finding out later that things break.

This is a plea and also a request for comments on the matter of
using .tar.lzma tarballs or not, and for what packages this is
acceptable and for what not.

I'd be happy if some unpacker other than lzma-utils were used, one that
does not depend on libstdc++. I'm sure it can be done; heck, it's done
in integrated form in some other projects in less than a couple of
kilobytes of code for the unpacking from a VFS. Meanwhile, please
consider using the upstream-provided .tar.gz tarballs instead, and don't
roll patchsets in .lzma just 'cause you can.

coreutils and linux-headers come to my mind out of system packages right
now. I'm sure more dragons await me.


--
Mart Raudsepp
Gentoo Developer
Mail: leio@gentoo.org
Weblog: http://planet.gentoo.org/developers/leio
Re: RFC: lzma tarball usage [ In reply to ]
On 07-05-2008 16:23:12 +0300, Mart Raudsepp wrote:
> This is a plea and also a request for comments on the matter of
> using .tar.lzma tarballs or not, and for what packages this is
> acceptable and for what not.

Just as a little background:
GNU chose to switch from bzip2 to lzma because it produces smaller files
(less bandwidth) and decompresses faster.

They no longer provide the bzip2 versions of archives for newer releases
IIRC, so it's either tar.gz or tar.lzma.

> I'd be happy if some other unpacker is used than lzma-utils - one that
> does not depend on libstdc++ - I'm sure it can be done, heck it's done
> in integrated form in some other projects in less than a couple
> kilobytes of code for the unpacking from a VFS. Meanwhile please
> consider using the upstream provided .tar.gz tarballs instead and not
> roll patchsets in .lzma just cause you can.

See above why it might not just be "'cause you can".

> coreutils and linux-headers come to my mind out of system packages right
> now. I'm sure more dragons await me.

m4, that one gave me some headaches, because lzma-utils required some
eautoreconf, which introduced a nice cycle.


--
Fabian Groffen
Gentoo on a different level
--
gentoo-dev@lists.gentoo.org mailing list
Re: RFC: lzma tarball usage [ In reply to ]
On Wed, 2008-05-07 at 16:23 +0300, Mart Raudsepp wrote:

> I'd be happy if some other unpacker is used than lzma-utils - one that
> does not depend on libstdc++ - I'm sure it can be done, heck it's done
> in integrated form in some other projects in less than a couple
> kilobytes of code for the unpacking from a VFS. Meanwhile please
> consider using the upstream provided .tar.gz tarballs instead and not
> roll patchsets in .lzma just cause you can.

busybox has unlzma and seems to be a part of "system".

Should also be easy to create a really tiny unlzma from the busybox
source and ship with portage, or create a patch for tar or something.

-nc
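
As an illustration of the "tiny unpacker shipped alongside the package manager" idea, here is a minimal sketch in Python. It assumes a present-day Python whose standard library includes the lzma module (a binding to the C library liblzma, which did not exist when this thread was written). The function name unpack_tar_lzma is hypothetical and is not part of portage.

```python
import lzma
import tarfile


def unpack_tar_lzma(archive_path, dest_dir):
    """Extract a legacy .tar.lzma archive without spawning an lzma binary.

    Hypothetical helper for illustration only; not a portage API.
    """
    # FORMAT_ALONE selects the legacy ".lzma" container used by lzma-utils.
    with lzma.open(archive_path, "rb", format=lzma.FORMAT_ALONE) as stream:
        # Mode "r|" reads the tar data as a forward-only stream, so the
        # decompressor never has to seek backwards.
        with tarfile.open(fileobj=stream, mode="r|") as tar:
            tar.extractall(dest_dir)
```

The decoding happens in-process, so no C++ toolchain or external filter program is involved at unpack time.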


Re: RFC: lzma tarball usage [ In reply to ]
On Wed, May 7, 2008 at 3:23 PM, Mart Raudsepp <leio@gentoo.org> wrote:
> I'd be happy if some other unpacker is used than lzma-utils - one that
> does not depend on libstdc++ - I'm sure it can be done, heck it's done
> in integrated form in some other projects in less than a couple
> kilobytes of code for the unpacking from a VFS. Meanwhile please
> consider using the upstream provided .tar.gz tarballs instead and not
> roll patchsets in .lzma just cause you can.

tar-1.20 has lzma support, so maybe it could handle this too, once it
goes into stable
Re: RFC: lzma tarball usage [ In reply to ]
>>>>> On Wed, 07 May 2008, Natanael Copa wrote:

> busybox has unlzma and seems to be a part of "system".

> Should also be easy to create a really tiny unlzma from the busybox
> source and ship with portage, or create a patch for tar or something.

The decoder of lzma-utils is also written in C only.

So it would also be possible to compile "lzmadec" without any need
for C++. Just call "make" in subdirs liblzmadec and lzmadec.

Ulrich
Re: RFC: lzma tarball usage [ In reply to ]
Hi,
I sent this to -dev too, but I think as an ordinary user I can't write there...

On Wed, May 7, 2008 at 3:23 PM, Mart Raudsepp <leio@gentoo.org> wrote:
> I'd be happy if some other unpacker is used than lzma-utils - one that
> does not depend on libstdc++ - I'm sure it can be done, heck it's done
> in integrated form in some other projects in less than a couple
> kilobytes of code for the unpacking from a VFS. Meanwhile please
> consider using the upstream provided .tar.gz tarballs instead and not
> roll patchsets in .lzma just cause you can.

tar-1.20 has lzma support, so maybe it could handle this too, once it
goes into stable.
Re: RFC: lzma tarball usage [ In reply to ]
>>>>> On Wed, 7 May 2008, Benedikt Morbach wrote:

> tar-1.20 has lzma support, so maybe it could handle this too, once it
> goes into stable

This doesn't help, since it needs the lzma binary as a filter.

Ulrich
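
To make the "filter" point concrete: tar hands the compressed archive to an external program and reads plain tar data back from that program's output, so the program must be installed. The sketch below mimics that arrangement in Python. The function name list_members_through_filter is hypothetical, and the code illustrates the general pipe-through-a-filter pattern rather than GNU tar's actual implementation.

```python
import shutil
import subprocess
import tarfile


def list_members_through_filter(archive_path, filter_cmd):
    """List tar members by piping the archive through an external filter.

    Hypothetical helper: filter_cmd is an argv list such as
    ["lzma", "-dc"], and the call fails if that program is not installed.
    """
    if shutil.which(filter_cmd[0]) is None:
        raise FileNotFoundError(f"filter program not found: {filter_cmd[0]}")

    with open(archive_path, "rb") as archive:
        # The filter reads the compressed file on stdin and writes the
        # decompressed tar stream to stdout.
        proc = subprocess.Popen(filter_cmd, stdin=archive,
                                stdout=subprocess.PIPE)
        try:
            with tarfile.open(fileobj=proc.stdout, mode="r|") as tar:
                names = [member.name for member in tar]
            # Drain any trailing padding so the filter can exit cleanly.
            proc.stdout.read()
        finally:
            proc.stdout.close()
            returncode = proc.wait()

    if returncode != 0:
        raise RuntimeError(f"filter exited with status {returncode}")
    return names
```

With lzma-utils installed, a call would look like list_members_through_filter("some.tar.lzma", ["lzma", "-dc"]), where the file name is a placeholder. Without the lzma binary the function raises FileNotFoundError, which is the dependency problem described above.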
Re: RFC: lzma tarball usage [ In reply to ]
On Wed, 2008-05-07 at 16:23 +0300, Mart Raudsepp wrote:
> I do realize one would remove build-time dependencies and the toolchain
> on an embedded system on deployment anyway, but this means gcc USE=nocxx
> USE flag is pretty much useless, while it would be nice to use it to
> ensure that nothing sneaks in during development that depends on the C++
> standard library easily instead of finding things break later.

It's a pain in the ass for Release Engineering, too. At this point,
we're looking into how we need to modify the bootstrap sequence to
accommodate people using lzma for system (and lower) packages.

http://bugs.gentoo.org/show_bug.cgi?id=220074

We're already getting reports of this due to someone deciding that it'd
be a good idea to use lzma for our daily portage snapshots without any
discussion here. Luckily, we still have the other tarballs to use, too.

--
Chris Gianelloni
Release Engineering Strategic Lead
Games Developer
Re: RFC: lzma tarball usage [ In reply to ]
Hi,


I think that as long as there is no really minimal lzmadec available
yet (as a standalone package), we should use more standard compressors
like gzip or bzip2. Adding that whole bunch of deps just to
save a few bytes IMHO isn't worth it.


cu
--
---------------------------------------------------------------------
Enrico Weigelt == metux IT service - http://www.metux.de/
---------------------------------------------------------------------
Please visit the OpenSource QM Taskforce:
http://wiki.metux.de/public/OpenSource_QM_Taskforce
Patches / Fixes for a lot dozens of packages in dozens of versions:
http://patches.metux.de/
---------------------------------------------------------------------
Re: RFC: lzma tarball usage [ In reply to ]
Enrico Weigelt wrote:
> I think, as long as there is no really minimal lzmadec available
> yet (as standalone package), we should use more standard compressors
> like gzip or bzip2. Adding that whole bunch of deps just to
> save a few bytes IMHO isn't worth it.

Keep in mind that this might mean doing our own repackaging of upstream
if they don't have a supported option. I think the only other option
would be to create an "lzmalite" package or something like that which
simply contains the decompressor in ordinary C. You could really turn
that into a separate package like gentoolkit or whatever. I wouldn't
actually embed the code into portage, since that isn't the Unix way and
it would just force other package managers (and other distros) to do the
same thing. An lzmalite package could have a life of its own and, as a
result, benefit from fewer bugs, etc.

But, I'm not going to be the one writing the thing, so feel free to not
listen to any of this... :)
Re: RFC: lzma tarball usage [ In reply to ]
Richard Freeman wrote:
> Enrico Weigelt wrote:
>> I think, as long as there is no really minimal lzmadec available
>> yet (as standalone package), we should use more standard compressors
>> like gzip or bzip2. Adding that whole bunch of deps just to save a
>> few bytes IMHO isn't worth it.
>
> Keep in mind that this might mean doing our own repackaging of
> upstream if they don't have a supported option. I think the only
> other option would be to create an "lzmalite" package or something
> like that which simply contains the decompressor in ordinary C. You
> could really turn that into a separate package like gentoolkit or
> whatever - I wouldn't actually embed the code into portage since that
> isn't the unix way and it just forced other package managers (and
> other distros) to do the same thing. An lzmalite package could have a
> life of its own and as a result benefit from fewer bugs/etc.
>
> But, I'm not going to be the one writing the thing, so feel free to
> not listen to any of this... :)
All upstreams in question still use gzip, they have only dropped bzip2
support in favor of lzma.
Re: RFC: lzma tarball usage [ In reply to ]
Ulrich Mueller <ulm@gentoo.org> posted
18465.49899.138685.587639@a1i15.kph.uni-mainz.de, excerpted below, on
Wed, 07 May 2008 16:55:39 +0200:

> The decoder of lzma-utils is also written in C only.
>
> So it would also be possible to compile "lzmadec" without any need for
> C++. Just call "make" in subdirs liblzmadec and lzmadec.

What about USE=decode-only or something similar for lzma-utils, then? If
desired, it could even be masked on "normal" profiles, but would then be
there for the embedded and releng folks.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

Re: Re: RFC: lzma tarball usage [ In reply to ]
flameeyes@gmail.com (Diego 'Flameeyes' Pettenò) writes:

> USE=cxx should do just fine, it will disable the C++-related parts,
> whatever they are. Sincerely I'd quite like to enable it on my vserver's
> build chroots too.

Should that be USE=-cxx? The help for USE=cxx says that this builds
support for C++.
Re: RFC: lzma tarball usage [ In reply to ]
Mart Raudsepp wrote:
> Hello,
>
> Over the course of this year, a lzma-utils buildtime dependency has been
> added to a few system packages, to handle .tar.lzma tarballs.
> This has huge implications on the requirement of the system toolchain,
> which is highly disturbing from a minimal (lets say embedded) systems
> concern - lzma-utils depends on the C++ compiler and the libstdc++
> beast, while a minimal system would like to avoid this at all cost.

I'd rewrite the C++ code in plain C if it isn't that complex...

lu

--

Luca Barbato
Gentoo Council Member
Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero

Re: RFC: lzma tarball usage [ In reply to ]
>>>>> On Thu, 08 May 2008, Diego 'Flameeyes' Pettenò wrote:

>>> So it would also be possible to compile "lzmadec" without any need
>>> for C++. Just call "make" in subdirs liblzmadec and lzmadec.
>>
>> What about USE=decode-only or something similar for lzma-utils,
>> then? If desired, it could even be masked on "normal" profiles, but
>> would then be there for the embedded and releng folks.

> USE=cxx should do just fine, it will disable the C++-related parts,
> whatever they are. Sincerely I'd quite like to enable it on my
> vserver's build chroots too.

See <https://bugs.gentoo.org/show_bug.cgi?id=220899> for a first
attempt of an ebuild.

Ulrich
Re: Re: RFC: lzma tarball usage [ In reply to ]
Ryan Hill wrote:
> On Wed, 07 May 2008 16:23:12 +0300
> Mart Raudsepp <leio@gentoo.org> wrote:
>
>
>> Hello,
>>
>> Over the course of this year, a lzma-utils buildtime dependency has
>> been added to a few system packages, to handle .tar.lzma tarballs.
>> This has huge implications on the requirement of the system toolchain,
>> which is highly disturbing from a minimal (lets say embedded) systems
>> concern - lzma-utils depends on the C++ compiler and the libstdc++
>> beast, while a minimal system would like to avoid this at all cost.
>>
>
> The new lzma-utils codebase uses liblzma, written in C. It's at the
> alpha stage but supposedly supports encoding/decoding the current lzma
> format "well enough" (;P). It probably has some fun bugs to find
> and squish.
>
> http://sf.net/mailarchive/forum.php?thread_name=200804251652.58484.lasse.collin%40tukaani.org&forum_name=lzmautils-announce
>
>
According to the mailing list, this change was made to fix security holes
in the format and also resulted in a slightly different format that's
incompatible with the previous version. So lzma 5.x and higher will use a
different on-disk format. It's troubling to me that projects are using
lzma when its on-disk format isn't even final and the project has
security issues.
Re: Re: RFC: lzma tarball usage [ In reply to ]
On Thu, 08 May 2008 09:17:08 -0400
Doug Goldstein <cardoe@gentoo.org> wrote:
> It's troubling to me that projects are using lzma when its on-disk
> format isn't even final and the project has security issues.

You mean projects like 'GNU tar'?

--
Ciaran McCreesh
Re: Re: RFC: lzma tarball usage [ In reply to ]
Ciaran McCreesh wrote:
> On Thu, 08 May 2008 09:17:08 -0400
> Doug Goldstein <cardoe@gentoo.org> wrote:
>
>> It's troubling to me that projects are using lzma when its on-disk
>> format isn't even final and the project has security issues.
>>
>
> You mean projects like 'GNU tar'?
>
>
As far as I know Ciaran, all GNU projects have switched or are in the
process of switching to lzma over bzip2. I believe the issue in question
which prompted this original e-mail was due to coreutils. But I could be
wrong.
Re: Re: RFC: lzma tarball usage [ In reply to ]
On Thu, 08 May 2008 09:32:34 -0400
Doug Goldstein <cardoe@gentoo.org> wrote:
> Ciaran McCreesh wrote:
> > On Thu, 08 May 2008 09:17:08 -0400
> > Doug Goldstein <cardoe@gentoo.org> wrote:
> >> It's troubling to me that projects are using lzma when its on-disk
> >> format isn't even final and the project has security issues.
> >
> > You mean projects like 'GNU tar'?
> >
> As far as I know Ciaran, all GNU projects have switched or are in the
> process of switching to lzma over bzip2. I believe the issue in
> question which prompted this original e-mail was due to coreutils.
> But I could be wrong.

You miss my point. GNU tar sometimes changes its on disk format (and
will be doing so again at some point for xattrs), and it's had security
issues.

--
Ciaran McCreesh
Re: Re: RFC: lzma tarball usage [ In reply to ]
Doug Goldstein wrote:
> Ciaran McCreesh wrote:
>> On Thu, 08 May 2008 09:17:08 -0400
>> Doug Goldstein <cardoe@gentoo.org> wrote:
>>
>>> It's troubling to me that projects are using lzma when its on-disk
>>> format isn't even final and the project has security issues.
>>>
>>
>> You mean projects like 'GNU tar'?
>>
>>
> As far as I know Ciaran, all GNU projects have switched or are in the
> process of switching to lzma over bzip2. I believe the issue in
> question which prompted this original e-mail was due to coreutils. But
> I could be wrong.
Additionally, to follow up on my own message: I believe one of the
security issues was execution of arbitrary data either when untarred or
just decompressed (assuming a specially crafted lzma file).

Some of the other fun bits are lzma requires autotools but autotools are
going to be compressed with lzma. So if we ever need to autoreconf, we
have a chicken/egg issue.
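
A chicken-and-egg situation like this is a cycle in the build-dependency graph. Below is a minimal sketch of spotting one with a depth-first search. The graph is a deliberately simplified, illustrative reading of the m4 case mentioned earlier in the thread (m4 shipped as .tar.lzma, lzma-utils needing an autoreconf, autoconf needing m4), not real ebuild metadata, and find_cycle is a hypothetical helper.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of names, or None if acyclic.

    deps maps a package name to the packages it needs at build time.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = {}
    path = []

    def visit(pkg):
        state[pkg] = GREY
        path.append(pkg)
        for dep in deps.get(pkg, ()):
            dep_state = state.get(dep, WHITE)
            if dep_state == GREY:
                # dep is already on the current path: a cycle is closed.
                return path[path.index(dep):] + [dep]
            if dep_state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[pkg] = BLACK
        return None

    for pkg in deps:
        if state.get(pkg, WHITE) == WHITE:
            cycle = visit(pkg)
            if cycle:
                return cycle
    return None


# Simplified, illustrative build-time edges (an assumption for this example).
build_deps = {
    "m4": ["lzma-utils"],        # needs lzma to unpack its .tar.lzma
    "lzma-utils": ["autoconf"],  # needs autotools for an autoreconf
    "autoconf": ["m4"],          # autoconf itself requires m4
}

print(find_cycle(build_deps))  # ['m4', 'lzma-utils', 'autoconf', 'm4']
```

Breaking any one edge, for example by fetching the .tar.gz of m4 instead, makes the function return None for this graph.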
Re: Re: RFC: lzma tarball usage [ In reply to ]
Ciaran McCreesh wrote:
> On Thu, 08 May 2008 09:32:34 -0400
> Doug Goldstein <cardoe@gentoo.org> wrote:
>
>> Ciaran McCreesh wrote:
>>
>>> On Thu, 08 May 2008 09:17:08 -0400
>>> Doug Goldstein <cardoe@gentoo.org> wrote:
>>>
>>>> It's troubling to me that projects are using lzma when its on-disk
>>>> format isn't even final and the project has security issues.
>>>>
>>> You mean projects like 'GNU tar'?
>>>
>>>
>> As far as I know Ciaran, all GNU projects have switched or are in the
>> process of switching to lzma over bzip2. I believe the issue in
>> question which prompted this original e-mail was due to coreutils.
>> But I could be wrong.
>>
>
> You miss my point. GNU tar sometimes changes its on disk format (and
> will be doing so again at some point for xattrs), and it's had security
> issues.
>
>
Fair enough. However, newer GNU tar versions are able to untar the older
formats. If you read the lzma changelogs, they appear to imply that newer
versions won't be able to read older formats. The changelog specifically
states that they are handling the issue "gracefully" by telling the
user to upgrade or downgrade their lzma.
Re: Re: RFC: lzma tarball usage [ In reply to ]
On Thursday 08 May 2008, Doug Goldstein wrote:
> Additionally to follow myself up, I believe one of the security
> issues was execution of arbitrary data either when untarred or just
> decompressed (assuming a  specially crafted lzma file).

Can you please point me to the location where this is mentioned? I read
through the lzma git log, and all I could find was data corruption
(which usually is not a security issue) and the mention of the
word "security" inside the announcement.

Thanks,
Robert
Re: RFC: lzma tarball usage [ In reply to ]
On Wed, 2008-05-07 at 15:34 +0200, Fabian Groffen wrote:
> On 07-05-2008 16:23:12 +0300, Mart Raudsepp wrote:
> > This is a plea and also a request for comments on the matter of
> > using .tar.lzma tarballs or not, and for what packages this is
> > acceptable and for what not.
>
> Just as a little background:
> GNU chose to switch from bzip2 to lzma, for it produces smaller files
> (less bandwidth) and decompresses faster.
>
> They no longer provide the bzip2 versions of archives for newer releases
> IIRC, so it's either tar.gz or tar.lzma.
>
> > I'd be happy if some other unpacker is used than lzma-utils - one that
> > does not depend on libstdc++ - I'm sure it can be done, heck it's done
> > in integrated form in some other projects in less than a couple
> > kilobytes of code for the unpacking from a VFS. Meanwhile please
> > consider using the upstream provided .tar.gz tarballs instead and not
> > roll patchsets in .lzma just cause you can.
>
> See above why it might not just be "'cause you can".

"and not roll patchsets in .lzma just cause you can". "Cause you can"
applies mostly to patchsets. But using .tar.lzma instead of .tar.gz is
also a case of "they are available and therefore I can use them", which
neglects the following issues:

a) The on-disk format is supposedly not even finalized; there is a high
potential for breakage of packages in existing ebuilds once lzma-utils
gets updated
b) The currently used decompressor package links to libstdc++
unconditionally for most components (and portage uses lzma, not lzmadec)
c) Potential security issues; details are needed, but for the other
reasons it makes sense to ban .tar.lzma's anyway until a new, C-only
rewrite of lzma-utils comes along
d) Too early adoption in critical system packages; once the above issues
are solved, higher levels should be using it first, before critical
system packages (as the circular dep hell with m4 shows, for example)
e) It has been suggested that the support should have been added with a
new EAPI instead of local build deps (some of which are missing; for
instance, net-tools, with its hand-rolled for-no-reason-whatsoever
.tar.lzma, doesn't have a dep, in addition to using lzma for no good
reason)

Probably some more.
Base-system, please stop using .tar.lzma for now, thank you.


--
Mart Raudsepp
Gentoo Developer
Mail: leio@gentoo.org
Weblog: http://planet.gentoo.org/developers/leio
Re: RFC: lzma tarball usage [ In reply to ]
On 08-05-2008 21:45:00 +0300, Mart Raudsepp wrote:
> d) too early adoption in critical system packages - once above issues
> are solved, higher levels should be using it first, before critical
> system packages (for example shows in the circular dep hell with m4)

been there, done that.

> e) It has been suggested the support should have been added with new
> EAPI instead of local build deps (some of which are missing, for
> instance in the hand-rolled for-no-reason-whatsoever .tar.lzma format
> net-tools doesn't have a dep in addition to using lzma for no good
> reason)

Chill, relax and cool down. Instead, just ask those who decided to
follow upstream why and if they have even thought about the issues you
brought up.


--
Fabian Groffen
Gentoo on a different level
Re: RFC: lzma tarball usage [ In reply to ]
On Thu, May 8, 2008 at 9:09 PM, Fabian Groffen <grobian@gentoo.org> wrote:
>
> > e) It has been suggested the support should have been added with new
> > EAPI instead of local build deps (some of which are missing, for
> > instance in the hand-rolled for-no-reason-whatsoever .tar.lzma format
> > net-tools doesn't have a dep in addition to using lzma for no good
> > reason)
>
> Chill, relax and cool down. Instead, just ask those who decided to
> follow upstream why and if they have even thought about the issues you
> brought up.
>

Note that we're also speaking about downstream lzma archives. Like in
sys-apps/net-tools, where lzma hasn't been adopted even by upstream.

Regards,
--
Santiago M. Mola
Jabber ID: cooldwind@gmail.com