Mailing List Archive

mount portage from squashfs
Proposal: create snapshots of the portage tree as squashfs images, to be used
in place of the /usr/portage directory.
prior art: see #1

Already working here: one central server which keeps the full portage tree
and, after each sync, creates "portage.sqsh", a squashfs 4.0 image.

Advantages are mainly:
- cleaner root directory (on ext4: du -sh /usr/portage ~= 600M; find
/g/portage | wc -l ~= 130000)
- faster `emerge --sync` with fast connections
- faster `emerge -uDpvN world`
- less cpu/disk load on the server (if not serving from memory)

Disadvantages:
- need to mount portage, or to use autofs (a sketch follows this list)
- need kernel >= 2.6.30
- bigger rsync transfer size (roughly 2x, see #2)
- bigger initial transfer size: the lzma snapshot currently weighs 30.8M,
portage.sqsh 45M
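
For the autofs route, something like this could work (only a sketch, not what
runs here; the mount point and map file name are made up):

/etc/auto.master:
/mnt/auto /etc/auto.portage --timeout=60

/etc/auto.portage:
portage -fstype=squashfs,ro,loop :/var/tmp/portage.sqsh

/usr/portage can then point at /mnt/auto/portage via a symlink.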

How it's done here:
Currently on the dispatcher the following runs after every emerge --sync:

mksquashfs /usr/portage /srv/portage.sqsh \
-noappend -no-exports -no-recovery -force-uid 250 -force-gid 250

The clients run the following from cron:
umount /g/portage 2>/dev/null \
; cp /srv/server/portage.sqsh /var/tmp/portage.sqsh \
&& mount /usr/portage

/etc/fstab:
/srv/server/portage.sqsh /usr/portage squashfs loop,ro,noauto 1 1

some real data:

stats for a portage sync, one day

Number of files: 136429
Number of files transferred: 326
Total file size: 180345216 bytes
Total transferred file size: 1976658 bytes
Literal data: 1976658 bytes
Matched data: 0 bytes
File list size: 3377038
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 47533
Total bytes received: 4120255

sent 47533 bytes received 4120255 bytes 124411.58 bytes/sec
total size is 180345216 speedup is 43.27

stats for a portage.sqsh sync, one day

Number of files: 1
Number of files transferred: 1
Total file size: 46985216 bytes
Total transferred file size: 46985216 bytes
Literal data: 8430976 bytes
Matched data: 38554240 bytes
File list size: 27
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 48096
Total bytes received: 8454837

sent 48096 bytes received 8454837 bytes 5668622.00 bytes/sec
total size is 46985216 speedup is 5.53



#1 http://forums.gentoo.org/viewtopic-p-2218914.html
http://www.mail-archive.com/gentoo-dev@gentoo.org/msg05240.html

#2 May be mitigated by the mksquashfs '-sort' option, to be tested

- francesco (vivo)
Re: mount portage from squashfs
I've been doing more or less the same here for a few years already; it works
pretty well, and I think others are as well.
I do the sync only on one system (applying patches to the tree where needed)
and then copy the image to the other systems locally.

The big advantage I see is more efficient access to the portage tree and less
disk usage on all systems (even on the master, except for the duration of
the sync).

On the master system I do the following:
# create a 384 MB ext2 image on /tmp (tmpfs)
dd if=/dev/zero of=/tmp/portage.img bs=1M count=384
mkfs.ext2 -F /tmp/portage.img   # -F: don't ask about the target not being a block device
mkdir /tmp/portage
mount /tmp/portage.img /tmp/portage -o loop
# seed it with the current tree, then swap it in place of /usr/portage
cp -a /usr/portage/* /tmp/portage
umount /usr/portage
mount /tmp/portage /usr/portage --move
emerge --sync
(apply patches)
# compress the updated tree and mount the freshly built image
mksquashfs /usr/portage /tmp/portage.sqsh $options
umount /usr/portage
rm -f /tmp/portage.img
mv /tmp/portage.sqsh $target
mount $target /usr/portage -o loop

This is pretty quick (and an ext2 image on tmpfs uses less memory at the end
of the day than putting the whole portage tree directly on tmpfs would).
After that I can fetch the sqsh image from any client system when I do
updates over there (I even keep a few weeks' worth of images on the master
in case an older one is needed).

The only thing to take care of is mounting a read-write filesystem
over /usr/portage/distfiles and /usr/portage/packages, or adjusting
make.conf so portage can download distfiles and, if asked to, save
binpkgs.
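
The make.conf variant would be along these lines (only a sketch; the paths
are examples, DISTDIR and PKGDIR are the relevant variables):

DISTDIR="/var/cache/distfiles"
PKGDIR="/var/cache/binpkgs"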

If there is some interest, I can share my script which does the whole
process.

Having the tree snapshots that are currently available as tarballs also
offered as squashfs images would be nice: more or less the same download
size, but easier to access (e.g. no need to unpack).

Bruno

On Wed, 12 August 2009 Francesco R <vivo75@gmail.com> wrote:
> Proposal, create snapshots of portage as squashfs iso, to be used in
> place of /usr/portage directory.
> prior art: see #1
Re: mount portage from squashfs
On Wed, Aug 12, 2009 at 05:17:55PM +0200, Francesco R wrote:
> Proposal, create snapshots of portage as squashfs iso, to be used in
> place of /usr/portage directory.
To all of these suggestions, I'd like to point out that if you're
willing to pay the same cost in administration (maintaining a separate
filesystem for /usr/portage), then you can have EVERYTHING in the
advantages list, and none of the things in the disadvantages list by
simply using a small reiserfs space for /usr/portage, with tail-packing
enabled.
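
In concrete terms that could look like this (just a sketch, assuming a spare
partition such as /dev/sdb1; tail packing is the reiserfs default, so it only
has to not be disabled with the notail mount option):

mkreiserfs /dev/sdb1
mount -o noatime /dev/sdb1 /usr/portage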

For the rsync.g.o main rotation servers, we actually do that, just
RAM-backed to serve files as fast as possible without hitting disk.

When you removed bandwidth limitations and disk limitations on the
client side, I believe the record time for an emerge --sync that was 24
hours out of date was somewhere around 23 seconds.

If you really wanted to get the rsync transfer size down, see what you
can do about the 'file list size' section, which is eating up a lot of
the download gains with the classical rsync:// sync.

--
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
Re: mount portage from squashfs
On Wed, Aug 12, 2009 at 10:51 PM, Robin H. Johnson <robbat2@gentoo.org> wrote:

> On Wed, Aug 12, 2009 at 05:17:55PM +0200, Francesco R wrote:
> > Proposal, create snapshots of portage as squashfs iso, to be used in
> > place of /usr/portage directory.
> To all of these suggestions, I'd like to point out that if you're
> willing to pay the same cost in administration (maintaining a separate
> filesystem for /usr/portage), then you can have EVERYTHING in the
> advantages list, and none of the things in the disadvantages list by
> simply using a small reiserfs space for /usr/portage, with tail-packing
> enabled.


Well, squashfs has a few advantages over reiserfs: duplicate file
detection, compression, and a bright future.

>
>
> For the rsync.g.o main rotation servers, we actually do that, just
> RAM-backed to serve files as fast as possible without hitting disk.
>

The only possible way to cope with the load they have, I think.


>
> When you removed bandwidth limitations and disk limitations on the
> client side, I believe the record time for a emerge --sync that was 24
> hours out of date was somewhere around 23 seconds.


23 seconds is ... a lot without bandwidth and disk limitations; disk time
for 50 MB is about 1 second (or even much less), and the whole image would
transfer in that time.


>
> If you really wanted to get the rsync transfer size down, see what you
> can do about the 'file list size' section, which is eating up a lot of
> the download gains with the classical rsync:// sync.
>

To my humble knowledge, the only way to do this is to have one index file (or
several). The file should contain triples of ctime, status and file name
(ordered by ctime, possibly descending), and provide a cheap way to retrieve
the list of files changed in a given amount of time;
the status would be needed mainly for deleted files, but it could also be
"modified" or "added".
Portage already has a timestamp in /metadata, so skew of the client clock is
not a problem; skew on the server would be.
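
A rough sketch of how such an index could be generated on the server after
each sync (hypothetical file name and format; deletions would still have to
be detected separately, e.g. by diffing against the previous index):

cd /usr/portage && find . -type f -printf '%C@ M %P\n' | sort -rn \
    > /srv/portage-changes.idx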

As a side advantage this could be served by an HTTP server, while still
keeping rsync as an option. Currently rsync is already used with the
--whole-file option and does only a time/size check; if those differ, it
downloads the full (little) file. Right?

This would be interesting too, but what happens if the timestamp on the
client is too old or absent? Fall back to rsync? For how much time, or to
what size, would the index file be allowed to grow?

P.S. As an HTTP client, curl would be more useful than wget because it
permits downloading multiple files in one session.
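
For example (hypothetical URLs; curl fetches both files over one reused
connection to the same host):

curl -O http://server.example/portage/changes.idx \
     -O http://server.example/portage/metadata/timestamp.chk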