Mailing List Archive

Re: Re: rsync speed and space taken [ In reply to ]
Travis Tilley posted <416A6264.9020805@gentoo.org>, excerpted below, on
Mon, 11 Oct 2004 06:37:24 -0400:

> Duncan wrote:
>> .. About 3 and a half minutes. I just timed it.
>
> rm -rf /usr/portage and time it again.

Well:

#ll /usr/portage
lrwxrwxrwx 1 root root 6 Sep 26 18:02 /usr/portage -> /mnt/p/

I don't have /usr/portage, except as a symlink, for those apps that can't
follow PORTDIR=/mnt/p in /etc/make.conf, so unless emerge sync is one of
those (and let's /hope/ not), your suggestion above would have little
impact, here.

However, removing $PORTDIR is what I assume you meant, so after umounting
/mnt/p/src ($DISTDIR) and /mnt/p/pkg ($PKGDIR) so as not to lose them (I
learned my lesson when I presumed portage would know enough not to attempt
to rsync its $DISTDIR and $PKGDIR =:^(, but that's been covered in
bugzilla already), I backed up /mnt/p and deleted it, then timed an emerge
sync:

real 3m1.038s
user 0m6.512s
sys 0m25.001s

To get a better direct comparison, I then deleted it again, restored my
backup copy, and (after a few minutes' lag time, to avoid being called on
the carpet for syncing too quickly) did a standard emerge sync.

real 2m27.357s
user 0m30.395s
sys 0m10.033s

Two and a half minutes that time, compared to three minutes for the full
(from blank) sync, and three-and-a-half(ish) minutes for yesterday's
normal sync. Note that while I consider all three timings "reasonable",
I /do/ occasionally get stuck with a sync mirror that's feeding the
initial file list so slowly that it's the tens and hundreds digits of the
files counter turning over, rather than the thousands and ten-thousands,
in which case I usually ^C abort the sync and try again, hoping for a
different rsync server. The point is, there's enough variability at that
end that no conclusion can be drawn from the 30-second-ish differences in
the timings I measured.
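
For anyone who wants to compare the two runs numerically, here is a
minimal sketch that converts the `real` values reported by `time` into
seconds. The helper name `to_seconds` is made up for this illustration,
and it assumes the `NmS.SSSs` format shown above.

```python
import re

def to_seconds(stamp: str) -> float:
    """Convert a `time` value such as '3m1.038s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

from_blank = to_seconds("3m1.038s")    # sync into an emptied $PORTDIR
incremental = to_seconds("2m27.357s")  # sync over the restored backup

print(f"from blank:  {from_blank:.3f} s")
print(f"incremental: {incremental:.3f} s")
print(f"difference:  {from_blank - incremental:.3f} s")
```

The gap comes out at a little over half a minute, which is well inside
the mirror-to-mirror variability described above.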

In any case, 45 minutes on a 5Mbit connection, as claimed by the upline
poster, definitely means something other than raw portage tree size or
local pipe bandwidth is the problem. Someone mentioned it took them about
that long with a 100MHz Pentium and 128M memory, which fits what I
suggested earlier: the problem may be local machine performance, not tree
size or bandwidth limitations.
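
A rough back-of-the-envelope check of that claim, as a sketch only: it
assumes the link is fully saturated for the whole sync and ignores
protocol overhead, and the function name is invented for the example.

```python
LINK_MBIT_PER_S = 5   # connection speed quoted in the thread
SYNC_MINUTES = 45     # sync duration quoted in the thread

def max_transfer_mbytes(mbit_per_s: float, minutes: float) -> float:
    """Upper bound on data moved at full line rate, in megabytes (10^6 bytes)."""
    seconds = minutes * 60
    return mbit_per_s * seconds / 8  # 8 bits per byte

ceiling = max_transfer_mbytes(LINK_MBIT_PER_S, SYNC_MINUTES)
print(f"A saturated link could move about {ceiling:.0f} MB in that time")
```

That ceiling is on the order of 1.7 GB, so unless a sync really moves
anything near that much data, the pipe itself is not the limit.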

--
Duncan - List replies preferred. No HTML msgs.
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety." --
Benjamin Franklin



--
gentoo-dev@gentoo.org mailing list
Re: rsync speed and space taken [ In reply to ]
Allen Parker wrote:

>For the person asking, yes, the machine is I/O limited... cel 2.4 128M
>ram 40G disk and LOTS of blog sites :(
>
>
Why not just have a directory index, and base the download / rsync
on what is needed?

ls -1R /usr/portage is about 2.1MB ...

Then just fetch the (to be) installed ebuilds on an as-needed basis?
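
To make the index idea concrete, here is a small sketch that builds a
one-path-per-line listing of a tree and measures how big that listing
would be. It runs against a throwaway directory; the function names and
the demo file names are assumptions for illustration, not anything
portage provides.

```python
import os
import tempfile

def build_index(root: str) -> list[str]:
    """Return sorted paths of all files under root, relative to root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            entries.append(os.path.relpath(full, root))
    return sorted(entries)

def index_size_bytes(index: list[str]) -> int:
    """Size of the index if written one path per line."""
    return sum(len(path.encode("utf-8")) + 1 for path in index)

# Demo on a throwaway tree; point `root` at a real tree to measure it.
with tempfile.TemporaryDirectory() as root:
    pkg_dir = os.path.join(root, "app-misc", "foo")
    os.makedirs(pkg_dir)
    for name in ("foo-1.0.ebuild", "ChangeLog"):
        open(os.path.join(pkg_dir, name), "w").close()
    index = build_index(root)
    size = index_size_bytes(index)
    print(index, size)
```

A client could fetch such an index first and then request only the
directories it cares about.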

>~45 minutes nonetheless is a *long* time to wait for an emerge sync to
>complete (or glsa-check for that matter). I second the suggestion of a
>separate portion of the rsync mirror SPECIFICALLY for patches. perhaps
>with soft/hard-links from ${PN}/${FILESDIR} to ${PATCHDIR}/${PN} ?
>shouldn't be too much harder to just grab what is needed on-use...
>carpaski? any ideas? perhaps a feature request for .52?
>
>

glsa should have its own server/client system.

The other thing is that, for production, no one NEEDS to upgrade stuff
every single day; if it works, it works. If you are a developer then you
probably already know how to use CVS, PORTDIR_OVERLAY etc etc and you
can put your emerge sync in a cron job at 3:00 in the morning...

For sysadmins, it is a good idea to follow the glsa anyway; I have it as
the start page of my browser...

Phil

Re: rsync speed and space taken [ In reply to ]
On Mon, 2004-10-11 at 19:46, Paul de Vrieze wrote:
> On Monday 11 October 2004 18:14, Ned Ludd wrote:
> > On Mon, 2004-10-11 at 01:19, Brian Jackson wrote:
> Please also add a check for bzipped gzipped files. CVS doesn't play nice with
> them, and neither does rsync. Those shouldn't be in the tree (except for the
> rescue portage)

Why should rescue portage be in the tree?
Anyway, it's gone, but it might be nice to have it as an ebuild and
slip it in as a system package.


--
Eldad Zack <eldad@gentoo.org>
Re: rsync speed and space taken [ In reply to ]
On 10/12/04 Eldad Zack wrote:

> On Mon, 2004-10-11 at 19:46, Paul de Vrieze wrote:
> > On Monday 11 October 2004 18:14, Ned Ludd wrote:
> > > On Mon, 2004-10-11 at 01:19, Brian Jackson wrote:
> > Please also add a check for bzipped gzipped files. CVS doesn't play
> > nice with them, and neither does rsync. Those shouldn't be in the
> > tree (except for the rescue portage)
>
> Why should rescue portage be in the tree?
> anyway, it's gone. but it might be nice to have it as an ebuild, and
> slip it in as a system package.

What would be the point of that? The rescue portage is only a
pre-packaged version of portage for cases where the portage install is
corrupted and you can't use it anymore.

Marius
Re: rsync speed and space taken [ In reply to ]
I don't really see the problem with rsync speeds. I am able to rsync
in a couple minutes. However, the portage cache update takes over an
hour on ppc-macos!! Is this normal, or is there some kind of
workaround? I haven't tried in Linux since my hardware isn't
supported yet (iMac G5.)


Chris

Re: rsync speed and space taken [ In reply to ]
Hi,

2004.3/PPC supports the iMac G5 (and also features a Rendezvous distcc
enabled darwin/macos cross compiler) :-)

Updating the cache shouldn't take that long. Could you let me know off
list what filesystem (UFS or HFS+) and Mac OS X release you're using?
Can you send me the output of emerge info?

Best regards,

Pieter Van den Abeele


On 13 Oct 2004, at 19:18, Chris L. Mason wrote:

> I don't really see the problem with rsync speeds. I am able to rsync
> in a couple minutes. However, the portage cache update takes over an
> hour on ppc-macos!! Is this normal, or is there some kind of
> workaround? I haven't tried in Linux since my hardware isn't
> supported yet (iMac G5.)
>
>
> Chris
>


Re: rsync speed and space taken [ In reply to ]
Jason Stubbs <jstubbs <at> gentoo.org> writes:

> G. RSYNC_EXCLUDEFROM
>

Just a small, possibly stupid suggestion for the future of Portage: would it or
should it be possible to make portage obey USE flags?

for example, putting "-X" in your USE could prevent the synchronisation of
portage/x11-* , portage/kde-* , portage/gnome-* ...

--
Ben XO :: dogsonacid.com




Re: rsync speed and space taken [ In reply to ]
Ben XO posted <loom.20041019T023042-535@post.gmane.org>, excerpted below,
on Tue, 19 Oct 2004 00:34:18 +0000:

> Jason Stubbs <jstubbs <at> gentoo.org> writes:
>
>> G. RSYNC_EXCLUDEFROM
>>
>>
> Just a small, possibly stupid suggestion for the future of Portage: would
> it or should it be possible to make portage obey USE flags?
>
> for example, putting "-X" in your USE could prevent the synchronisation of
> portage/x11-* , portage/kde-* , portage/gnome-* ...

NO! USE flags are NOT DEPEND flags. USE=-X does NOT mean that X won't be
emerged if a package requiring it is emerged. What it DOES mean is that
packages with optional features that can be used with X can tie those
features to the X USE flag, and they will NOT be enabled if USE=-X.

Thus, USE=-X means, for example, that links will be built without its X
support (said support /does/ require X libraries be installed and linkable
at runtime, as I found out the hard way when I was having problems with
X), because links uses the X flag and -X turns off that support in the
links ebuilds. However, it does NOT mean that KDE won't be installed if
you require it, or that X won't be installed when KDE requires it.

Thus, preventing X from syncing based on the -X USE flag runs counter
to what USE flags are for.
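
The distinction can be shown with a toy model. This is only a sketch of
the idea, not how portage resolves dependencies; `resolve_deps` and the
placeholder package name are invented for the illustration.

```python
def resolve_deps(hard_deps, optional_deps, use_flags):
    """Return the set of packages one package pulls in.

    hard_deps:     always required, whatever the USE flags say
    optional_deps: maps a USE flag to deps needed only when it is enabled
    use_flags:     the set of enabled USE flags
    """
    needed = set(hard_deps)
    for flag, deps in optional_deps.items():
        if flag in use_flags:
            needed.update(deps)
    return needed

X_LIBS = "x-libs-placeholder"  # hypothetical name standing in for the X libraries

# links only wants X when the X flag is on; KDE needs it unconditionally.
links_deps = resolve_deps([], {"X": [X_LIBS]}, use_flags=set())  # USE=-X
kde_deps = resolve_deps([X_LIBS], {}, use_flags=set())           # USE=-X

print("links with USE=-X pulls in:", sorted(links_deps))
print("kde with USE=-X pulls in:  ", sorted(kde_deps))
```

With the flag off, the optional dependency disappears but the hard one
does not, so the X ebuilds would still have to be in the synced tree.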

--
Duncan - List replies preferred. No HTML msgs.
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety." --
Benjamin Franklin



Re: rsync speed and space taken [ In reply to ]
Philippe Trottier <tchiwam <at> gentoo.org> writes:
> Why not just having a directory / index and base the download / rsync
> on what is needed ?

I submitted a patch (a long time ago) that only rsynced the portage tree for
the installed ebuilds. Worked great. Would have to be extended to fetch new
ebuilds when USE variables or dependencies change.

http://bugs.gentoo.org/show_bug.cgi?id=44526

A really cool benefit would be that you could collect what people installed
from the rsync server logs.

take care
tim



Re: Re: rsync speed and space taken [ In reply to ]
--On Wednesday, October 20, 2004 01:19:28 +0000 Tim Cera
<timcera@earthlink.net> wrote:

> Philippe Trottier <tchiwam <at> gentoo.org> writes:
>> Why not just having a directory / index and base the download / rsync
>> on what is needed ?
>
> I submitted a patch (a long time ago) that only rsynced the portage tree
> for the installed ebuilds. Worked great. Would have to be extended to
> fetch new ebuilds when USE variables or dependencies change.
>
> http://bugs.gentoo.org/show_bug.cgi?id=44526
>
> A really cool benefit would be that you could collect what people
> installed from the rsync server logs.

I'm not sure that I would call that a cool benefit. It seems to come close
to an egregious violation of privacy. I know that there is no promise of
confidentiality in the use of the portage rsync servers, but to actively
and publicly start collecting data about who is using what seems to only
invite more paranoia.

Andy
(JFMuggs)



Re: Re: rsync speed and space taken [ In reply to ]
On Friday 22 October 2004 9:56 pm, Andrew Fant wrote:
> I'm not sure that I would call that a cool benefit. It seems to come close
> to an egregious violation of privacy. I know that there is no promise of
> confidentiality in the use of the portage rsync servers, but to actively
> and publicly start collecting data about who is using what seems to only
> invite more paranoia.
Except that it won't be able to reliably collect the "who" part, only the
"what". Sure, you could log IPs, but many users' IPs change more frequently
than they sync.
--
Luke-Jr
Developer, Utopios
http://utopios.org/
Re: Re: rsync speed and space taken [ In reply to ]
On Fri, 22 Oct 2004 22:00:55 +0000 Luke-Jr <luke-jr@utopios.org> wrote:
| On Friday 22 October 2004 9:56 pm, Andrew Fant wrote:
| > I'm not sure that I would call that a cool benefit. It seems to
| > come close to an egregious violation of privacy. I know that there
| > is no promise of confidentiality in the use of the portage rsync
| > servers, but to actively and publicly start collecting data about
| > who is using what seems to only invite more paranoia.
| Except that it won't be able to reliably collect the "who" part, only
| the "what". Sure, you could log IPs, but many users IPs change more
| frequently than they sync.

Naah. You've got more than enough data to figure out who's doing what,
even if you don't look at IP at all. You know roughly when someone last
synced and you can track by what packages they previously had installed.
There's a looooot of information in that...
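
A minimal sketch of why that works, assuming nothing more than the set
of packages requested in each sync: two sessions with heavily
overlapping sets are probably the same machine, whatever the IP. The
similarity measure (Jaccard) and the sample sets are choices made for
this illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two package sets: 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical sync sessions seen from three different IP addresses.
last_week = {"vim", "fluxbox", "metalog"}
today = {"vim", "fluxbox", "metalog", "links"}
someone_else = {"kde", "syslog-ng"}

same_box = jaccard(last_week, today)
other_box = jaccard(last_week, someone_else)
print(f"last_week vs today:        {same_box:.2f}")
print(f"last_week vs someone_else: {other_box:.2f}")
```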

--
Ciaran McCreesh : Gentoo Developer (Vim, Fluxbox, Sparc, Mips)
Mail : ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm
Re: rsync speed and space taken [ In reply to ]
On Fri, 22 Oct 2004 22:00:55 +0000, Luke-Jr <luke-jr@utopios.org> wrote:


> On Friday 22 October 2004 9:56 pm, Andrew Fant wrote:
> > I'm not sure that I would call that a cool benefit. It seems to come close
> > to an egregious violation of privacy. I know that there is no promise of
> > confidentiality in the use of the portage rsync servers, but to actively
> > and publicly start collecting data about who is using what seems to only
> > invite more paranoia.
> Except that it won't be able to reliably collect the "who" part, only the
> "what". Sure, you could log IPs, but many users IPs change more frequently
> than they sync.

Even the what won't be reliable, as it won't show all of us who use
emerge-webrsync due to firewall restrictions.

Mike

Re: Re: rsync speed and space taken [ In reply to ]
On Friday 22 October 2004 10:13 pm, Mike wrote:
> On Fri, 22 Oct 2004 22:00:55 +0000, Luke-Jr <luke-jr@utopios.org> wrote:
> > On Friday 22 October 2004 9:56 pm, Andrew Fant wrote:
> > > I'm not sure that I would call that a cool benefit. It seems to come
> > > close to an egregious violation of privacy. I know that there is no
> > > promise of confidentiality in the use of the portage rsync servers, but
> > > to actively and publicly start collecting data about who is using what
> > > seems to only invite more paranoia.
> >
> > Except that it won't be able to reliably collect the "who" part, only the
> > "what". Sure, you could log IPs, but many users IPs change more
> > frequently than they sync.
>
> Even the what won't be reliable, as it won't show all of us who use
> emerge-webrsync due to firewall restrictions.
It would be a decent sample, though. Which is a positive thing, IMO...
I recall when I was a developer I was wondering whether a pkg was ready for
stable or if simply nobody had tried it... In that case, it was a
not-so-common server, so I wouldn't be surprised if all the people using it
stuck w/ stable version...
--
Luke-Jr
Developer, Utopios
http://utopios.org/
Re: Re: rsync speed and space taken [ In reply to ]
Luke-Jr (luke-jr@utopios.org) scribbled:
> On Friday 22 October 2004 10:13 pm, Mike wrote:
> > On Fri, 22 Oct 2004 22:00:55 +0000, Luke-Jr <luke-jr@utopios.org> wrote:
> > > On Friday 22 October 2004 9:56 pm, Andrew Fant wrote:
> > > > I'm not sure that I would call that a cool benefit. It seems to come
> > > > close to an egregious violation of privacy. I know that there is no
> > > > promise of confidentiality in the use of the portage rsync servers, but
> > > > to actively and publicly start collecting data about who is using what
> > > > seems to only invite more paranoia.
> > >
> > > Except that it won't be able to reliably collect the "who" part, only the
> > > "what". Sure, you could log IPs, but many users IPs change more
> > > frequently than they sync.
> >
> > Even the what won't be reliable, as it won't show all of us who use
> > emerge-webrsync due to firewall restrictions.
> It would be a decent sample, though. Which is a positive thing, IMO...
> I recall when I was a developer I was wondering whether a pkg was ready for
> stable or if simply nobody had tried it... In that case, it was a
> not-so-common server, so I wouldn't be surprised if all the people using it
> stuck w/ stable version...

I'm also wary of the idea of logging what people have installed on their
machines... It's too prone to abuse. There are other solutions to your
dilemma above; say, asking on gentoo-user ;)

I don't think the potential for abuse is worth the knowledge gain.

Cooper.

Re: Re: rsync speed and space taken [ In reply to ]
How about getting only the installed ebuilds + *all* the ebuilds that
may have potential patent violation by default? :P

On Sat, 23 Oct 2004 09:29:00 -0400, Jason Cooper <gentoo@lakedaemon.net> wrote:
> Luke-Jr (luke-jr@utopios.org) scribbled:
>
>
> > On Friday 22 October 2004 10:13 pm, Mike wrote:
> > > On Fri, 22 Oct 2004 22:00:55 +0000, Luke-Jr <luke-jr@utopios.org> wrote:
> > > > On Friday 22 October 2004 9:56 pm, Andrew Fant wrote:
> > > > > I'm not sure that I would call that a cool benefit. It seems to come
> > > > > close to an egregious violation of privacy. I know that there is no
> > > > > promise of confidentiality in the use of the portage rsync servers, but
> > > > > to actively and publicly start collecting data about who is using what
> > > > > seems to only invite more paranoia.
> > > >
> > > > Except that it won't be able to reliably collect the "who" part, only the
> > > > "what". Sure, you could log IPs, but many users IPs change more
> > > > frequently than they sync.
> > >
> > > Even the what won't be reliable, as it won't show all of us who use
> > > emerge-webrsync due to firewall restrictions.
> > It would be a decent sample, though. Which is a positive thing, IMO...
> > I recall when I was a developer I was wondering whether a pkg was ready for
> > stable or if simply nobody had tried it... In that case, it was a
> > not-so-common server, so I wouldn't be surprised if all the people using it
> > stuck w/ stable version...
>
> I'm also wary of the idea of logging what people have installed on their
> machines... It's too prone to abuse. There are other solutions to your
> dilemma above; say, asking on gentoo-user ;)
>
> I don't think the potential for abuse is worth the knowledge gain.
>
> Cooper.

Re: Re: rsync speed and space taken [ In reply to ]
Roman Gaufman wrote:

>How about getting only the installed ebuilds + *all* the ebuilds that
>may have potential patent violation by default? :P
>
>
That is a terrible argument - patent violating ebuilds. The ebuilds are
not the applications, just the instructions on how to download, compile
and install them onto a Gentoo system. So even if the ebuilds provide
instructions on how to download the applications, I really don't think
anyone could argue you infringed any patent until you downloaded,
installed and used the application.


Re: Re: rsync speed and space taken [ In reply to ]
>That is a terrible argument - patent violating ebuilds. The ebuilds are
>not the applications, just the instructions on how to download, compile
>and install on to a Gentoo system. So even if the ebuilds provide
>instructions on how to download them, I really don't think anyone could
>argue you infringed any patent until you downloaded, installed and used
>the application.

ebuilds that may have *potential* patent violation. -- Even clearer,
ebuilds that install patent infringing software.

In any case, what I mean is that ebuilds like those that (optionally)
fetch w32codecs would be downloaded by default alongside new versions of
the installed ebuilds -- then the collected data cannot be used against
anyone but would still be useful for general analysis, and it takes a
load off the rsync mirrors.

I wasn't serious about the suggestion either way though, hence the
smiley, so relax!

Re: Re: rsync speed and space taken [ In reply to ]
On Wed, 20 Oct 2004, Tim Cera wrote:

> I submitted a patch (a long time ago) that only rsynced the portage
> tree for the installed ebuilds. Worked great. Would have to be
> extended to fetch new ebuilds when USE variables or dependencies
> change.
>
> http://bugs.gentoo.org/show_bug.cgi?id=44526
>
> A really cool benefit would be that you could collect what people installed
> from the rsync server logs.

I'd also like it to be extended so that it could also sync an additional
watchlist of packages. The main purpose of this is, "I'm planning on
installing Foo on Saturday. I want my portage to be up to date before I
start the install, but I don't want to sync twice in 24 hours, nor do I
want to sync every package." Of course, since one could simply sync the
full set, possibly minus a hand-crafted excludefrom list, this is a much
lesser item.
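
As a sketch of what I mean (the function and package names are invented
for the example, this is not part of the patch): the set to sync is the
installed packages plus the watchlist, minus anything excluded by hand.

```python
def sync_set(installed, watchlist, excluded=()):
    """Packages to sync: installed plus watched, minus anything excluded."""
    return sorted((set(installed) | set(watchlist)) - set(excluded))

installed = ["sys-apps/portage", "app-editors/vim"]
watchlist = ["app-misc/foo"]        # planning to install this on Saturday
excluded = ["app-editors/vim"]      # hand-crafted exclusion

wanted = sync_set(installed, watchlist, excluded)
print(wanted)
```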

I'm impressed; it applied cleanly after all this time (offset 334
lines on the second chunk.)

Personally, I don't think that the rsync logs would be that significant.
If I did, I wouldn't contemplate using this option. Of course, I'm
assuming a watch list, which would throw a certain amount of very needed
sand into those stats.

Ed

Re: rsync speed and space taken [ In reply to ]
Andrew Fant <andrew.fant <at> tufts.edu> writes:
> I'm not sure that I would call that a cool benefit. It seems to come close
> to an egregious violation of privacy. I know that there is no promise of
> confidentiality in the use of the portage rsync servers, but to actively
> and publicly start collecting data about who is using what seems to only
> invite more paranoia.

Let's say that limiting the rsync to installed packages has a side-effect.
Whether it is a benefit or not depends on many factors.

I would say that the collection of installed package statistics is NOT a reason
to rsync on installed packages only. The reasons to rsync on installed packages
are to reduce the load on the rsync servers and to make the rsync faster (rsync
speed was a definite problem on my old laptop). When I was using rsync to run
against installed packages, I was bringing down about 15,000 files. Note that
the total size of the portage tree is irrelevant.

I took for granted that the ONLY statistics that would be collected would be
statistics on the entire community.
For example:
x% of gentoo users install metalog
y% of gentoo users install syslogd
z% of gentoo users install syslog-ng
...etc.
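
A minimal sketch of that kind of community-wide figure, assuming the
input is just one package set per sync with no host identifier kept.
The function name and the sample data are made up for illustration.

```python
from collections import Counter

def install_shares(package_sets):
    """Percentage of syncs that included each package.

    package_sets: iterable of package collections, one per sync,
    carrying no information about which host they came from.
    """
    package_sets = list(package_sets)
    total = len(package_sets)
    if total == 0:
        return {}
    counts = Counter(pkg for pkgs in package_sets for pkg in set(pkgs))
    return {pkg: 100 * n / total for pkg, n in counts.items()}

sample = [
    {"metalog"},
    {"syslog-ng"},
    {"syslog-ng"},
    {"metalog", "syslog-ng"},
]
shares = install_shares(sample)
for pkg in sorted(shares):
    print(f"{shares[pkg]:.0f}% of syncs include {pkg}")
```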

I also imagined that, like in the patch I submitted, the rsync against installed
packages was an option. The default would be a full rsync. Just like
gentoo-stats is an option (in fact gentoo-stats sends a bunch of data, and you
can choose anonymity if you wish).

take care
tim

