Mailing List Archive

Re: Portage load control
On 5/12/23 09:46, Peter Humphrey wrote:
> On Friday, 12 May 2023 00:08:03 BST Mark Knecht wrote:
>> On Thu, May 11, 2023 at 3:07 PM Peter Humphrey <peter@prh.myzen.co.uk>
>> wrote:
>>> On Thursday, 11 May 2023 17:18:17 BST Mark Knecht wrote:
>> <SNIP>
>>
>>>> The 'problem' is this can easily hit 100% of the cores you have in the
>>>> machine if not sensibly set. (You choose what's 'sensible')
>>> Once again, --load-average is being ignored. Why is it there? Surely, it
>>> must be to mitigate the worst effects of that N*K, but it isn't doing so.
>> From your description, yeah, it's weird, but possibly it's managing it
>> (for instance) over much longer time frames or something like that.
>>
>> Or possibly it just doesn't work.
> That's it, I'm sure.
>
>> Or possibly whoever wrote the man page misunderstood.
> Load-average has been around for a long time.
>
>> Poking around a bit this morning I took the path at the bottom of the
>> link I gave you to the Portage niceness page. It says scheduling policy
>> control started with portage-3.0.35 which on paper sounds sort of recent.
>> Possibly a bug crept in, but I was curious as to what you have for
>> PORTAGE_SCHEDULING_POLICY, if any, and whether you need to enable some
>> sort of scheduling to get this under control?
>>
>> https://wiki.gentoo.org/wiki/Portage_niceness
> I have no PORTAGE_SCHEDULING_POLICY, or not that I can find. It seems to me
> that such a policy is to do with the running of portage in the OS, rather than
> how it launches its own emerge jobs. Is that right?
>
>> Anyway, I feel for ya.
> :)
>
You can read /usr/share/portage/config/make.conf.example for an
explanation. All children processes will use that. I can run portage and
play games on the same system with my settings.
Re: Portage load control
On Friday, 12 May 2023 14:37:13 BST Jack wrote:

> I still see two separate issues. First, you are saying that emerge
> still launches new jobs when the load is over what is set with
> --load-average. A possible way to test this directly is to run or
> create some job that pushed the load average to over some number, say
> 5.

I have tested it, directly, with emerge. I reported what happened at the start
of this thread. To recap:

I was running 'emerge -e @world'; no extras, no ifs, no buts. Make.conf had
EMERGE_DEFAULT_OPTS="--jobs --load-average=40 ..."; MAKEOPTS was not
specified.

As the six-hour job proceeded and portage was working with larger packages in
the plasma group, the load average rose to "72 75 75", clearly much higher
than the 40 I'd specified and continuing over at least 15 minutes. Yet portage
was still starting more emerge jobs to keep the load that high.

--->8

> The second issue is whether MAKEOPTS --load-average is actually getting
> passed to each job and whether make is then observing that limit.
> Whether this is the case or not is independent of the first issue. I
> suppose this could be tested without even involving emerge. Given you
> observed an actual load of 72 (do I remember correctly?) with both
> --load-averages set significantly below this, you could test, as long as
> you have a single compile which is busy enough.

I had no MAKEOPTS, so I assume -j took the default value of num_cores=24. I
don't know what the -l default is.

Here's the rub, though: whatever values were taken by -j, -l or --jobs, the
value of --load-average should not have been exceeded so grossly and
persistently.

I'd like to thank everyone who's offered ideas and suggestions, but I'm just
going to have to wait for the outcome of the bug I reported.

--
Regards,
Peter.
Re: Portage load control
On Fri, May 12, 2023 at 6:46 AM Peter Humphrey <peter@prh.myzen.co.uk>
wrote:
>
> On Friday, 12 May 2023 00:08:03 BST Mark Knecht wrote:
> > On Thu, May 11, 2023 at 3:07 PM Peter Humphrey <peter@prh.myzen.co.uk>
> > wrote:
> > > On Thursday, 11 May 2023 17:18:17 BST Mark Knecht wrote:
> > <SNIP>
> >
> > > > The 'problem' is this can easily hit 100% of the cores you have in the
> > > > machine if not sensibly set. (You choose what's 'sensible')
> > >
> > > Once again, --load-average is being ignored. Why is it there? Surely, it
> > > must be to mitigate the worst effects of that N*K, but it isn't doing so.
> >
> > From your description, yeah, it's weird, but possibly it's managing it
> > (for instance) over much longer time frames or something like that.
> >
> > Or possibly it just doesn't work.
>
> That's it, I'm sure.
>
> > Or possibly whoever wrote the man page misunderstood.
>
> Load-average has been around for a long time.
>
> > Poking around a bit this morning I took the path at the bottom of the
> > link I gave you to the Portage niceness page. It says scheduling policy
> > control started with portage-3.0.35 which on paper sounds sort of recent.
> > Possibly a bug crept in, but I was curious as to what you have for
> > PORTAGE_SCHEDULING_POLICY, if any, and whether you need to enable some
> > sort of scheduling to get this under control?
> >
> > https://wiki.gentoo.org/wiki/Portage_niceness
>
> I have no PORTAGE_SCHEDULING_POLICY, or not that I can find. It seems to me
> that such a policy is to do with the running of portage in the OS, rather than
> how it launches its own emerge jobs. Is that right?
>
> > Anyway, I feel for ya.

Peter,
My point about PORTAGE_SCHEDULING_POLICY is that it *might* have
an effect on what you're seeing, not that it *would* have an effect.

If there's one thing I most distrust about the Open Source world, it's
documentation...

WRT the floating point issue, the Gentoo Catalyst page

https://wiki.gentoo.org/wiki/Catalyst

in the "Jobs and load average" section states:

<QUOTE>
FILE /etc/catalyst/catalyst.conf

# Integral value passed to emerge as the parameter to --jobs and is used to
# define MAKEOPTS during the target build.
jobs = 4

# Floating-point value passed to emerge as the parameter to --load-average and
# is used to define MAKEOPTS during the target build.
# load-average = 4.0

</QUOTE>

so once again, the use of floating point is documented as (you choose) either
important or required.

My opinion: load-average probably works, but we are misunderstanding
the documentation. Note that the example using 4.0 is a pretty load number.

Good luck,
Mark
Re: Portage load control
On Friday, 12 May 2023 15:06:21 BST Michael Cook wrote:

> You can read /usr/share/portage/config/make.conf.example for an
> explanation. All children processes will use that. I can run portage and
> play games on the same system with my settings.

That example says nothing about any of the emerge default options. I have no
problem with system responsiveness (except, recently, running BOINC).

--
Regards,
Peter.
Re: Portage load control
On Friday, 12 May 2023 15:13:08 BST Mark Knecht wrote:

> My opinion: load-average probably works, but we are misunderstanding
> the documentation.

That's what bothers me the most - that I have a mental block somewhere. :(

--
Regards,
Peter.
Re: Portage load control
On Fri, May 12, 2023 at 7:27 AM Peter Humphrey <peter@prh.myzen.co.uk>
wrote:
>
> On Friday, 12 May 2023 15:13:08 BST Mark Knecht wrote:
>
> > My opinion: load-average probably works, but we are misunderstanding
> > the documentation.
>
> That's what bothers me the most - that I have a mental block somewhere. :(
>
> --
> Regards,
> Peter.

Just for clarity, how are you measuring 'load average'? Just looking at what
is reported in top or something else that takes stats?

So if it's either a documentation issue, or an issue with understanding the
documentation, possibly set up a 'design of experiments' set of tests? For
instance:

1) Pick 1 semi-large package that spawns a few extra jobs to get built
2) Remove the binaries from your system
3) Ensure all the source is prefetched

4) Build the package with no options measuring load-average

Repeat 2 - 4 using a few different options:

-j 1
-j1 --load-average=40
-j1 --load-average=40.0
-j1 --load-average=4.0
-j1 --load-average=0.4
-j10 --load-average=0.4

etc., and see what happens?
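
To make the "how are you measuring it" question concrete, here is a minimal sketch of one way to log the load average while a build runs. It assumes Linux, where the first three fields of /proc/loadavg are the 1-, 5- and 15-minute averages. The function name sample_load and its arguments are made up for illustration and are not part of Portage.

```shell
#!/bin/bash
# sample_load COUNT INTERVAL [FILE]
# Hypothetical helper: print COUNT samples of the load averages, one per line,
# as "epoch-seconds 1min 5min 15min", waiting INTERVAL seconds between samples.
# FILE defaults to /proc/loadavg (Linux only).
sample_load() {
    local count=$1 interval=$2 file=${3:-/proc/loadavg}
    local i one five fifteen rest
    for ((i = 0; i < count; i++)); do
        # The first three fields are the 1-, 5- and 15-minute averages.
        read -r one five fifteen rest < "$file"
        printf '%s %s %s %s\n' "$(date +%s)" "$one" "$five" "$fifteen"
        # Do not sleep after the final sample.
        if ((i + 1 < count)); then
            sleep "$interval"
        fi
    done
}
```

Running something like `sample_load 60 10 > load.log` in a second terminal while emerge is working would give a ten-minute record that can be attached to a bug report.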
Re: Portage load control
On 2023.05.12 11:27, Mark Knecht wrote:
> On Fri, May 12, 2023 at 7:27 AM Peter Humphrey <peter@prh.myzen.co.uk>
> wrote:
> >
> > On Friday, 12 May 2023 15:13:08 BST Mark Knecht wrote:
> >
> > > My opinion: load-average probably works, but we are misunderstanding
> > > the documentation.
> >
> > That's what bothers me the most - that I have a mental block somewhere. :(
> >
> > --
> > Regards,
> > Peter.
>
> Just for clarity, how are you measuring 'load average'? Just looking at
> what is reported in top or something else that takes stats?
>
> So if it's either a documentation issue, or an issue with understanding the
> documentation, possibly set up a 'design of experiments' set of tests? For
> instance:
>
> 1) Pick 1 semi-large package that spawns a few extra jobs to get built
> 2) Remove the binaries from your system
> 3) Ensure all the source is prefetched
>
> 4) Build the package with no options measuring load-average
>
> Repeat 2 - 4 using a few different options:
>
> -j 1
> -j1 --load-average=40
> -j1 --load-average=40.0
> -j1 --load-average=4.0
> -j1 --load-average=0.4
> -j10 --load-average=0.4
>
> etc., and see what happens?
--load-average controls whether or not emerge starts another
job/package, so testing by emerging a single package will not actually
test this. That's why I suggested running some application to get the
load up to 10 (arbitrary number) and then emerging a larger number of
small packages. If --load-average is set to anything less than the
actual load, it should only launch one package at a time. Having that
simple example to add to the bug would give the developers an easy way
to test.
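
The gate described in the paragraph above can be sketched in a few lines of shell. This only illustrates the rule as stated in this thread and is not Portage's actual code; may_start_job and its three arguments are hypothetical names, and the comparison is delegated to awk because bash has no floating-point arithmetic.

```shell
#!/bin/bash
# may_start_job LOAD LIMIT [RUNNING]
# Hypothetical illustration, not Portage code. Succeeds (exit status 0) when a
# scheduler honouring a load limit would be allowed to start one more job:
#   - always, if nothing is running (so one job at a time still proceeds), or
#   - if the current load is below the limit.
# RUNNING defaults to 1, i.e. "something is already running".
may_start_job() {
    local load=$1 limit=$2 running=${3:-1}
    awk -v load="$load" -v limit="$limit" -v running="$running" \
        'BEGIN { exit !(running + 0 == 0 || load + 0 < limit + 0) }'
}
```

For example, `may_start_job "$(cut -d' ' -f1 /proc/loadavg)" 40 3 && echo "ok to start"` checks the current 1-minute load against a limit of 40 with three jobs already running.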

I think the fact that Peter's actual load went over 70 is because each
individual job/package had no limit on the number of parallel compiles
make could kick off. There is likely no bug there. The real problem
(as Peter keeps pointing out) is that with the load that high, emerge
still starts additional jobs.
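
As an illustration of limiting both levels at once, a make.conf fragment might look like the following. The numbers are placeholders for a 24-core machine and are assumptions, not recommendations. --jobs and --load-average are the emerge options discussed in this thread, and -j and -l are the corresponding GNU make options.

```shell
# /etc/portage/make.conf (fragment; all values are illustrative assumptions)

# Per-package parallelism: up to 24 compile jobs per make invocation, and make
# holds back new compile jobs while the load average is at or above 24.
MAKEOPTS="-j24 -l24"

# Package-level parallelism: up to 4 packages at once, and emerge should not
# start further packages while the load average is at or above 24.
EMERGE_DEFAULT_OPTS="--jobs=4 --load-average=24"
```

With only the emerge-level limit set, each package's make would still be free to run as many compile jobs as its own -j allows, which matches the explanation above for loads far beyond the limit.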
Re: Portage load control
On Fri, May 12, 2023 at 9:08 AM Jack <ostroffjh@users.sourceforge.net>
wrote:
>
> > -j 1
> > -j1 --load-average=40
> > -j1 --load-average=40.0
> > -j1 --load-average=4.0
> > -j1 --load-average=0.4
> > -j10 --load-average=0.4
> >
> > etc., and see what happens?
> --load-average controls whether or not emerge starts another
> job/package, so testing by emerging a single package will not actually
> test this. That's why I suggested running some application to get the
> load up to 10 (arbitrary number) and then emerging a larger number of
> small packages. If --load-average is set to anything less than the
> actual load, it should only launch one package at a time. Having that
> simple example to add to the bug would give the developers an easy way
> to test.
>
> I think the fact that Peter's actual load went over 70 is because each
> individual job/package had no limit on the number of parallel compiles
> make could kick off. There is likely no bug there. The real problem
> (as Peter keeps pointing out) is that with the load that high, emerge
> still starts additional jobs.

Jack,
I totally agree, as long as nothing is broken, but yeah, the list I
provided was more meant to engender ideas for Peter.

One interesting point is that the first Gentoo page I found to
look at the emerge man page shows LOAD as the value provided
to the --load-average option, but nowhere does it specify anything
other than it's a floating point value:

https://dev.gentoo.org/~zmedico/portage/doc/man/emerge.1.html

For clarification reading other sites, my understanding is that a
load average value of 1 in the top application is meant to
represent 1 CPU core operating at 100%. Assuming that's
true, then on Peter's 24 core machine, with LOAD=40, he's
telling emerge it's ok to use more cores than his machine has.

Is that consistent with your (or others) understanding?

I think the mistake is one of those easy to make ones where
the human thinks 40% (hence 40) and the machine thinks
40% (hence 0.4)

Cheers,
Mark
Re: Portage load control
On 2023.05.12 12:23, Mark Knecht wrote:
[snip .....]
> One interesting point is that the first Gentoo page I found to
> look at the emerge man page shows LOAD as the value provided
> to the --load-average option, but nowhere does it specify anything
> other than it's a floating point value:
I suspect the specification of floating point implies that it CAN take
digits after the decimal point, but not that they are required,
although that should be easy enough to test.
>
> https://dev.gentoo.org/~zmedico/portage/doc/man/emerge.1.html
>
> For clarification reading other sites, my understanding is that a
> load average value of 1 in the top application is meant to
> represent 1 CPU core operating at 100%. Assuming that's
> true, then on Peter's 24 core machine, with LOAD=40, he's
> telling emerge it's ok to use more cores than his machine has.
>
> Is that consistent with your (or others) understanding?
Close, but not quite. (See
https://en.wikipedia.org/wiki/Load_(computing) for more details.) I
think your understanding will match any observations, but I see the
definition as different. I understand the load (instantaneous, not
average) is the number of processes in the "R" state, i.e., running or
waiting for a CPU slice. That excludes any process explicitly sleeping,
although Linux also counts processes in uninterruptible sleep ("D",
usually waiting for disk IO). Since it can change so quickly, the point
load is not very useful, so it is more commonly presented as a value averaged
over a period of time. Top shows 1, 5, and 15 minute averages.
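
A rough way to see the instantaneous figure behind that definition is to count process states directly. The sketch below assumes Linux and the procps version of ps; count_load_tasks is a hypothetical name. It counts both runnable ("R") and uninterruptible ("D") tasks, since Linux includes both in its load figure.

```shell
#!/bin/bash
# count_load_tasks
# Hypothetical helper: read one process-state string per line on stdin and
# print how many start with R (running or runnable) or D (uninterruptible
# sleep), the two states that contribute to the Linux load figure.
count_load_tasks() {
    awk '$1 ~ /^[RD]/ { n++ } END { print n + 0 }'
}
```

Typical use would be `ps -eo state= | count_load_tasks`. The ps process itself is running at that moment, so the count includes it.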

Again, --load-average tells emerge whether it can start a new
job/package, but has no control over how high the load will get based
on the already started jobs. If emerge starts new jobs when the load
is over that specified by --load-average, that does smell like a bug in
emerge.

>
> I think the mistake is one of those easy to make ones where
> the human thinks 40% (hence 40) and the machine thinks
> 40% (hence 0.4)
>
> Cheers,
> Mark
>
Re: Portage load control
On Friday, 12 May 2023 17:58:46 BST Jack wrote:

> Again, --load-average tells emerge whether it can start a new
> job/package, but has no control over how high the load will get based
> on the already started jobs. If emerge starts new jobs when the load
> is over that specified by --load-average, that does smell like a bug in
> emerge.

Hooray!

:)

--
Regards,
Peter.
Re: Portage load control
On Fri, May 12, 2023 at 9:59 AM Jack <ostroffjh@users.sourceforge.net>
wrote:
>
> On 2023.05.12 12:23, Mark Knecht wrote:
> [snip .....]
> > One interesting point is that the first Gentoo page I found to
> > look at the emerge man page shows LOAD as the value provided
> > to the --load-average option, but nowhere does it specify anything
> > other than it's a floating point value:
> I suspect the specification of floating point implies that it CAN take
> digits after the decimal point, but not that they are required,
> although that should be easy enough to test.
> >
> > https://dev.gentoo.org/~zmedico/portage/doc/man/emerge.1.html
> >
> > For clarification reading other sites, my understanding is that a
> > load average value of 1 in the top application is meant to
> > represent 1 CPU core operating at 100%. Assuming that's
> > true, then on Peter's 24 core machine, with LOAD=40, he's
> > telling emerge it's ok to use more cores than his machine has.
> >
> > Is that consistent with your (or others) understanding?
> Close, but not quite. (See
> https://en.wikipedia.org/wiki/Load_(computing) for more details.) I
> think your understanding will match any observations, but I see the
> definition as different. I understand the load (instantaneous, not
> average) is the number of processes in the "R" state, i.e., running or
> waiting for a CPU slice. That excludes any process explicitly sleeping,
> although Linux also counts processes in uninterruptible sleep ("D",
> usually waiting for disk IO). Since it can change so quickly, the point
> load is not very useful, so it is more commonly presented as a value averaged
> over a period of time. Top shows 1, 5, and 15 minute averages.
>
> Again, --load-average tells emerge whether it can start a new
> job/package, but has no control over how high the load will get based
> on the already started jobs. If emerge starts new jobs when the load
> is over that specified by --load-average, that does smell like a bug in
> emerge.
>
> >
> > I think the mistake is one of those easy to make ones where
> > the human thinks 40% (hence 40) and the machine thinks
> > 40% (hence 0.4)
> >
> > Cheers,
> > Mark
> >

OK, I find that all reasonable. One point about the Wikipedia
description for anyone following who may not actually read it
is that the average is accomplished with an exponential moving
average and therefore is not, by definition, linear over time.
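
The exponential behaviour can be reproduced with a small simulation. This is a sketch under stated assumptions: it uses the idealised form of the Linux 1-minute average, which is updated every 5 seconds with a decay factor of exp(-5/60), and it starts from an idle system. The function name simulate_load is made up for illustration.

```shell
#!/bin/bash
# simulate_load RUNNABLE SECONDS
# Hypothetical helper: print the 1-minute load average after SECONDS with a
# constant number of RUNNABLE tasks, starting from a load of 0.
simulate_load() {
    awk -v n="$1" -v secs="$2" 'BEGIN {
        decay = exp(-5 / 60)      # per-update decay of the 1-minute average
        load = 0
        # One update every 5 seconds: old value decays, new sample blends in.
        for (t = 5; t <= secs; t += 5)
            load = load * decay + n * (1 - decay)
        printf "%.2f\n", load
    }'
}
```

`simulate_load 1 60` prints 0.63 and `simulate_load 1 300` prints 0.99, so one busy core takes several minutes to show up as a load of about 1.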

As a little experiment that anyone can run I'll include a little
AI generated shell script people can use to actually
see more of what's going on in top, htop and btop. Note
that on my system the CPU affinity part of the generated script didn't
work (it wrote to /proc/self/cpuset, which does not set affinity), so
the version below uses taskset instead. It loops continuously until you
hit ctrl-C.

If you watch CPU load you'll
see it climb quickly at first and then more slowly until
you get up to 1.0. It will go a little higher (1.03 in my case)
which is likely the CPU load from the programs monitoring
the system and other background junk.

None the less, 1 core running continuously generates
a load of 1 after some period of time.

As with all code on the Internet I take no responsibility
for any damage caused by this code and neither
does Google's Bard.

#!/bin/bash

# This script busy-loops on one core until you hit Ctrl-C.

# Pin this shell to a single core (CPU 0) with taskset from util-linux.
taskset -cp 0 $$ > /dev/null

while true; do

# Do nothing.
:

done

# No cleanup is needed: Ctrl-C ends the script and the affinity setting
# goes away with the process.
Re: Portage load control
On Fri, May 12, 2023 at 10:42 AM Peter Humphrey <peter@prh.myzen.co.uk>
wrote:
>
> On Friday, 12 May 2023 17:58:46 BST Jack wrote:
>
> > Again, --load-average tells emerge whether it can start a new
> > job/package, but has no control over how high the load will get based
> > on the already started jobs. If emerge starts new jobs when the load
> > is over that specified by --load-average, that does smell like a bug in
> > emerge.
>
> Hooray!
>

Peter,
I agree with Jack's response, but the keyword & potential issue is
all based around that one word - "If". The way I see this is unless
you have tracked down realtime what processes are running and
where the CPU usage is going, and can further be sure that it's a
process emerge itself started, then we don't really know what is
causing the problem. My concern is what happens if emerge is
honoring --load-average but you're seeing system usage created
by some tool emerge called that doesn't understand --jobs and
emerge doesn't know about at that level? Think some Rust code
getting built by a rust compiler, or some deep make system.

Anyway, I had a couple of thoughts:

1) If it's really a bug then as others have said report it up the
chain and hope for a fix.

2) If I wanted to solve the problem today(ish) then I'd build
a Gentoo VM in Virtualbox, dedicate some number of cores
to it, build everything with binary packages and probably
run an NFS server in the VM which I mount in the host
machine. I then update the host machine from the binary
packages and Virtualbox manages to never use more cores
than I give it. That fix is more or less guaranteed to work.

3) As a question for the far more knowledgeable system
folks I'd ask "Can this problem be solved by cgroups?" If
I have a cgroup with 10 processors in it, can I start emerge
in the host environment and then just transfer the emerge
process ID to a cgroup that I've set up for this purpose?
Isn't that what cgroups is supposed to be used for?
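
For what it's worth, the cgroup idea in point 3 could be sketched as below. The sketch assumes a cgroup v2 unified hierarchy mounted at /sys/fs/cgroup, root privileges, and a kernel with the cpuset controller available. The function name confine_to_cpus and the group name in the usage note are hypothetical. It restricts which CPUs the processes may run on. It does not cap the load average figure itself, because tasks queued for those CPUs still count as runnable.

```shell
#!/bin/bash
# confine_to_cpus NAME CPULIST PID [CGROOT]
# Hypothetical helper: create cgroup NAME under CGROOT (default /sys/fs/cgroup),
# restrict it to the CPUs in CPULIST (e.g. "0-9") and move PID into it.
# Processes started by PID afterwards inherit the same cgroup.
confine_to_cpus() {
    local name=$1 cpus=$2 pid=$3 root=${4:-/sys/fs/cgroup}
    # Make the cpuset controller available to child groups of the root.
    echo "+cpuset" > "$root/cgroup.subtree_control" || return 1
    mkdir -p "$root/$name" || return 1
    # Allowed CPUs for every task in the group.
    echo "$cpus" > "$root/$name/cpuset.cpus" || return 1
    # Move the process into the group.
    echo "$pid" > "$root/$name/cgroup.procs"
}
```

Run as root, `confine_to_cpus emerge-build 0-9 "$(pgrep -o emerge)"` would move the oldest running emerge process into a group limited to ten CPUs. Children it has already started would need to be moved separately.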

Anyway, just thoughts.

Good luck,
Mark
Re: Portage load control
On Saturday, 13 May 2023 00:53:49 BST Mark Knecht wrote:

> Anyway, I had a couple of thoughts:
>
> 1) If it's really a bug then as others have said report it up the
> chain and hope for a fix.

https://bugs.gentoo.org/905933

> 2) If I wanted to solve the problem today(ish) then I'd build
> a Gentoo VM in Virtualbox, dedicate some number of cores
> to it, build everything with binary packages and probably
> run an NFS server in the VM which I mount in the host
> machine. I then update the host machine from the binary
> packages and Virtualbox manages to never use more cores
> than I give it. That fix is more or less guaranteed to work.

Sounds like a lot of work. :(

> 3) As a question for the far more knowledgeable system
> folks I'd ask "Can this problem be solved by cgroups?" If
> I have a cgroup with 10 processors in it, can I start emerge
> in the host environment and then just transfer the emerge
> process ID to a cgroup that I've set up for this purpose?
> Isn't that what cgroups is supposed to be used for?

Interesting idea, that.

> Anyway, just thoughts.

All grist to the mill...

--
Regards,
Peter.
Re: Portage load control
On 5/12/23 20:08, Peter Humphrey wrote:
> On Saturday, 13 May 2023 00:53:49 BST Mark Knecht wrote:
>
>> Anyway, I had a couple of thoughts:
>>
>> 1) If it's really a bug then as others have said report it up the
>> chain and hope for a fix.
> https://bugs.gentoo.org/905933
>
>> 2) If I wanted to solve the problem today(ish) then I'd build
>> a Gentoo VM in Virtualbox, dedicate some number of cores
>> to it, build everything with binary packages and probably
>> run an NFS server in the VM which I mount in the host
>> machine. I then update the host machine from the binary
>> packages and Virtualbox manages to never use more cores
>> than I give it. That fix is more or less guaranteed to work.
> Sounds like a lot of work. :(
A new thought on an easier test.  With -j any higher than 1, doesn't
emerge put out a fairly constant stream of how many out of how many jobs
are complete, how many are currently running, and the load average?  If
it launches new jobs when its own display of load average is above what
you set, that should be pretty compelling to the developers.
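
That check could be automated over a saved copy of emerge's output. The sketch below assumes status lines of the form "Jobs: 3 of 10 complete, 2 running   Load avg: 1.23, 4.56, 7.89"; that layout is an assumption and should be compared with what your emerge version actually prints. flag_overlimit_starts is a hypothetical name.

```shell
#!/bin/bash
# flag_overlimit_starts LIMIT
# Hypothetical filter: read emerge status lines on stdin and report every
# point where the number of running jobs went up although the 1-minute load
# shown on that line was already at or above LIMIT.
flag_overlimit_starts() {
    awk -v limit="$1" '
        /Jobs:.*running.*Load avg:/ {
            running = 0; load = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^running/) running = $(i - 1) + 0
                if ($i == "avg:")    load = $(i + 1) + 0   # "72.10," -> 72.1
            }
            # Going from 0 to 1 running job is expected even under high load,
            # so only increases from a non-zero count are reported.
            if (seen && prev > 0 && running > prev && load >= limit)
                print "load " load " >= " limit ": running jobs " prev " -> " running
            prev = running; seen = 1
        }'
}
```

Something like `flag_overlimit_starts 40 < emerge-output.log` would then list the moments worth quoting in the bug report.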
>> 3) As a question for the far more knowledgeable system
>> folks I'd ask "Can this problem be solved by cgroups?" If
>> I have a cgroup with 10 processors in it, can I start emerge
>> in the host environment and then just transfer the emerge
>> process ID to a cgroup that I've set up for this purpose?
>> Isn't that what cgroups is supposed to be used for?
> Interesting idea, that.
>
>> Anyway, just thoughts.
> All grist to the mill...
>
