Mailing List Archive: Controlling emerges

Controlling emerges

Sep 18, 2023, 5:00 AM

Post #1 of 25 (412 views)

Hello list,

We've had a few discussions here on how to balance the parameters to emerge
to make the most of the resources available. Here's another idea:

On the one hand, big jobs should be able to use the maximum CPU
performance and RAM capacity, but on the other we don't want to flood the
system.

Therefore, I think it would be useful to be able to specify in env and
package.env that a job should be run on its own - if any other emerge jobs are
scheduled, wait until they're finished. Combine that with a specific MAKEOPTS,
and we'd have a more flexible deployment of resources.

Is this feasible? What have I not thought of?

--
Regards,
Peter.

Re: Controlling emerges [ In reply to ]

ostroffjh at users

Sep 18, 2023, 5:59 AM

Post #2 of 25 (412 views)

On 9/18/23 08:00, Peter Humphrey wrote:
> Hello list,
>
> We've had a few discussions here on how to balance the parameters to emerge
> to make the most of the resources available. Here's another idea:
>
> One the one hand, big jobs should be able to use the maximum CPU
> performance and RAM capacity, but on the other we don't want to flood the
> system.
>
> Therefore, I think it would be useful to be able to specify in env and
> package.env that a job should be run on its own - if any other emerge jobs are
> scheduled, wait until they're finished. Combine that with a specific MAKEOPTS,
> and we'd have a more flexible deployment of resouces.
>
> Is this feasible? What have I not thought of?

I've had exactly the same thought for some time now. My guess is that
it is theoretically possible to add some USE flag or ENV var for portage
to recognize, but I don't know the portage internals well enough to
guess how much effort it would be. Given that portage orders ebuilds in
a single emerge session based on some dependency graph, that seems like
a good place to put the necessary hooks.

As a starting point, one option might be to create a special/magic
ebuild and make it a dependency of those jobs that need to be run alone,
and have something about it that won't run if anything else is still
running. But, I don't know if those pre-checks (such as checking for
enough RAM and/or disk space) can be run at build time and not just at
portage startup time. The other possible problem with that approach
would be making sure that the ebuild gets run separately for each
ebuild that depends on it, rather than all of them being satisfied by
a single run. Also, those blocking ebuilds have to work so that if several
of them are queued (and running their "wait for everything else to finish"
scripts), exactly one of them starts. I don't know if those
pre-check scripts count as running before or within the ebuild itself.

Re: Controlling emerges [ In reply to ]

peter at prh

Sep 18, 2023, 6:44 AM

Post #3 of 25 (412 views)

On Monday, 18 September 2023 13:59:03 BST Jack wrote:
> On 9/18/23 08:00, Peter Humphrey wrote:
> > Hello list,
> >
> > We've had a few discussions here on how to balance the parameters to
> > emerge
> > to make the most of the resources available. Here's another idea:
> >
> > One the one hand, big jobs should be able to use the maximum CPU
> > performance and RAM capacity, but on the other we don't want to flood the
> > system.
> >
> > Therefore, I think it would be useful to be able to specify in env and
> > package.env that a job should be run on its own - if any other emerge jobs
> > are scheduled, wait until they're finished. Combine that with a specific
> > MAKEOPTS, and we'd have a more flexible deployment of resouces.
> >
> > Is this feasible? What have I not thought of?
>
> I've had exactly the same thought for some time now. My guess is that
> it is theoretically possible to add some USE flag or ENV var for portage
> to recognize, but I don't know the portage internals well enough to
> guess how much effort it would be. Given that portage orders ebuilds in
> a single emerge session based on some dependency graph, that seems like
> a good place to put the necessary hooks.
>
> As a starting point, one option might be to create a special/magic
> ebuild and make it a dependency of those jobs that need to be run alone,
> and have something about it that won't run if anything else is still
> running. But, I don't know if those pre-checks (such as checking for
> enough RAM and/or disk space) can be run at build time and not just at
> portage startup time. The other possible problem with that approach
> would be to be sure that ebuild gets run separately for each other
> ebuild that depends on it - not all of them depending on it being run
> once.Also, those blocking ebuilds have work so that if several of them
> are queued (and running their "wait for everything else to finish"
> scripts - exactly one of them needs to start. I don't know if those
> pre-check scripts count as running before or within the ebuild itself.

It may be less complex than you think, Jack. I envisage a package being marked
as solitary, and when portage reaches that package, it waits until all current
jobs have finished, then it starts the solitary package with the environment
specified for it, and it doesn't start the next one until that one has finished.
The dependency calculation shouldn't need to be changed.

It seems simple the way I see it.

--
Regards,
Peter.

Re: Controlling emerges [ In reply to ]

alan.mckinnon at gmail

Sep 18, 2023, 6:48 AM

Post #4 of 25 (412 views)

On Mon, Sep 18, 2023 at 3:44 PM Peter Humphrey <peter@prh.myzen.co.uk>
wrote:

>
>
> It may be less complex than you think, Jack. I envisage a package being
> marked
> as solitary, and when portage reaches that package, it waits until all
> current
> jobs have finished, then it starts the solitary package with the
> environment
> specified for it, and it doesn't start the next one until that one has
> finished.
> The dependency calculation shouldn't need to be changed.
>
> It seems simple the way I see it.
>

How does that improve emerge performance overall?

--
Alan McKinnon
alan dot mckinnon at gmail dot com

Re: Controlling emerges [ In reply to ]

peter at prh

Sep 18, 2023, 9:03 AM

Post #5 of 25 (412 views)

On Monday, 18 September 2023 14:48:46 BST Alan McKinnon wrote:
> On Mon, Sep 18, 2023 at 3:44 PM Peter Humphrey <peter@prh.myzen.co.uk>
>
> wrote:
> > It may be less complex than you think, Jack. I envisage a package being
> > marked
> > as solitary, and when portage reaches that package, it waits until all
> > current
> > jobs have finished, then it starts the solitary package with the
> > environment
> > specified for it, and it doesn't start the next one until that one has
> > finished.
> > The dependency calculation shouldn't need to be changed.
> >
> > It seems simple the way I see it.
>
> How does that improve emerge performance overall?

By allocating all the system resources to huge packages while not flooding the
system with lesser ones. For example, I can set -j20 for webkit-gtk today
without overflowing the 64GB RAM, and still have 4 CPU threads available to
other tasks. The change I've proposed should make the whole operation more
efficient overall and take less time.

As things stand today, I have to make do with -j12 or so, wasting time and
resources. I have load-average set at 32, so if I were to set -j20 generally
I'd run out of RAM in no time. I've had many instances of packages failing to
compile in a large update, but going just fine on their own; and I've had
mysterious operational errors resulting, I suspect, from otherwise undetected
miscompilation.

Previous threads have more detail of what I've tried already.

--
Regards,
Peter.

Re: Controlling emerges [ In reply to ]

alan.mckinnon at gmail

Sep 18, 2023, 9:13 AM

Post #6 of 25 (412 views)

On Mon, Sep 18, 2023 at 6:03 PM Peter Humphrey <peter@prh.myzen.co.uk>
wrote:

> On Monday, 18 September 2023 14:48:46 BST Alan McKinnon wrote:
> > On Mon, Sep 18, 2023 at 3:44 PM Peter Humphrey <peter@prh.myzen.co.uk>
> >
> > wrote:
> > > It may be less complex than you think, Jack. I envisage a package being
> > > marked
> > > as solitary, and when portage reaches that package, it waits until all
> > > current
> > > jobs have finished, then it starts the solitary package with the
> > > environment
> > > specified for it, and it doesn't start the next one until that one has
> > > finished.
> > > The dependency calculation shouldn't need to be changed.
> > >
> > > It seems simple the way I see it.
> >
> > How does that improve emerge performance overall?
>
> By allocating all the system resources to huge packages while not flooding
> the
> system with lesser ones. For example, I can set -j20 for webkit-gtk today
> without overflowing the 64GB RAM, and still have 4 CPU threads available
> to
> other tasks. The change I've proposed should make the whole operation more
> efficient overall and take less time.
>
> As things stand today, I have to make do with -j12 or so, wasting time and
> resources. I have load-average set at 32, so if I were to set -j20
> generally
> I'd run out of RAM in no time. I've had many instances of packages failing
> to
> compile in a large update, but going just fine on their own; and I've had
> mysterious operational errors resulting, I suspect, from otherwise
> undetected
> miscompilation.
>
> Previous threads have more detail of what I've tried already.
>
>
I did read all those, but no matter how you move things around you still
have only X resources available all the time.
Whether you just let emerge do its thing or try to get it to do big packages
on their own, everything is still going to use the same number of CPU
cycles overall and you will save nothing.

If webkit-gtk is the only big package, have you considered:

emerge -1v webkit-gtk && emerge -avuND @world?

What you have is not a portage problem. It is an orthodox parallelism
problem, and I think you believe your constraint is unique in the world
- it isn't.
With parallelism, trying to fiddle single nodes to improve things overall
never really works out.

Just my $0.02

Alan

--
Alan McKinnon
alan dot mckinnon at gmail dot com

RE: Controlling emerges [ In reply to ]

lperkins at openeye

Sep 18, 2023, 10:54 AM

Post #7 of 25 (412 views)

> From: Alan McKinnon <alan.mckinnon@gmail.com>
> Sent: Monday, September 18, 2023 9:13 AM
> To: gentoo-user@lists.gentoo.org
> Subject: Re: [gentoo-user] Controlling emerges
>
> On Mon, Sep 18, 2023 at 6:03 PM Peter Humphrey <peter@prh.myzen.co.uk> wrote:
> On Monday, 18 September 2023 14:48:46 BST Alan McKinnon wrote:
> > On Mon, Sep 18, 2023 at 3:44 PM Peter Humphrey <peter@prh.myzen.co.uk>
> >
> > wrote:
> > > It may be less complex than you think, Jack. I envisage a package being
> > > marked
> > > as solitary, and when portage reaches that package, it waits until all
> > > current
> > > jobs have finished, then it starts the solitary package with the
> > > environment
> > > specified for it, and it doesn't start the next one until that one has
> > > finished.
> > > The dependency calculation shouldn't need to be changed.
> > >
> > > It seems simple the way I see it.
> >
> > How does that improve emerge performance overall?
>
> By allocating all the system resources to huge packages while not flooding the
> system with lesser ones. For example, I can set -j20 for webkit-gtk today
> without overflowing the 64GB RAM, and still have 4 CPU threads available to
> other tasks. The change I've proposed should make the whole operation more
> efficient overall and take less time.
>
> As things stand today, I have to make do with -j12 or so, wasting time and
> resources. I have load-average set at 32, so if I were to set -j20 generally
> I'd run out of RAM in no time. I've had many instances of packages failing to
> compile in a large update, but going just fine on their own; and I've had
> mysterious operational errors resulting, I suspect, from otherwise undetected
> miscompilation.
>
> Previous threads have more detail of what I've tried already.
>
> I did read all those but no matter how you move things around you still have only X resources available all the time.
> Whether you just let emerge do it's thing or try get it to do big packages on their own, everything is still going to use the same number of cpu cycles overall and you will save nothing.
>
> If webkit-gtk is the only big package, have you considered:
>
> emerge -1v webkit-gtk && emerge -avuND @world?
>
>
> What you have is not a portage problem. It is a orthodox parallelism problem, and I think you are thinking your constraint is unique in the work - it isn't.
> With parallelism, trying to fiddle single nodes to improve things overall never really works out.
>
> Just my $0.02
>
>
> Alan
>
> --
> Alan McKinnon
> alan dot mckinnon at gmail dot com
>

Note that on my systems I just make heavy use of the various load-average limiting options and as long as two of the big packages don't start within seconds of each other it does a pretty good job of letting them run by themselves.

If things do get in a snarl, you can always use kill -18/19 to suspend a few compile jobs until the system stops thrashing and resume them as capacity permits.
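
For reference, 18 and 19 are the numbers of SIGCONT and SIGSTOP on most Linux architectures; the signal names are the portable spelling. A minimal sketch of the suspend/resume idea, assuming the compile jobs show up as cc1 or cc1plus processes. The helper names are made up for illustration and are not part of portage:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for pausing and resuming compile jobs by hand.

# Print the PIDs of running compiler processes.
# The process names are an assumption; adjust them for your toolchain.
compiler_pids() {
    pgrep -x 'cc1|cc1plus'
}

# Suspend the given PIDs. SIGSTOP cannot be caught or ignored,
# so the processes stop until they receive SIGCONT.
suspend_pids() {
    kill -STOP "$@"
}

# Resume previously suspended PIDs.
resume_pids() {
    kill -CONT "$@"
}
```

Something like `suspend_pids $(compiler_pids | head -n 4)` would pause up to four compiler processes, and `resume_pids` with the same PIDs lets them continue once the system has stopped thrashing. Suspended processes keep their memory, so this only helps if the remaining jobs can finish and free some.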

LMP

Re: Controlling emerges [ In reply to ]

confabulate at kintzios

Sep 18, 2023, 10:59 AM

Post #8 of 25 (412 views)

On Monday, 18 September 2023 17:13:04 BST Alan McKinnon wrote:
> On Mon, Sep 18, 2023 at 6:03 PM Peter Humphrey <peter@prh.myzen.co.uk>
>
> wrote:
> > On Monday, 18 September 2023 14:48:46 BST Alan McKinnon wrote:
> > > On Mon, Sep 18, 2023 at 3:44 PM Peter Humphrey <peter@prh.myzen.co.uk>
> > >
> > > wrote:
> > > > It may be less complex than you think, Jack. I envisage a package
> > > > being
> > > > marked
> > > > as solitary, and when portage reaches that package, it waits until all
> > > > current
> > > > jobs have finished, then it starts the solitary package with the
> > > > environment
> > > > specified for it, and it doesn't start the next one until that one has
> > > > finished.
> > > > The dependency calculation shouldn't need to be changed.
> > > >
> > > > It seems simple the way I see it.
> > >
> > > How does that improve emerge performance overall?
> >
> > By allocating all the system resources to huge packages while not flooding
> > the
> > system with lesser ones. For example, I can set -j20 for webkit-gtk today
> > without overflowing the 64GB RAM, and still have 4 CPU threads available
> > to
> > other tasks. The change I've proposed should make the whole operation more
> > efficient overall and take less time.
> >
> > As things stand today, I have to make do with -j12 or so, wasting time and
> > resources. I have load-average set at 32, so if I were to set -j20
> > generally
> > I'd run out of RAM in no time. I've had many instances of packages failing
> > to
> > compile in a large update, but going just fine on their own; and I've had
> > mysterious operational errors resulting, I suspect, from otherwise
> > undetected
> > miscompilation.
> >
> > Previous threads have more detail of what I've tried already.
> >
> >
> > I did read all those but no matter how you move things around you still
>
> have only X resources available all the time.
> Whether you just let emerge do it's thing or try get it to do big packages
> on their own, everything is still going to use the same number of cpu
> cycles overall and you will save nothing.
>
> If webkit-gtk is the only big package, have you considered:
>
> emerge -1v webkit-gtk && emerge -avuND @world?
>
>
> What you have is not a portage problem. It is a orthodox parallelism
> problem, and I think you are thinking your constraint is unique in the work
> - it isn't.
> With parallelism, trying to fiddle single nodes to improve things overall
> never really works out.
>
> Just my $0.02
>
>
> Alan

I think there is a level of complexity involved which will make (m)any
attempts at optimisation difficult, because EMERGE_DEFAULT_OPTS competes for
resources against MAKEOPTS, resulting in a trade-off between their optimal
settings. Parallelisation becomes difficult to maximise on the basis of some
presets when not all updates have the same combination of small vs large
packages, dependent packages queue up before dependencies are built, various
emerge stages are processed linearly, some versions of gcc may get hungrier
for RAM, and whatever else I haven't accounted for.

Someone with a PhD on multivariate stochastic analysis could probably come up
with some nifty code to include in portage? ;-)

Re: Controlling emerges [ In reply to ]

john.blinka at gmail

Sep 18, 2023, 11:10 AM

Post #9 of 25 (412 views)

On Mon, Sep 18, 2023 at 12:13 PM Alan McKinnon <alan.mckinnon@gmail.com>
wrote:

>
>
> If webkit-gtk is the only big package, have you considered:
>
> emerge -1v webkit-gtk && emerge -avuND @world?
>
>
> What you have is not a portage problem. It is a orthodox parallelism
> problem, and I think you are thinking your constraint is unique in the work
> - it isn't.
> With parallelism, trying to fiddle single nodes to improve things overall
> never really works out.
>
> Just my $0.02
>
>
> Alan
>

I use this idea, but it requires (for me) a more sophisticated
implementation. As is, it pulls in webkit-gtk-x.y.z and
webkit-gtk-x.y.z-r410 simultaneously - for my portage setup. I don’t have
the memory to handle both at the same time. It’s guaranteed to crash on my
system.

Instead, I do a preliminary emerge -p<etc>, saving the specific package
builds to a file. I then inspect the file to see what portage wants to do.
Too often, the file contains webkit-gtk-x.y.z and webkit-gtk-x.y.z-r410 in
sequence, usually preceded and followed by other packages. Portage always
wants to build both versions simultaneously - guaranteed crash for me.

Instead of invoking emerge, I write a little bash script to emerge the
preceding packages in parallel, followed by a serial webkit-gtk-x.y.z,
followed by a serial webkit-gtk-x.y.z-r410, and then finally all the
remaining packages. Four emerge invocations in sequence. The script builds
specific versions, ie, =net-libs/webkit-gtk-x.y.z, to ensure it builds only
1 package at a time. It’s trivial to write.

A problem arises when splitting up builds as you suggest. Emerge has its
own ideas about what it’s going to do - and in what sequence. When you try
to impose a build order not of its making, emerge will often do something
unintuitive and frustrating to you. I’ve learned to respect its sequencing.
This technique keeps portage happy and predictable by using its sequencing.
It gives me reliable overnight unattended upgrades.
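
The four-invocation sequence described above could look roughly like the sketch below. The package atoms in BEFORE and AFTER are placeholders for whatever the saved `emerge -p` output lists, the x.y.z versions are left unfilled as in the text, and the --jobs values are assumptions. EMERGE can be overridden so the sequence can be previewed without building anything.

```bash
#!/usr/bin/env bash
# Sketch only: atoms and job counts are placeholders, not a tested setup.

# Command to run; set EMERGE=echo to print the invocations instead.
EMERGE=${EMERGE:-emerge}

# Packages portage scheduled before and after the heavy ones (placeholders).
BEFORE=('=cat-a/pkg-one-1.0' '=cat-b/pkg-two-2.0')
HEAVY_1='=net-libs/webkit-gtk-x.y.z'
HEAVY_2='=net-libs/webkit-gtk-x.y.z-r410'
AFTER=('=cat-c/pkg-three-3.0')

# Run the four stages in portage's own order; stop at the first failure.
staged_update() {
    "$EMERGE" --oneshot --jobs=4 "${BEFORE[@]}" &&
    "$EMERGE" --oneshot --jobs=1 "$HEAVY_1" &&
    "$EMERGE" --oneshot --jobs=1 "$HEAVY_2" &&
    "$EMERGE" --oneshot --jobs=4 "${AFTER[@]}"
}
```

Running `EMERGE=echo` followed by `staged_update` prints the four command lines for inspection; with the default, each stage only starts once the previous one has succeeded.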

John Blinka

Re: Controlling emerges [ In reply to ]

rdalek1967 at gmail

Sep 18, 2023, 11:18 AM

Post #10 of 25 (412 views)

Alan McKinnon wrote:
>
>
> On Mon, Sep 18, 2023 at 6:03 PM Peter Humphrey <peter@prh.myzen.co.uk> wrote:
>
> On Monday, 18 September 2023 14:48:46 BST Alan McKinnon wrote:
> > On Mon, Sep 18, 2023 at 3:44 PM Peter Humphrey <peter@prh.myzen.co.uk>
> >
> > wrote:
> > > It may be less complex than you think, Jack. I envisage a
> package being
> > > marked
> > > as solitary, and when portage reaches that package, it waits
> until all
> > > current
> > > jobs have finished, then it starts the solitary package with the
> > > environment
> > > specified for it, and it doesn't start the next one until that
> one has
> > > finished.
> > > The dependency calculation shouldn't need to be changed.
> > >
> > > It seems simple the way I see it.
> >
> > How does that improve emerge performance overall?
>
> By allocating all the system resources to huge packages while not
> flooding the
> system with lesser ones. For example, I can set -j20 for
> webkit-gtk today
> without overflowing the 64GB RAM, and still have 4 CPU threads
> available to
> other tasks. The change I've proposed should make the whole
> operation more
> efficient overall and take less time.
>
> As things stand today, I have to make do with -j12 or so, wasting
> time and
> resources. I have load-average set at 32, so if I were to set -j20
> generally
> I'd run out of RAM in no time. I've had many instances of packages
> failing to
> compile in a large update, but going just fine on their own; and
> I've had
> mysterious operational errors resulting, I suspect, from otherwise
> undetected
> miscompilation.
>
> Previous threads have more detail of what I've tried already.
>
>
> I did read all those but no matter how you move things around you
> still have only X resources available all the time.
> Whether you just let emerge do it's thing or try get it to do big
> packages on their own, everything is still going to use the same
> number of cpu cycles overall and you will save nothing.
>
> If webkit-gtk is the only big package, have you considered:
>
> emerge -1v webkit-gtk && emerge -avuND @world?
>
>
> What you have is not a portage problem. It is a orthodox parallelism
> problem, and I think you are thinking your constraint is unique in the
> work - it isn't.
> With parallelism, trying to fiddle single nodes to improve things
> overall never really works out.
>
> Just my $0.02
>
>
> Alan
>
> --
> Alan McKinnon
> alan dot mckinnon at gmail dot com

I have to admit, I wish I could tell emerge to compile certain packages
on their own as well. LOo, that qtweb package and a few others.
Sometimes they end up naturally compiling on their own but sometimes, I
end up with LOo, Seamonkey or Firefox, or that qtweb package trying to
compile at the same time in some combination. Sometimes, all four hit
at once. It's bad enough when it is just two of them but when they all
hit, it causes problems. It would be nice if we could set up a list
that tells emerge to emerge only one at a time just like we tell it not
to use tmpfs for certain builds.

While just emerging them first might work, it also limits emerge to just
doing that package instead of the whole update. It also could have
dependencies that also want a lot of resources. I don't know about most
people but I run my updates while I sleep. Having the option to set
that up would be nice. It's not like packages are getting any smaller
either. This is a growing problem.

I have no idea how to do this but I do like the idea.

Dale

:-) :-)

Re: Controlling emerges [ In reply to ]

rich0 at gentoo

Sep 18, 2023, 11:49 AM

Post #11 of 25 (412 views)

On Mon, Sep 18, 2023 at 12:13 PM Alan McKinnon <alan.mckinnon@gmail.com> wrote:
>
> Whether you just let emerge do it's thing or try get it to do big packages on their own, everything is still going to use the same number of cpu cycles overall and you will save nothing.

That is true of CPU, but not RAM. The problem with large parallel
builds is that for 95% of packages they're fine, and for a few
packages they'll eat up all the RAM in the system until the OOM killer
kicks in, or the system just goes into a swap storm (which can cause
panics with some less-than-perfect kernel drivers).

I'm not aware of any simple solutions. I do have some packages set to
just build with a small number of jobs, but that won't prevent other
packages from being built alongside them. Usually that is enough
though. It is just frustrating to watch a package take all day to
build because I can't use more than -j2 or so without running out of
RAM, usually just at one step of the build process.

I can't see anybody bothering with this, but in theory packages could
have a variable to hint at the max RAM consumed per job, and the max
number of jobs it will run. Then the package manager could take the
lesser of -j and the max jobs the package can run, multiply it by the
RAM requirement, and compare that to available memory (or have a
setting to limit max RAM). Basically treat RAM as a resource and let
the package manager reduce -j to manage it if necessary.
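
The arithmetic in that idea can be sketched in a few lines of shell. Nothing like this exists in portage as far as this thread knows; the function names and the per-package hints (maximum jobs, RAM per job) are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of treating RAM as a resource when choosing -j.
# All names and the hint values are assumptions for illustration.

# Available memory in MiB, as reported by the kernel.
available_mb() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}

# Usage: ram_limited_jobs REQUESTED_J PKG_MAX_JOBS MB_PER_JOB AVAILABLE_MB
# Prints the job count that fits in the given amount of RAM (at least 1).
ram_limited_jobs() {
    local requested=$1 pkg_max=$2 mb_per_job=$3 avail_mb=$4
    # Lesser of the user's -j and what the package can actually use.
    local jobs=$(( requested < pkg_max ? requested : pkg_max ))
    # How many jobs of that size fit in the available memory.
    local by_ram=$(( avail_mb / mb_per_job ))
    if (( by_ram < jobs )); then
        jobs=$by_ram
    fi
    if (( jobs < 1 )); then
        jobs=1
    fi
    echo "$jobs"
}
```

For example, `ram_limited_jobs 20 16 2048 "$(available_mb)"` would cap a -j20 request for a package hinting 16 jobs at 2 GiB each to whatever currently fits. A real implementation would also have to account for other packages building at the same time.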

Hmm, I guess a workaround would be to set ulimits on the portage user
so that emerge is killed before RAM use gets too out of hand. That
won't help complete builds, but it would at least keep it from killing
the system.
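
Limits set with ulimit are inherited by child processes, so one way to experiment with this without touching system-wide configuration is to lower the limit in a subshell before starting the build. A rough sketch, with run_with_memory_cap as a made-up helper name. Note that `ulimit -v` caps the address space of each process separately, not the total used by all compiler processes together, so it only stops a single runaway process.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command with a per-process address-space cap.

# Usage: run_with_memory_cap LIMIT_KB COMMAND [ARGS...]
run_with_memory_cap() {
    local limit_kb=$1
    shift
    # The subshell keeps the lowered limit from leaking into the caller.
    ( ulimit -v "$limit_kb" && exec "$@" )
}
```

Something like `run_with_memory_cap 8388608 emerge -avuND @world` would then fail individual processes that try to go beyond 8 GiB of address space instead of letting them push the machine into swap. Whether the resulting build failures are preferable to a swap storm is a judgment call.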

--
Rich

Re: Controlling emerges [ In reply to ]

billk at iinet

Sep 18, 2023, 3:44 PM

Post #12 of 25 (400 views)

per package env variables?

https://wiki.gentoo.org/wiki//etc/portage/package.env

BillK

Re: Controlling emerges [ In reply to ]

peter at prh

Sep 19, 2023, 2:09 AM

Post #13 of 25 (400 views)

On Monday, 18 September 2023 23:44:50 BST William Kenworthy wrote:
> per package env variables?
>
> https://wiki.gentoo.org/wiki//etc/portage/package.env

Apropos of what?

--
Regards,
Peter.

Re: Controlling emerges [ In reply to ]

billk at iinet

Sep 19, 2023, 2:14 AM

Post #14 of 25 (400 views)

That is where you set per package compiler parameters by overriding
make.conf settings.

BillK

On 19/9/23 17:09, Peter Humphrey wrote:
> On Monday, 18 September 2023 23:44:50 BST William Kenworthy wrote:
>> per package env variables?
>>
>> https://wiki.gentoo.org/wiki//etc/portage/package.env
> Apropos of what?
>

Re: Controlling emerges [ In reply to ]

finkandreas at web

Sep 19, 2023, 2:37 AM

Post #15 of 25 (400 views)

On Tue, 19 Sep 2023 17:14:42 +0800
William Kenworthy <billk@iinet.net.au> wrote:

> That is where you set per package compiler parameters by overriding
> make.conf settings.
>
> BillK
>
>
I would argue that per-package compiler parameters are not what is
needed. Taking chromium as an example, 99% of the compile time can
be done with -j16 on my machine, but for a very short time I would need
to run with -j1, because otherwise I run out of memory.
In short: I want to run with as many jobs as I have cores, as long as
I do not run out of memory, and when I run out of memory I want to run
with as few jobs as possible until the pressure on the memory is
gone. Then I want to continue with as many jobs as possible.

And this is not something that make / ninja provide. They have a
concept of a global number of jobs, which in this case must be set to
the maximum number that your RAM can take during the very short period in
time where you have a high watermark on your RAM, but that number would
be way too low for 99% of the compilation time.

FWIW, I have a hacky solution that I use privately, but I never
published it anywhere, because it could break some builds, and at the
moment I'm not ready to support it.

Basically it tries to run with as many jobs as the number of CPU cores
at all times. It watches memory pressure in the background and
kills build jobs as soon as a high watermark is reached.
At this point, make would normally exit, because a build job failed.
However my hacky solution overrides the exec-family of system calls,
and if a job fails, it is being retried exclusively, i.e. no other
build job is allowed to run at the same time as the failed job.
It fails ultimately, when the second and exclusive run fails too.
This way, if the job failed only because of lack of memory, it will be
retried exclusively and succeeds. If it failed due to a programming
error, it will fail also the second time, and then the error is
forwarded to make.
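
The "retry exclusively" half of that scheme can be illustrated with flock(1) from util-linux. This is not the exec-overriding solution described above, and it has no memory watchdog; it is only a sketch of the retry logic, with the function name and lock file path chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch: run a job in parallel with others, and if it fails,
# retry it once while no other wrapped job is running.

# Lock file shared by all wrapped jobs (path is an assumption).
LOCK_FILE=${LOCK_FILE:-${TMPDIR:-/tmp}/build-retry.lock}

# Usage: run_with_exclusive_retry COMMAND [ARGS...]
# COMMAND must be an external program, since flock executes it directly.
run_with_exclusive_retry() {
    # First attempt under a shared lock: any number of jobs may hold it.
    if flock --shared "$LOCK_FILE" "$@"; then
        return 0
    fi
    # Second attempt under an exclusive lock: waits until all shared
    # holders have finished and keeps new ones out while it runs.
    # Its exit status is the final result.
    flock --exclusive "$LOCK_FILE" "$@"
}
```

A job that failed only for lack of memory gets a second chance with the machine to itself, while a genuine compile error fails both times and is reported as usual.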

Re: Controlling emerges [ In reply to ]

peter at prh

Sep 19, 2023, 2:48 AM

Post #16 of 25 (400 views)

(I assume this was addressed to me, though it was a reply to someone else.)

On Tuesday, 19 September 2023 10:14:42 BST William Kenworthy wrote:
> That is where you set per package compiler parameters by overriding
> make.conf settings.

And which make.conf setting might achieve what I want? Careful reading of the
make.conf man page hasn't revealed anything relevant.

--
Regards,
Peter.

Re: Controlling emerges [ In reply to ]

rich0 at gentoo

Sep 19, 2023, 3:06 AM

Post #17 of 25 (400 views)

On Tue, Sep 19, 2023 at 5:48 AM Peter Humphrey <peter@prh.myzen.co.uk> wrote:
>
> On Tuesday, 19 September 2023 10:14:42 BST William Kenworthy wrote:
> > That is where you set per package compiler parameters by overriding
> > make.conf settings.
>
> And which make.conf setting might achieve what I want? Careful reading of the
> make.conf man page hasn't revealed anything relevant.
>

There isn't one. At best there is -l which regulates jobs by system
load, but there is nothing that takes into account RAM use.

I just use package.env to limit jobs on packages that I know are RAM-hungry.

Right now my list includes:
calligra
qtwebengine
qtwebkit
ceph
nodejs
passwdqc
scipy
pandas
spidermonkey

(It has been ages since I've pruned the list, and of course what is
"too much RAM" will vary.)

The other thing I will tweak is avoiding building in a tmpfs.
Obviously anything that is RAM constrained is a good contender for not
using a tmpfs, but there are also packages that just have really large
build directories that otherwise don't need too much RAM when building.
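
For anyone who hasn't used package.env, both tweaks look roughly like this. The file names under /etc/portage/env/, the -j/-l values and the alternative build directory are assumptions for illustration; only the mechanism is the documented one.

```
# /etc/portage/env/lowmem.conf  (file name and values are examples)
MAKEOPTS="-j2 -l4"

# /etc/portage/env/notmpfs.conf  (file name and directory are examples)
PORTAGE_TMPDIR="/var/tmp/notmpfs"

# /etc/portage/package.env
# <package atom>      <files from /etc/portage/env/ to apply>
dev-qt/qtwebengine    lowmem.conf notmpfs.conf
app-office/calligra   lowmem.conf
```

The directory named in PORTAGE_TMPDIR has to exist and sit on a disk-backed filesystem for the second tweak to have any effect.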

--
Rich

Re: Controlling emerges [ In reply to ]

billk at iinet

Sep 19, 2023, 3:13 AM

Post #18 of 25 (400 views)

MAKEOPTS - for example, I have a laptop that locks up (heat) on long compiles, so I reduce the number of jobs (rust and webgtk). The discussion asks about how to control emerge - appropriate per-package -j and -l for the heavy packages should go a long way to doing what you want.

On 19 September 2023 5:48:39 pm AWST, Peter Humphrey <peter@prh.myzen.co.uk> wrote:
>(I assume this was addressed to me, though it was a reply to someone else.)
>
>On Tuesday, 19 September 2023 10:14:42 BST William Kenworthy wrote:
>> That is where you set per package compiler parameters by overriding
>> make.conf settings.
>
>And which make.conf setting might achieve what I want? Careful reading of the
>make.conf man page hasn't revealed anything relevant.
>
>--
>Regards,
>Peter.

Re: Controlling emerges [ In reply to ]

peter at prh

Sep 19, 2023, 5:28 AM

Post #19 of 25 (400 views)

On Monday, 18 September 2023 17:13:04 BST Alan McKinnon wrote:

> I did read all those but no matter how you move things around you still
> have only X resources available all the time.
> Whether you just let emerge do it's thing or try get it to do big packages
> on their own, everything is still going to use the same number of cpu
> cycles overall and you will save nothing.

That isn't the point. The point is that it takes twice as long, and it wastes
the machine's resources while I twiddle my thumbs waiting for it.

> If webkit-gtk is the only big package, have you considered:
>
> emerge -1v webkit-gtk && emerge -avuND @world?

Of course.

> What you have is not a portage problem. It is a orthodox parallelism
> problem, and I think you are thinking your constraint is unique in the work
> - it isn't.

No, I think my problem has not been tackled by the portage developers.

> With parallelism, trying to fiddle single nodes to improve things overall
> never really works out.

See above.

--
Regards,
Peter.

Re: Controlling emerges [ In reply to ]

antlists at youngman

Sep 20, 2023, 3:06 PM

Post #20 of 25 (400 views)

On 18/09/2023 17:13, Alan McKinnon wrote:
>
>
> On Mon, Sep 18, 2023 at 6:03 PM Peter Humphrey <peter@prh.myzen.co.uk> wrote:
>
> On Monday, 18 September 2023 14:48:46 BST Alan McKinnon wrote:
> > On Mon, Sep 18, 2023 at 3:44 PM Peter Humphrey <peter@prh.myzen.co.uk>
> >
> > wrote:
> > > It may be less complex than you think, Jack. I envisage a
> package being
> > > marked
> > > as solitary, and when portage reaches that package, it waits
> until all
> > > current
> > > jobs have finished, then it starts the solitary package with the
> > > environment
> > > specified for it, and it doesn't start the next one until that
> one has
> > > finished.
> > > The dependency calculation shouldn't need to be changed.
> > >
> > > It seems simple the way I see it.
> >
> > How does that improve emerge performance overall?
>
> By allocating all the system resources to huge packages while not
> flooding the
> system with lesser ones. For example, I can set -j20 for webkit-gtk
> today
> without overflowing the 64GB RAM, and still have 4 CPU threads
> available to
> other tasks. The change I've proposed should make the whole
> operation more
> efficient overall and take less time.
>
> As things stand today, I have to make do with -j12 or so, wasting
> time and
> resources. I have load-average set at 32, so if I were to set -j20
> generally
> I'd run out of RAM in no time. I've had many instances of packages
> failing to
> compile in a large update, but going just fine on their own; and
> I've had
> mysterious operational errors resulting, I suspect, from otherwise
> undetected
> miscompilation.
>
> Previous threads have more detail of what I've tried already.
>
>
> I did read all those but no matter how you move things around you still
> have only X resources available all the time.
> Whether you just let emerge do its thing or try to get it to do big
> packages on their own, everything is still going to use the same number
> of cpu cycles overall and you will save nothing.

Except a big chunk off your power bill ... a system under stress uses
more energy for the same amount of work.
>
> If webkit-gtk is the only big package, have you considered:
>
> emerge -1v webkit-gtk && emerge -avuND @world?
>
>
> What you have is not a portage problem. It is an orthodox parallelism
> problem, and I think you are thinking your constraint is unique in the
> world - it isn't.
> With parallelism, trying to fiddle single nodes to improve things
> overall never really works out.
>
A big problem you are missing is that portage does not have control of
the system. It can control its usage of the system, but if I want emerge
to use as much SPARE resource IN THE BACKGROUND as it can without
impacting on on-line responsiveness, that is HARD.

I would like to be able to tell portage "these programs are resource
hogs, don't parallelise them". If portage has loads of little jobs, it
can fire them off one after the other as resource becomes available. If
it fires a hog (or worse, two) off at the same time, the system can
rapidly collapse under load.

Even better, if portage knew roughly how much resource each job
required, it could (within constraints) start with the jobs that
required least resource and run loads of them, and by firing jobs off in
order of increasing demandingness, the number of jobs running in
parallel would naturally tail off.

At the end of the day, if the computer takes an extra 20% time, I'm not
bothered. If I'm sat at the computer 20% time extra because the system
isn't responding because emerge has bogged it down, then I do care. And
when I'm building things like webkit-gtk, llvm, LO, FF and TB, they do
hammer my system. If they're running in parallel, my system would be
near unusable.

Cheers,
Wol
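
As a rough illustration of the ordering idea above (not something portage
does today), the sketch below sorts jobs by an estimated memory cost and
groups them into batches that fit a fixed budget. The function name
plan_batches, the "name cost" input format and the cost figures are all
invented for the example; portage has no such per-package estimates, and a
real scheduler would also have to respect the dependency graph.

```shell
# plan_batches BUDGET
# Reads "name cost" lines on stdin (cost in the same unit as BUDGET),
# sorts them by increasing cost, and prints one batch per line such that
# the costs in a batch add up to no more than BUDGET. A job that exceeds
# BUDGET on its own still gets a batch to itself, i.e. it runs alone.
plan_batches() {
    sort -k2,2n | awk -v budget="$1" '
        {
            # Close the current batch if this job would not fit into it.
            if (n > 0 && used + $2 > budget) {
                print line
                line = ""; used = 0; n = 0
            }
            if (n > 0) line = line " " $1; else line = $1
            used += $2
            n++
        }
        END { if (n > 0) print line }'
}
```

Fed the made-up estimates "nano 1", "bash 1", "llvm 12" and "webkit-gtk 16"
with a budget of 16, this prints "bash nano llvm" and then "webkit-gtk" on
a line of its own: the small jobs share a batch and the hog runs alone.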

RE: Controlling emerges [ In reply to ]

lperkins at openeye

Sep 21, 2023, 12:26 PM

Post #21 of 25 (396 views)

Permalink

> -----Original Message-----
> From: Wol <antlists@youngman.org.uk>
> Sent: Wednesday, September 20, 2023 3:07 PM
> To: gentoo-user@lists.gentoo.org
> Subject: Re: [gentoo-user] Controlling emerges
>
> > What you have is not a portage problem. It is an orthodox parallelism
> > problem, and I think you are thinking your constraint is unique in the
> > world - it isn't.
> > With parallelism, trying to fiddle single nodes to improve things
> > overall never really works out.
> >
> A big problem you are missing is that portage does not have control of the system. It can control its usage of the system, but if I want emerge to use as much SPARE resource IN THE BACKGROUND as it can without impacting on on-line responsiveness, that is HARD.
>
> I would like to be able to tell portage "these programs are resource hogs, don't parallelise them". If portage has loads of little jobs, it can fire them off one after the other as resource becomes available. If it fires a hog (or worse, two) off at the same time, the system can rapidly collapse under load.
>
> Even better, if portage knew roughly how much resource each job required, it could (within constraints) start with the jobs that required least resource and run loads of them, and by firing jobs off in order of increasing demandingness, the number of jobs running in parallel would naturally tail off.
>
> At the end of the day, if the computer takes an extra 20% time, I'm not bothered. If I'm sat at the computer 20% time extra because the system isn't responding because emerge has bogged it down, then I do care. And when I'm building things like webkit-gtk, llvm, LO, FF and TB, they do hammer my system. If they're running in parallel, my system would be near unusable.
>
> Cheers,
> Wol

Maybe take a look at "cpulimit" out of the repos. I used to use it on one of my low-power systems to control how much load the various compilers were allowed to put on the system so that it could keep doing other tasks.

I think there are some other, similar tools as well.

LMP

Re: Controlling emerges [ In reply to ]

mcp_reznor at hotmail

Sep 21, 2023, 1:30 PM

Post #22 of 25 (396 views)

Permalink

So I feel I should add my own 2 cents to the pile....or possibly 25 cents due to inflation.

PORTAGE_IONICE_COMMAND="ionice -c 3 -p \${PID}"
PORTAGE_SCHEDULING_POLICY="idle"

Those 2 together in make.conf have had a noticeable effect on multitasking for me. I still wouldn't recommend allocating all of your cores to emerge, but emerging with idle priority keeps your tasks a little higher up in the mix.

Re: Controlling emerges [ In reply to ]

rdalek1967 at gmail

Sep 21, 2023, 6:13 PM

Post #23 of 25 (396 views)

Permalink

Tsukasa Mcp_Reznor wrote:
> So I feel I should add my own 2 cents to the pile....or possibly 25 cents due to inflation.
>
>
> PORTAGE_IONICE_COMMAND="ionice -c 3 -p \${PID}"
> PORTAGE_SCHEDULING_POLICY="idle"
>
> Those 2 together in make.conf have had a noticeable effect on multitasking for me. I still wouldn't recommend allocating all of your cores to emerge, but emerging with idle priority keeps your tasks a little higher up in the mix.

I had the first one, little different for my rig, but I added the second
one just now. I'll be testing this tomorrow or Sunday, depending on
packages, maybe both. lol

Sometimes I wish they would announce when they add features. Rich, you
frequent this list. If you hear of something new, could you post it?
This may not be NEW but it is new to me. No idea when it got added.

Dale

:-) :-)

Re: Controlling emerges [ In reply to ]

confabulate at kintzios

Sep 22, 2023, 12:13 AM

Post #24 of 25 (396 views)

Permalink

On Friday, 22 September 2023 02:13:08 BST Dale wrote:
> Tsukasa Mcp_Reznor wrote:
> > So I feel I should add my own 2 cents to the pile....or possibly 25 cents
> > due to inflation.
> >
> >
> > PORTAGE_IONICE_COMMAND="ionice -c 3 -p \${PID}"
> > PORTAGE_SCHEDULING_POLICY="idle"
> >
> > Those 2 together in make.conf have had a noticeable effect on multitasking
> > for me. I still wouldn't recommend allocating all of your cores to
> > emerge, but emerging with idle priority keeps your tasks a little higher
> > up in the mix.
>
> I had the first one, little different for my rig, but I added the second
> one just now. I'll be testing this tomorrow or Sunday, depending on
> packages, maybe both. lol
>
> Sometimes I wish they would announce when they add features. Rich, you
> frequent this list. If you hear of something new, could you post it?
> This may not be NEW but it is new to me. No idea when it got added.
>
> Dale
>
> :-) :-)

Loads of tweaks are described here, which I wasn't aware of:

https://wiki.gentoo.org/wiki/Portage_niceness

See also the man pages for make.conf and sched. I'm not sure what the default
values are if these variables are not set in make.conf.

Re: Controlling emerges [ In reply to ]

rich0 at gentoo

Sep 22, 2023, 3:07 AM

Post #25 of 25 (382 views)

Permalink

On Thu, Sep 21, 2023 at 9:13 PM Dale <rdalek1967@gmail.com> wrote:
>
> Sometimes I wish they would announce when they add features. Rich, you
> frequent this list. If you hear of something new, could you post it?

Sure, if a relevant topic comes up and I'm aware of it. However, I
doubt this setting is going to do much that nice doesn't already do.

The original focus seemed to be on memory use, and niceness will not
have any impact on the memory use of a build. The only thing that
will is reducing the number of parallel jobs. There really isn't any
way to get portage to regulate memory use short of letting it be
killed (which isn't helpful), maybe letting it be stopped when
things get out of hand (which will help as the memory could at least
be swapped, but the build might not be salvageable without jumping
through a lot of hoops), or if the package maintainer provides some
kind of hinting to the package manager so that it can anticipate how
much memory it will use. Otherwise trying to figure out how much
memory a build system will use without just trying it is like solving
the halting problem.

--
Rich
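
Since reducing the number of parallel jobs is the one lever that does bound
memory use, a job count can at least be derived from the RAM available. A
minimal sketch, assuming a flat per-job memory figure: the helper name
jobs_for is invented here, and the 2 GiB per job in the commented usage
line is only an assumed rule of thumb. It does not predict what any
particular package will actually need.

```shell
# jobs_for CPUS MEM_KIB PER_JOB_GIB
# Print the smaller of the CPU count and the number of jobs that fit in
# MEM_KIB of memory at PER_JOB_GIB each, but never less than 1.
jobs_for() {
    cpus=$1
    by_mem=$(( $2 / ($3 * 1024 * 1024) ))
    if [ "$by_mem" -lt "$cpus" ]; then jobs=$by_mem; else jobs=$cpus; fi
    if [ "$jobs" -lt 1 ]; then jobs=1; fi
    echo "$jobs"
}

# Example (not run here): derive a value from the live system.
# MemAvailable in /proc/meminfo is reported in KiB.
#   mem=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
#   echo "MAKEOPTS=\"-j$(jobs_for "$(nproc)" "$mem" 2)\""
```

The resulting MAKEOPTS could go into make.conf, or into a file under
/etc/portage/env/ that package.env assigns to a heavy package such as
webkit-gtk, so that only that package is built with the lower job count.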
