Mailing List Archive

Genlop wonky again
Hello list,

I've just had some strange output from genlop on my 16-thread i5 box, thus:

# genlop -t libreoffice | /bin/grep minute
merge time: 37 minutes and 38 seconds.
merge time: 52 minutes and 59 seconds.
merge time: 46 minutes and 17 seconds.

# genlop -c

Currently merging 11 out of 11

* app-office/libreoffice-7.5.9.2

current merge time: 4 minutes and 3 seconds.
ETA: 1 hour, 4 minutes and 24 seconds.

### Then, once the update finished:

# genlop -t libreoffice | /bin/grep minute
merge time: 37 minutes and 38 seconds.
merge time: 52 minutes and 59 seconds.
merge time: 46 minutes and 17 seconds.
merge time: 38 minutes and 40 seconds.

I know genlop is, shall we say, not perfect, but how can it be so grossly
wrong as that?

I have this in make.conf, and it hasn't changed since I built the machine:

grep '\-j' /etc/portage/make.conf
EMERGE_DEFAULT_OPTS="--jobs --load-average=12
MAKEOPTS="-j12 -l12"

--
Regards,
Peter.
Re: Genlop wonky again [ In reply to ]
On Fri, Jan 5, 2024 at 6:52 PM Peter Humphrey <peter@prh.myzen.co.uk> wrote:

> Hello list,
>
> I've just had some strange output from genlop on my 16-thread i5 box, thus:
>
> # genlop -t libreoffice | /bin/grep minute
> merge time: 37 minutes and 38 seconds.
> merge time: 52 minutes and 59 seconds.
> merge time: 46 minutes and 17 seconds.
>
> # genlop -c
>
> Currently merging 11 out of 11
>
> * app-office/libreoffice-7.5.9.2
>
> current merge time: 4 minutes and 3 seconds.
> ETA: 1 hour, 4 minutes and 24 seconds.
>
> ### Then, once the update finished:
>
> # genlop -t libreoffice | /bin/grep minute
> merge time: 37 minutes and 38 seconds.
> merge time: 52 minutes and 59 seconds.
> merge time: 46 minutes and 17 seconds.
> merge time: 38 minutes and 40 seconds.
>
> I know genlop is, shall we say, not perfect, but how can it be so grossly
> wrong as that?
>
> I have this in make.conf, and it hasn't changed since I built the machine:
>
> grep '\-j' /etc/portage/make.conf
> EMERGE_DEFAULT_OPTS="--jobs --load-average=12
> MAKEOPTS="-j12 -l12"
>
> --
> Regards,
> Peter.



I’ve often found that it gives one estimate when multiple packages are
being built, then a much longer estimate for still-in-progress builds once
some of the builds have finished.

That result defies common sense. Less remaining work has to take less, not
more (much more), time.

This observation tells me that the algorithm is very fundamentally broken.
The only way to answer how it can be so grossly wrong is to examine its
algorithm. That’s been on my to-do list for ages, but the thought of
debugging it has so far not risen to worth-the-effort status.

I use nearly the same build options as you, so perhaps we’re triggering the
same problem. But my less-work-implies-longer-time observations suggest to
me that the problem is more fundamental than details of jobs/threads/etc.

John Blinka

Re: Genlop wonky again [ In reply to ]
On 06/01/2024 00:54, John Blinka wrote:
> I’ve often found that it gives one estimate when multiple packages are
> being built, then a much longer estimate for still-in-progress builds
> once some of the builds have finished.
>
> That result defies common sense. Less remaining work has to take less,
> not more (much more), time.

Common sense isn't common and, well, often doesn't make sense.

If there's a bunch of small builds skewing the "time per build" estimate
down, as they drop off the list the estimated time per build will go up,
and if the skew is serious enough it can even make the total estimated
time go up ...
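
One way to picture the effect described above, as a minimal sketch. The
build times are made up, and nothing in this thread confirms that genlop
averages over the remaining queue like this; the sketch only shows that a
per-build average over the builds still pending rises as the short ones
finish.

```python
from statistics import mean

# Hypothetical expected build times in seconds:
# three small packages and one large one.
queue = [60, 90, 120, 2700]

per_build = []
while queue:
    # Average over whatever is still pending at this point.
    per_build.append(mean(queue))
    # Assume the shortest remaining build finishes next.
    queue.pop(0)

print(per_build)
```

Note that in this toy model only the per-build figure climbs; the summed
time of the remaining builds still shrinks at every step, so the sketch does
not by itself account for a rising overall estimate.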

Cheers,
Wol
Re: Genlop wonky again [ In reply to ]
On Sat, Jan 6, 2024 at 3:56 AM Wols Lists <antlists@youngman.org.uk> wrote:

> On 06/01/2024 00:54, John Blinka wrote:
> > I’ve often found that it gives one estimate when multiple packages are
> > being built, then a much longer estimate for still-in-progress builds
> > once some of the builds have finished.
> >
> > That result defies common sense. Less remaining work has to take less,
> > not more (much more), time.
>
> Common sense isn't common and, well, often doesn't make sense.
>
> If there's a bunch of small builds skewing the "time per build" estimate
> down, as they drop off the list the estimated time per build will go up,
> and if the skew is serious enough it can even make the total estimated
> time go up ...


I don’t follow you. What is the source of this “skew”? Why should more
available processing power/less load cause builds to run more slowly? I’d
really like to understand your point.

I have observed what I reported above many times, often when there are 2
builds running, a long one and a shorter one. Once the shorter one ends,
the longer one’s time estimate via genlop increases, sometimes by 2x. And
it doesn’t actually take 2x longer - the new estimate is just grossly
wrong. Invoking skew or common sense being uncommon/wrong doesn’t change my
and the original poster’s observations that genlop sometimes gives really
bad time estimates. Something’s not right.

Respectfully

John

Re: Genlop wonky again [ In reply to ]
On 06/01/2024 16:12, John Blinka wrote:
> And it doesn’t actually take 2x longer - the new estimate is just
> grossly wrong.

I presume that the old estimate was also wrong.

And it's nothing to do with more power or whatever, it's down to simple
statistics. If genlop guesses the statistical spread wrongly, it's
going to mess up its estimates.

If you have a double-peak distribution, with a large short-lived peak,
and a small long-lived peak, you can get some weird results, especially
if you have assumed a bell curve (almost always wrong) or an exponential
decay (which is generally, NOT ALWAYS, a good choice).

Cheers,
Wol
Re: Genlop wonky again [ In reply to ]
On 5 January 2024 23:51:39 UTC, Peter Humphrey <peter@prh.myzen.co.uk> wrote:
>Hello list,
>
>I've just had some strange output from genlop on my 16-thread i5 box, thus:
>
># genlop -t libreoffice | /bin/grep minute
> merge time: 37 minutes and 38 seconds.
> merge time: 52 minutes and 59 seconds.
> merge time: 46 minutes and 17 seconds.
>
># genlop -c
>
> Currently merging 11 out of 11
>
> * app-office/libreoffice-7.5.9.2
>
> current merge time: 4 minutes and 3 seconds.
> ETA: 1 hour, 4 minutes and 24 seconds.
>
>### Then, once the update finished:
>
># genlop -t libreoffice | /bin/grep minute
> merge time: 37 minutes and 38 seconds.
> merge time: 52 minutes and 59 seconds.
> merge time: 46 minutes and 17 seconds.
> merge time: 38 minutes and 40 seconds.
>
>I know genlop is, shall we say, not perfect, but how can it be so grossly
>wrong as that?
>
>I have this in make.conf, and it hasn't changed since I built the machine:
>
>grep '\-j' /etc/portage/make.conf
>EMERGE_DEFAULT_OPTS="--jobs --load-average=12
>MAKEOPTS="-j12 -l12"
>

Are there by chance binary merges which took less than a minute? That might explain the differences.
What is the output without the grep, or when filtering by merge time instead?

--
Best regards
Daniel
Re: Genlop wonky again [ In reply to ]
On 1/6/24 11:21, Wols Lists wrote:
> On 06/01/2024 16:12, John Blinka wrote:
>> And it doesn’t actually take 2x longer - the new estimate is just
>> grossly wrong.
>
> I presume that the old estimate was also wrong.
>
> And it's nothing to do with more power or whatever, it's down to
> simple statistics. If genlop guesses the statistical spread wrongly,
> it's going to mess up its estimates.
>
> If you have a double-peak distribution, with a large short-lived peak,
> and a small long-lived peak, you can get some weird results,
> especially if you have assumed a bell curve (almost always wrong) or
> an exponential decay (which is generally, NOT ALWAYS, a good choice).
>
> Cheers,
> Wol

I think there is a slightly deeper question also involved. First, I'll
assume (safe or not) that genlop's assumption of total build time for a
package depends solely on the previous build times, with all the foibles
Wol implies in that.  However, that estimate then gets adjusted as the
build progresses.  Clearly, experience shows that the estimated
remaining time is NOT simply the estimated build time minus the time
spent so far, except possibly when an emerge is only for one package. 
What else contributes to that estimate?  If that adjustment includes
using the number of other builds going on at the same time, and their
original and estimated build times, I can see lots of opportunity for
shenanigans.
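
For reference, the simple model mentioned above (mean of the previous merges
minus the time spent so far) can be worked through with the figures from the
original post. This is a sketch of that model only, not of genlop's code; the
helper and variable names are made up for illustration.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/m/s reading into seconds."""
    return hours * 3600 + minutes * 60 + seconds

# The three earlier libreoffice merges reported by "genlop -t".
previous = [
    to_seconds(minutes=37, seconds=38),
    to_seconds(minutes=52, seconds=59),
    to_seconds(minutes=46, seconds=17),
]
elapsed = to_seconds(minutes=4, seconds=3)                 # "current merge time"
reported_eta = to_seconds(hours=1, minutes=4, seconds=24)  # "ETA" from "genlop -c"

# Simple model: mean of previous merges minus the time already spent.
expected_total = sum(previous) // len(previous)
naive_remaining = expected_total - elapsed

print(divmod(naive_remaining, 60))                 # (minutes, seconds) under the simple model
print(divmod(reported_eta - naive_remaining, 60))  # how far the reported ETA is above it
```

Under these assumptions the simple model gives about 41 minutes 35 seconds
remaining, so the reported ETA is roughly 23 minutes above it even though
only one package was left to build.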

Jack.
Re: Genlop wonky again [ In reply to ]
On Saturday, 6 January 2024 16:21:30 GMT Wols Lists wrote:

> ... it's nothing to do with more power or whatever, it's down to simple
> statistics. If genlop guesses the statistical spread wrongly, it's
> going to mess up its estimates.

Aren't you exaggerating genlop's complexity? I wasn't aware of any use of
statistics in it, other than a simple arithmetic mean to estimate the time
remaining. It certainly seems to do that, anyway.

> If you have a double-peak distribution, with a large short-lived peak,
> and a small long-lived peak, you can get some weird results, especially
> if you have assumed a bell curve (almost always wrong) or an exponential
> decay (which is generally, NOT ALWAYS, a good choice).

I doubt it does any of that.

--
Regards,
Peter.
Re: Genlop wonky again [ In reply to ]
On Saturday, 6 January 2024 16:26:49 GMT Daniel Pielmeier wrote:
> On 5 January 2024 23:51:39 UTC, Peter Humphrey <peter@prh.myzen.co.uk> wrote:
> >Hello list,
> >
> >I've just had some strange output from genlop on my 16-thread i5 box, thus:
> >
> ># genlop -t libreoffice | /bin/grep minute
> >
> > merge time: 37 minutes and 38 seconds.
> > merge time: 52 minutes and 59 seconds.
> > merge time: 46 minutes and 17 seconds.
> >
> ># genlop -c
> >
> > Currently merging 11 out of 11
> >
> > * app-office/libreoffice-7.5.9.2
> >
> > current merge time: 4 minutes and 3 seconds.
> > ETA: 1 hour, 4 minutes and 24 seconds.
> >
> >### Then, once the update finished:
> >
> ># genlop -t libreoffice | /bin/grep minute
> >
> > merge time: 37 minutes and 38 seconds.
> > merge time: 52 minutes and 59 seconds.
> > merge time: 46 minutes and 17 seconds.
> > merge time: 38 minutes and 40 seconds.
> >
> >I know genlop is, shall we say, not perfect, but how can it be so grossly
> >wrong as that?
> >
> >I have this in make.conf, and it hasn't changed since I built the machine:
> >
> >grep '\-j' /etc/portage/make.conf
> >EMERGE_DEFAULT_OPTS="--jobs --load-average=12
> >MAKEOPTS="-j12 -l12"
>
> There are not by chance binary merges which took less than a minute? That
> might explain the differences.

That would skew the prediction downwards, not up.

> What is the output without the grep or filtering by merge time instead.

The same.

--
Regards,
Peter.
Re: Genlop wonky again [ In reply to ]
On 06/01/2024 17:59, Peter Humphrey wrote:
> On Saturday, 6 January 2024 16:21:30 GMT Wols Lists wrote:
>
>> ... it's nothing to do with more power or whatever, it's down to simple
>> statistics. If genlop guesses the statistical spread wrongly, it's
>> going to mess up its estimates.

> Aren't you exaggerating genlop's complexity? I wasn't aware of any use of
> statistics in it, other than a simple arithmetic mean to estimate the time
> remaining. It certainly seems to do that, anyway.

Other than a simple arithmetic mean !!! Other than a simple arithmetic
mean !!!

If that's the case, you've just confirmed my statement - genlop is
almost certainly using the wrong statistics for the job !!!

If you take the average (arithmetic mean) of a power-law (exponential
decay) distribution, your results are going to be garbage.

Statistics is one of those areas where, if you don't know what you're
doing and you use the wrong maths, then you are going to get stupid results.

"Statistics tell you how to get from A to B. What they don't tell you is
that you're all at C".

Cheers,
Wol
Re: Genlop wonky again [ In reply to ]
On Saturday, 6 January 2024 19:28:05 GMT Wols Lists wrote:

> Statistics is one of those areas where, if you don't know what you're
> doing and you use the wrong maths, then you are going to get stupid results.
>
> "Statistics tell you how to get from A to B. What they don't tell you is
> that you're all at C".

I took a module on statistics in my Open University maths degree 40-odd years
ago. I was bemused. They seemed to say that the subject was founded on two
basic principles; then they proceeded to define each of them in terms of the
other.

I'm still waiting for the entire edifice to come crashing down around our ears.
:)

--
Regards,
Peter.
Re: Genlop wonky again [ In reply to ]
On 07/01/2024 00:52, Peter Humphrey wrote:
> On Saturday, 6 January 2024 19:28:05 GMT Wols Lists wrote:
>
>> Statistics is one of those areas where, if you don't know what you're
>> doing and you use the wrong maths, then you are going to get stupid results.
>>
>> "Statistics tell you how to get from A to B. What they don't tell you is
>> that you're all at C".
>
> I took a module on statistics in my Open University maths degree 40-odd years
> ago. I was bemused. They seemed to say that the subject was founded on two
> basic principles; then they proceeded to define each of them in terms of the
> other.
>
Weird! I took a module on statistics in my Open University (Chemistry)
degree 40-odd years ago. Probably the same one? I've still got the
modules as a reference work, though I probably couldn't lay my hands on
them easily now ...

> I'm still waiting for the entire edifice to come crashing down around our ears.
> :)
>
Nah - it's been abused for so long nobody's noticed it came down
centuries ago :-)

Cheers,
Wol
Re: Genlop wonky again [ In reply to ]
On 2024-01-05, Peter Humphrey wrote:

> Hello list,
>
> I've just had some strange output from genlop on my 16-thread i5 box, thus:
>
> # genlop -t libreoffice | /bin/grep minute
> merge time: 37 minutes and 38 seconds.
> merge time: 52 minutes and 59 seconds.
> merge time: 46 minutes and 17 seconds.
>
> # genlop -c
>
> Currently merging 11 out of 11
>
> * app-office/libreoffice-7.5.9.2
>
> current merge time: 4 minutes and 3 seconds.
> ETA: 1 hour, 4 minutes and 24 seconds.
>

Is this an off-by-one?

While I'm not acquainted with perl,
https://raw.githubusercontent.com/gentoo-perl/genlop/master/genlop has
this:

"For a better prediction we only consider the last 10 merges", followed
by a max() with the number 9, suggesting zero-based indices that would
need to be incremented for the average, but then

"$tm_secondi = sum(@merge_times) / $#merge_times;"

(That said, I also wonder if the "slicing off" part needs adjustment
too, can the (zero-based?) length be greater than 9 after it was
shortened to be 9? Or am I misunderstanding the code?)

Summing the three merge times and dividing by two I get, if I've not
messed up my calculations, 68 minutes and 27 seconds, matching your
"Currently merging" output.
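
The arithmetic can be checked with a short Python sketch. It assumes, as
suggested above, that the divisor is the last index of the list (which is
what Perl's $#array evaluates to) rather than the element count. The
variable names are illustrative; this is not genlop's code.

```python
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds

def fmt(total):
    """Format a number of seconds as 'Xh Ym Zs'."""
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h {minutes}m {seconds}s"

# Merge times from "genlop -t libreoffice" before the new build.
merge_times = [to_seconds(37, 38), to_seconds(52, 59), to_seconds(46, 17)]
elapsed = to_seconds(4, 3)  # "current merge time" from "genlop -c"

last_index = len(merge_times) - 1                   # Perl: $#merge_times
suspect_mean = sum(merge_times) // last_index       # sum divided by 2
plain_mean = sum(merge_times) // len(merge_times)   # sum divided by 3

print(fmt(suspect_mean))            # total predicted with the suspect divisor
print(fmt(suspect_mean - elapsed))  # ETA that prediction implies
print(fmt(plain_mean))              # total predicted with the element count
```

With the suspect divisor the implied ETA comes out at 1h 4m 24s, the same
figure "genlop -c" printed; dividing by the element count instead gives a
predicted total of 0h 45m 38s.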

--
Nuno Silva
Re: Genlop wonky again [ In reply to ]
On 07/01/2024 00:52, Peter Humphrey wrote:
> They seemed to say that the subject was founded on two
> basic principles; then they proceeded to define each of them in terms of the
> other.

I should add, I dug into this sort of stuff, and you do know the entire
edifice of Peano (ie number theory), thanks to Gödel, is built on the
edifice that " true == false " :-) ?

Basically, no matter how hard you try, you cannot escape the Cretan Paradox.

To quote some famous mathematician - "If you define a religion as the
irrational belief in the unprovable, then Mathematics is the only
religion that can prove it is one".

That's why the Ancient Philosophers debated how many Angels can dance on
the Head of a Pin. Set aside your prejudices, your beliefs that "that
*must* be stupid", read Terry Pratchett's "Science of Discworld", and
realise that it doesn't matter WHERE you start, the application of logic
and reason will lead you down the Rabbit Hole into Wonderland.

And modern man is no better at avoiding that trap than the ancients.

Cheers,
Wol
Re: Genlop wonky again [ In reply to ]
On Sunday, 7 January 2024 08:34:15 GMT Wols Lists wrote:

> Weird! I took a module on statistics in my Open University (Chemistry)
> degree 40-odd years ago. Probably the same one? I've still got the
> modules as a reference work, though I probably couldn't lay my hands on
> them easily now ...

Could have been the same. It was M100, the first version of their maths
foundation course, in 1976.

--
Regards,
Peter.
Re: Genlop wonky again [ In reply to ]
On 09/01/2024 03:35, Peter Humphrey wrote:
> On Sunday, 7 January 2024 08:34:15 GMT Wols Lists wrote:
>
>> Weird! I took a module on statistics in my Open University (Chemistry)
>> degree 40-odd years ago. Probably the same one? I've still got the
>> modules as a reference work, though I probably couldn't lay my hands on
>> them easily now ...
>
> Could have been the same. It was M100, the first version of their maths
> foundation course, in 1976.
>
Ah. So you predate me slightly. I took M101. But I also took the second
level statistics course, can't remember what it was ...

Cheers,
Wol