Mailing List Archive

Late(r) stop of children processes on restart
When the event/worker MPM restarts, it first signals the child
processes to stop (via the POD, the "pipe of death"), then reloads the
configuration, and finally starts the new generation.

This can be a problem when the reload takes a long time to complete,
because incoming connections are no longer processed in the meantime.
A module at $dayjob loads quite a few regexes and JSON schemas for
each vhost, and I have seen restarts take tens of seconds to complete
with a large number of vhosts. I suppose this can happen with many
RewriteRules too.

How about we wait for the reload to complete before stopping the old
generation, like in the attached patch (MPM event only for now,
changes in worker would be quite similar)?
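
To make the difference between the two orderings concrete, here is a toy
Python model of a restart. It is only a sketch: the function name
accept_gap and the assumption that stopping and starting children takes
no time are illustrative and not taken from the patch.

```python
def accept_gap(reload_seconds: float, stop_before_reload: bool) -> float:
    """Return how long no generation is accepting new connections.

    Toy model (assumption): signalling the old children and spawning the
    new ones take no time, so only the configuration reload matters.
    """
    clock = 0.0
    if stop_before_reload:
        # Current order: stop old generation, reload, start new generation.
        old_stopped_at = clock
        clock += reload_seconds
    else:
        # Proposed order: reload first, then stop old and start new.
        clock += reload_seconds
        old_stopped_at = clock
    new_started_at = clock
    return new_started_at - old_stopped_at


print(accept_gap(30.0, stop_before_reload=True))   # 30.0
print(accept_gap(30.0, stop_before_reload=False))  # 0.0
```

With a 30 second reload, the current order leaves a 30 second window
without accepting children, while the proposed order closes it (in this
simplified model).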

This is achieved by creating the PODs and listener buckets from a
generation pool (gen_pool) whose lifetime differs from pconf's.
gen_pool survives restarts and is created (or cleared) only after the
old generation is stopped, entirely within the run_mpm hook, so the
handling of the stop, the PODs and the buckets moves there (most of the
changes are cut/paste).
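
The lifetime difference between pconf and gen_pool can be pictured with a
small Python simulation. This is a sketch under stated assumptions: the
Pool class is a toy stand-in for an APR pool, and the run_mpm function
here is a hypothetical simplification, not the code from the patch.

```python
class Pool:
    """Toy stand-in for an APR pool: clearing it releases its resources."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.resources = []

    def alloc(self, resource):
        self.resources.append(resource)

    def clear(self):
        for resource in self.resources:
            self.log.append(f"{self.name}: release {resource}")
        self.resources.clear()


def run_mpm(gen_pool, generation, log):
    """Hypothetical run_mpm step: retire the old generation, set up the new."""
    if gen_pool.resources:
        # The old POD is still open here, so it can be used to signal.
        log.append("signal old generation via its POD")
    gen_pool.clear()
    gen_pool.alloc(f"POD and listener buckets (gen {generation})")
    log.append(f"start generation {generation}")


log = []
pconf = Pool("pconf", log)
gen_pool = Pool("gen_pool", log)

pconf.alloc("config (gen 0)")
run_mpm(gen_pool, 0, log)

# Restart: pconf is cleared and the configuration reloaded while the
# generation 0 POD and listeners are still alive in gen_pool.
pconf.clear()
log.append("reload configuration")
pconf.alloc("config (gen 1)")
run_mpm(gen_pool, 1, log)

for entry in log:
    print(entry)
```

In this model the generation 0 POD and listener buckets are released
only after the reload has finished, which is what lets the old children
keep serving in the meantime.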

WDYT?

Regards;
Yann.
Re: Late(r) stop of children processes on restart
Can't really comment on the diff, but I totally agree with the goal of minimizing the unresponsive time and making graceful restarts less disruptive.

So +1 for that.

> On 28.06.2021 at 16:25, Yann Ylavic <ylavic.dev@gmail.com> wrote:
> [...]
> <late_children_stop.diff>
Re: Late(r) stop of children processes on restart
On 29.06.2021 at 14:31, Stefan Eissing wrote:
> Can't really comment on the diff, but I totally agree with the goal of minimizing the unresponsive time and making graceful restarts less disruptive.
>
> So +1 for that.

+1 on the intention as well.

Not sure whether that means people would need more headroom in the
scoreboard (which would probably warrant a sentence in CHANGES or the
docs) or whether it just changes the duration during which that
headroom is used (which I wouldn't worry about).

Thanks and regards,

Rainer

Re: Late(r) stop of children processes on restart
On Tue, Jun 29, 2021 at 3:00 PM Rainer Jung <rainer.jung@kippdata.de> wrote:
>
> Am 29.06.2021 um 14:31 schrieb Stefan Eissing:
> > Can't really comment on the diff, but I totally agree with the goal of minimizing the unresponsive time and making graceful restarts less disruptive.
> >
> > So +1 for that.
>
> +1 on the intention as well.

Checked in to trunk (r1892587 + r1892595).

>
> Not sure whether that means people would need more headroom in the
> scoreboard (which would probably warrant a sentence in CHANGES or the
> docs) or whether it just changes the duration during which that
> headroom is used (which I wouldn't worry about).

The delay between stop and start on restart is now minimal (no reload
in between), but the headroom needed does not change AIUI.
We still have a period where connections (worker threads) are active
in both the old and the new generation of child processes, and its
duration depends mainly on the actual lifetime of the connections.
So the current tunings still hold, I think.

What changes now is that, for both graceful and ungraceful restarts,
the main process fully consumes one CPU (to reload) while the children
are actively running (the old generation keeps accepting/processing
connections during the reload), whereas before the children were
tearing down, which eased the load on the CPUs (but filled the socket
backlogs, potentially until exhaustion).
So there might be a greater overall load spike on reload than before.

A note on the headroom while at it:
mpm_event possibly consumes fewer children (hence scoreboard slots) on
restart, because a dying child stops (and thus no longer accounts for)
the worker threads in excess of its remaining connections, so the
main process creates children of the new generation accurately, to
scale. mpm_worker never stops threads (this improvement never made it
there AFAICT), so by accounting for inactive threads as active it ends
up creating more children of the new generation as connections arrive
(eventually reaching the limits earlier, or blocking/waiting for worker
threads in new-generation children that are overwhelmed by incoming
connections which the main process thinks are evenly distributed
across all the children, including the old generation's).
I don't know how hard or worthwhile it would be to align mpm_worker
with mpm_event on this, just a note.
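
The accounting difference described above can be sketched in a few lines
of Python. This is a toy model, not httpd code: the function names, the
25 threads per child and the per-child connection counts are all
hypothetical values chosen for illustration.

```python
def accounted_threads(mpm, threads_per_child, remaining_connections):
    """Threads a dying child still reports as active (toy model)."""
    if mpm == "event":
        # Threads beyond the remaining connections are stopped.
        return min(threads_per_child, remaining_connections)
    if mpm == "worker":
        # Threads are never stopped early, so all of them still count.
        return threads_per_child
    raise ValueError(f"unknown MPM: {mpm}")


def old_generation_footprint(mpm, threads_per_child, connections_per_child):
    """Sum the accounted threads over all dying children."""
    return sum(
        accounted_threads(mpm, threads_per_child, remaining)
        for remaining in connections_per_child
    )


dying = [3, 0, 10]  # hypothetical remaining connections per dying child
print(old_generation_footprint("event", 25, dying))   # 13
print(old_generation_footprint("worker", 25, dying))  # 75
```

In this example the old generation counts for 13 active threads under
the event-style accounting and for 75 under the worker-style one, even
though only 13 connections are actually being served.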


Cheers;
Yann.
Re: Late(r) stop of children processes on restart
Thanks for the headroom explanation Yann, good reading!

Rainer

On 25.08.2021 at 13:23, Yann Ylavic wrote:
> [...]