
Re: svn commit: r1899777 - /httpd/httpd/trunk/server/mpm/event/event.c
On 4/12/22 2:08 PM, ylavic@apache.org wrote:
> Author: ylavic
> Date: Tue Apr 12 12:08:02 2022
> New Revision: 1899777
>
> URL: http://svn.apache.org/viewvc?rev=1899777&view=rev
> Log:
> mpm_event: Fix accounting of active/total processes on ungraceful restart.
>
> Child processes terminated by ap_{reclaim,relieve}_child_processes() were
> not subtracted from total_daemons and active_daemons, since that accounting
> was done in server_main_loop() only. This led to perform_idle_server_maintenance()
> thinking it was over the limit of child processes and never creating new ones.
>
> Do this accounting in event_note_child_{started,stopped}() instead, which
> are called both at runtime and at reload time.
>
> * server/mpm/event/event.c(struct event_retained_data):
> Rename field max_daemons_limit to max_daemon_used to better describe what
> it's about and to align with AP_MPMQ_MAX_DAEMON_USED.
>
> * server/mpm/event/event.c(event_note_child_stopped):
> Renamed from event_note_child_killed() to clarify that it's not only called
> when a child is killed (i.e. on restart) but whenever a child has stopped.
>
> * server/mpm/event/event.c(event_note_child_stopped):
> Move decrementing {active,total}_daemons and marking child's threads as
> SERVER_DEAD from server_main_loop() so that it's done both at runtime and
> reload time. Log the current number/state of daemons at APLOG_DEBUG level
> for each child stopped.
>
> * server/mpm/event/event.c(event_note_child_started):
> Move incrementing {active,total}_daemons from make_child() for symmetry,
> given that make_child() calls event_note_child_started(). Log the current
> number/state of daemons at APLOG_DEBUG level for each child started.
>
> * server/mpm/event/event.c(perform_idle_server_maintenance):
> Fix possible miscounting of retained->max_daemon_used across the multiple
> calls to perform_idle_server_maintenance() if ListenCoresBucketsRatio > 0.
> Pass an int *max_daemon_used which starts at zero and is bumped consistently
> for all the buckets, while retained->max_daemon_used is updated only after
> all the buckets have been maintained.
>
> * server/mpm/event/event.c(perform_idle_server_maintenance):
> Use event_note_child_stopped() to handle exited child processes.
>
>
> Fixes: BZ 66004
>
>
> Modified:
> httpd/httpd/trunk/server/mpm/event/event.c
>
> Modified: httpd/httpd/trunk/server/mpm/event/event.c
> URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/server/mpm/event/event.c?rev=1899777&r1=1899776&r2=1899777&view=diff
> ==============================================================================
> --- httpd/httpd/trunk/server/mpm/event/event.c (original)
> +++ httpd/httpd/trunk/server/mpm/event/event.c Tue Apr 12 12:08:02 2022

> @@ -3447,9 +3480,11 @@ static void server_main_loop(int remaini
>              continue;
>          }
>
> +        max_daemon_used = 0;
>          for (i = 0; i < num_buckets; i++) {
> -            perform_idle_server_maintenance(i);
> +            perform_idle_server_maintenance(i, &max_daemon_used);
>          }
> +        retained->max_daemon_used = max_daemon_used;

Shouldn't we do the above only when max_daemon_used > retained->max_daemon_used ? Otherwise we might shrink
retained->max_daemon_used if make_child increased it?

Regards

Rüdiger
Re: svn commit: r1899777 - /httpd/httpd/trunk/server/mpm/event/event.c
On Wed, Apr 13, 2022 at 4:22 PM Ruediger Pluem <rpluem@apache.org> wrote:
>
> On 4/12/22 2:08 PM, ylavic@apache.org wrote:
>
> > @@ -3447,9 +3480,11 @@ static void server_main_loop(int remaini
> >              continue;
> >          }
> >
> > +        max_daemon_used = 0;
> >          for (i = 0; i < num_buckets; i++) {
> > -            perform_idle_server_maintenance(i);
> > +            perform_idle_server_maintenance(i, &max_daemon_used);
> >          }
> > +        retained->max_daemon_used = max_daemon_used;
>
> Shouldn't we do the above only when max_daemon_used > retained->max_daemon_used ? Otherwise we might shrink
> retained->max_daemon_used if make_child increased it?

You are right that retained->max_daemon_used can be shrunk with
r1899777; it should be better now with the follow-up r1899812.
But here we need to set it unconditionally (not only when it is less
than max_daemon_used), because retained->max_daemon_used needs to
decrease too when "high" children are stopped (e.g. by
MaxSpareThreads); it can't increase forever.
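
A minimal, compilable sketch of that pattern (toy names like slot_used
and the bucket/slot mapping are illustrative stand-ins here, not the
actual event.c code):

#include <stdio.h>

#define NUM_BUCKETS 2
#define SERVER_LIMIT 8

/* Toy stand-in for the scoreboard: slot_used[i] != 0 means slot i
 * hosts a live child. */
static int slot_used[SERVER_LIMIT];
static int retained_max_daemon_used;

/* Each bucket's maintenance pass bumps the shared high-water mark
 * for every used slot it walks over. */
static void perform_idle_server_maintenance(int bucket, int *max_daemon_used)
{
    int i;
    for (i = bucket; i < SERVER_LIMIT; i += NUM_BUCKETS) {
        if (slot_used[i] && i + 1 > *max_daemon_used) {
            *max_daemon_used = i + 1;
        }
    }
}

int main(void)
{
    int i, max_daemon_used;

    slot_used[0] = slot_used[5] = 1;   /* live children in slots 0 and 5 */

    max_daemon_used = 0;               /* reset each maintenance round */
    for (i = 0; i < NUM_BUCKETS; i++) {
        perform_idle_server_maintenance(i, &max_daemon_used);
    }
    /* Unconditional store: if the child in slot 5 later exits, the next
     * round recomputes 1 instead of sticking at 6 forever. */
    retained_max_daemon_used = max_daemon_used;
    printf("max_daemon_used = %d\n", retained_max_daemon_used);
    return 0;
}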


Regards;
Yann.
Re: svn commit: r1899777 - /httpd/httpd/trunk/server/mpm/event/event.c
> On 13.04.2022 at 17:33, Yann Ylavic <ylavic.dev@gmail.com> wrote:
>
> On Wed, Apr 13, 2022 at 4:22 PM Ruediger Pluem <rpluem@apache.org> wrote:
>>
>> On 4/12/22 2:08 PM, ylavic@apache.org wrote:
>>
>>> @@ -3447,9 +3480,11 @@ static void server_main_loop(int remaini
>>>              continue;
>>>          }
>>>
>>> +        max_daemon_used = 0;
>>>          for (i = 0; i < num_buckets; i++) {
>>> -            perform_idle_server_maintenance(i);
>>> +            perform_idle_server_maintenance(i, &max_daemon_used);
>>>          }
>>> +        retained->max_daemon_used = max_daemon_used;
>>
>> Shouldn't we do the above only when max_daemon_used > retained->max_daemon_used ? Otherwise we might shrink
>> retained->max_daemon_used if make_child increased it?
>
> You are right that retained->max_daemon_used can be shrunk with
> r1899777; it should be better now with the follow-up r1899812.
> But here we need to set it unconditionally (not only when it is less
> than max_daemon_used), because retained->max_daemon_used needs to
> decrease too when "high" children are stopped (e.g. by
> MaxSpareThreads); it can't increase forever.
>

I have the feeling we are in need of some sort of stress tests on
the overall child management scenarios. Offering my help.

Cheers,
Stefan
Re: svn commit: r1899777 - /httpd/httpd/trunk/server/mpm/event/event.c
On Thu, Apr 14, 2022 at 9:12 AM Stefan Eissing <stefan@eissing.org> wrote:
>
> I have the feeling we are in need of some sort of stress tests on
> the overall child management scenarios. Offering my help.

Thanks Stefan for offering!

Say we have a MPM event configuration like this:
StartServers 1
ServerLimit 4 or 16
ThreadLimit 2
ThreadsPerChild 2
MinSpareThreads 2
MaxSpareThreads 4
MaxRequestWorkers 8
MaxConnectionsPerChild 0

The test would be to have 1 to 8 clients continuously requesting
something that takes e.g. 1s to complete (mod_ratelimit or mod_proxy
to a backend of yours that controls the rate?), and then issuing
restarts simultaneously (graceful and ungraceful).

Depending on ServerLimit (4 or 16) there should (or should not) be spare
room in the scoreboard for child processes to be spawned on restart. If
not, they will have to wait for the old generation ones to exit before
new connections get accepted, but it should still "work".
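
(Rough arithmetic from the config above: MaxRequestWorkers 8 with
ThreadsPerChild 2 means at most 8/2 = 4 children at full load, so
ServerLimit 4 leaves no spare scoreboard slots for a new generation
while the old one lingers, whereas ServerLimit 16 leaves 12 to spare.)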

Not sure which of mod_status or the error_log is the best way to
monitor the results..

Is something like this conceivable? Or maybe someone has a better idea?


Cheers;
Yann.
Re: svn commit: r1899777 - /httpd/httpd/trunk/server/mpm/event/event.c
> On 14.04.2022 at 10:19, Yann Ylavic <ylavic.dev@gmail.com> wrote:
>
> On Thu, Apr 14, 2022 at 9:12 AM Stefan Eissing <stefan@eissing.org> wrote:
>>
>> I have the feeling we are in need of some sort of stress tests on
>> the overall child management scenarios. Offering my help.
>
> Thanks Stefan for offering!
>
> Say we have a MPM event configuration like this:
> StartServers 1
> ServerLimit 4 or 16
> ThreadLimit 2
> ThreadsPerChild 2
> MinSpareThreads 2
> MaxSpareThreads 4
> MaxRequestWorkers 8
> MaxConnectionsPerChild 0
>
> The test would be to have 1 to 8 clients continuously requesting
> something that takes e.g. 1s to complete (mod_ratelimit or mod_proxy
> to a backend of yours that controls the rate?), and then issuing
> restarts simultaneously (graceful and ungraceful).
>
> Depending on ServerLimit (4 or 16) there should (or should not) be spare
> room in the scoreboard for child processes to be spawned on restart. If
> not, they will have to wait for the old generation ones to exit before
> new connections get accepted, but it should still "work".
>
> Not sure which of mod_status or the error_log is the best way to
> monitor the results..
>
> Is something like this conceivable? Or maybe someone has a better idea?

In test/modules/core/test_002_restarts.py there is now a start of this. Invoked
specifically with:

trunk> STRESS_TEST=1 pytest -vvv -k test_core_002

this uses the config given and runs h2load with 6 clients, each using 5 connections
over time to perform a number of total requests. Number of clients easily tweakable.

The test in its current form fails, because it expects all requests to
succeed. However, h2load counts the remaining requests on a connection as
unsuccessful if the server closes it. So, maybe we should use another test
client or count 'success' differently.

Anyway, in test/gen/apache/logs/error_log one can see how mpm_event hits its limits
on workers and concurrent connections. Like

" All workers are busy or dying, will shutdown 1 keep-alive connections"
or "AH03269: Too many open connections (1), not accepting new conns in this process"

So, we need to define how we measure `success` and what we expect/not expect to see
in the error log. I guess?

Kind Regards,
Stefan

>
>
> Cheers;
> Yann.
Re: svn commit: r1899777 - /httpd/httpd/trunk/server/mpm/event/event.c
On Thu, Apr 14, 2022 at 1:43 PM Stefan Eissing <stefan@eissing.org> wrote:
>
>
> In test/modules/core/test_002_restarts.py there is now a start of this. Invoked
> specifically with:
>
> trunk> STRESS_TEST=1 pytest -vvv -k test_core_002

Thanks for writing this!

>
> this uses the config given and runs h2load with 6 clients, each using 5 connections
> over time to perform a number of total requests. Number of clients easily tweakable.

I saw h2load issue up to 12 connections with this configuration, so
raised the MPM limits a bit in r1899862 to avoid failing on
"MaxRequestWorkers reached" in the logs.

>
> The test in its current form fails, because it expects all requests to
> succeed. However, h2load counts the remaining requests on a connection as
> unsuccessful if the server closes it. So, maybe we should use another test
> client or count 'success' differently.

It passes for me currently with the new MPM limits, probably because
kept-alive connections are not killed anymore.

>
> Anyway, in test/gen/apache/logs/error_log one can see how mpm_event hits its limits
> on workers and concurrent connections. Like
>
> " All workers are busy or dying, will shutdown 1 keep-alive connections"
> or "AH03269: Too many open connections (1), not accepting new conns in this process"
>
> So, we need to define how we measure `success` and what we expect/not expect to see
> in the error log. I guess?

I'm not sure what we want to test here: the limits or the restart?
I thought it was the latter as a first step, but yes, if we want to
test the former we need to either disable keepalive to avoid confusing
h2load, or use another (dedicated?) client more tolerant of how we may
close kept-alive connections early.
Measuring how we behave at the limits may also be tricky: what would
the expectations be? No unhandled connections? A minimal latency? Or?
That may depend on the running machine too.

If we want to test the reload, we don't need to stress too much (just
some active connections for each generation), but we need something
that issues "-k restart" and/or "-k graceful" in parallel and counts
the processes or the "Child <slot> started|stopped: pid=<pid>" messages
in the logs.
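
For the counting part, a throwaway helper along these lines could do it
(hypothetical, not something in the tree; it only greps the per-child
messages quoted above out of an error_log):

/* count_children.c - hypothetical helper, not part of httpd's tests.
 * Counts the "Child <slot> started: pid=..." / "stopped: pid=..."
 * messages from an error_log to check that starts and stops balance
 * across a reload. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char line[4096];
    int started = 0, stopped = 0;
    FILE *f = (argc > 1) ? fopen(argv[1], "r") : stdin;

    if (!f) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        if (strstr(line, " started: pid="))
            started++;
        else if (strstr(line, " stopped: pid="))
            stopped++;
    }
    printf("started=%d stopped=%d delta=%d\n",
           started, stopped, started - stopped);
    return 0;
}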


Cheers;
Yann.
Re: svn commit: r1899777 - /httpd/httpd/trunk/server/mpm/event/event.c
> On 14.04.2022 at 17:54, Yann Ylavic <ylavic.dev@gmail.com> wrote:
>
> On Thu, Apr 14, 2022 at 1:43 PM Stefan Eissing <stefan@eissing.org> wrote:
>>
>>
>> In test/modules/core/test_002_restarts.py there is now a start of this. Invoked
>> specifically with:
>>
>> trunk> STRESS_TEST=1 pytest -vvv -k test_core_002
>
> Thanks for writing this!
>
>>
>> this uses the config given and runs h2load with 6 clients, each using 5 connections
>> over time to perform a number of total requests. Number of clients easily tweakable.
>
> I saw h2load issue up to 12 connections with this configuration, so
> raised the MPM limits a bit in r1899862 to avoid failing on
> "MaxRequestWorkers reached" in the logs.

It may not be totally exact in its limiting or you may have counted lingering connections?

>>
>> The test in its current form fails, because it expects all requests to
>> succeed. However, h2load counts the remaining requests on a connection as
>> unsuccessful if the server closes it. So, maybe we should use another test
>> client or count 'success' differently.
>
> It passes for me currently with the new MPM limits, probably because
> kept-alive connections are not killed anymore.
>
>>
>> Anyway, in test/gen/apache/logs/error_log one can see how mpm_event hits its limits
>> on workers and concurrent connections. Like
>>
>> " All workers are busy or dying, will shutdown 1 keep-alive connections"
>> or "AH03269: Too many open connections (1), not accepting new conns in this process"
>>
>> So, we need to define how we measure `success` and what we expect/not expect to see
>> in the error log. I guess?
>
> I'm not sure what we want to test here: the limits or the restart?
> I thought it was the latter as a first step, but yes, if we want to
> test the former we need to either disable keepalive to avoid confusing
> h2load, or use another (dedicated?) client more tolerant of how we may
> close kept-alive connections early.

This initial version was just a trial balloon. I think we want to test
the dynamic behaviour of mpm_event in regard to child processes and
restarts. I'll take a stab at changing it for this.

> Measuring how we behave at the limits may also be tricky: what would
> the expectations be? No unhandled connections? A minimal latency? Or?
> That may depend on the running machine too.
>
> If we want to test the reload, we don't need to stress too much (just
> some active connections for each generation), but we need something
> that issues "-k restart" and/or "-k graceful" in parallel and counts
> the processes or the "Child <slot> started|stopped: pid=<pid>" messages
> in the logs.
>
>
> Cheers;
> Yann.
Re: svn commit: r1899777 - /httpd/httpd/trunk/server/mpm/event/event.c
In r1899885 the test now checks dynamic child behaviour on a graceful reload.

Is this a setup that allows us to verify behaviour that troubled us in the past?

Cheers,
Stefan

> On 15.04.2022 at 10:41, Stefan Eissing <stefan@eissing.org> wrote:
>
>
>
>> On 14.04.2022 at 17:54, Yann Ylavic <ylavic.dev@gmail.com> wrote:
>>
>> On Thu, Apr 14, 2022 at 1:43 PM Stefan Eissing <stefan@eissing.org> wrote:
>>>
>>>
>>> In test/modules/core/test_002_restarts.py there is now a start of this. Invoked
>>> specifically with:
>>>
>>> trunk> STRESS_TEST=1 pytest -vvv -k test_core_002
>>
>> Thanks for writing this!
>>
>>>
>>> this uses the config given and runs h2load with 6 clients, each using 5 connections
>>> over time to perform a number of total requests. Number of clients easily tweakable.
>>
>> I saw h2load issue up to 12 connections with this configuration, so
>> raised the MPM limits a bit in r1899862 to avoid failing on
>> "MaxRequestWorkers reached" in the logs.
>
> It may not be totally exact in its limiting or you may have counted lingering connections?
>
>>>
>>> The test in its current form fails, because it expects all requests to
>>> succeed. However, h2load counts the remaining requests on a connection as
>>> unsuccessful if the server closes it. So, maybe we should use another test
>>> client or count 'success' differently.
>>
>> It passes for me currently with the new MPM limits, probably because
>> kept-alive connections are not killed anymore.
>>
>>>
>>> Anyway, in test/gen/apache/logs/error_log one can see how mpm_event hits its limits
>>> on workers and concurrent connections. Like
>>>
>>> " All workers are busy or dying, will shutdown 1 keep-alive connections"
>>> or "AH03269: Too many open connections (1), not accepting new conns in this process"
>>>
>>> So, we need to define how we measure `success` and what we expect/not expect to see
>>> in the error log. I guess?
>>
>> I'm not sure what we want to test here: the limits or the restart?
>> I thought it was the latter as a first step, but yes, if we want to
>> test the former we need to either disable keepalive to avoid confusing
>> h2load, or use another (dedicated?) client more tolerant of how we may
>> close kept-alive connections early.
>
> This initial version was just a trial balloon. I think we want to test
> the dynamic behaviour of mpm_event in regard to child processes and
> restarts. I'll take a stab at changing it for this.
>
>> Measuring how we behave at the limits may also be tricky: what would
>> the expectations be? No unhandled connections? A minimal latency? Or?
>> That may depend on the running machine too.
>>
>> If we want to test the reload, we don't need to stress too much (just
>> some active connections for each generation), but we need something
>> that issues "-k restart" and/or "-k graceful" in parallel and counts
>> the processes or the "Child <slot> started|stopped: pid=<pid>" messages
>> in the logs.
>>
>>
>> Cheers;
>> Yann.