Mailing List Archive

[Bug 66004] Child processes disappears one by one with no web access.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #1 from Yann Ylavic <ylavic.dev@gmail.com> ---
It would be useful to run the reproducer with "LogLevel mpm_event:trace6" and
attach the error log here.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #2 from Shun F <uni.roy@gmail.com> ---
(In reply to Yann Ylavic from comment #1)
> It would be useful to run the reproducer with "LogLevel mpm_event:trace6"
> and attach the error log here.

OK.
I'm trying to reproduce now. Please wait.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #3 from Shun F <uni.roy@gmail.com> ---
Created attachment 38243
--> https://bz.apache.org/bugzilla/attachment.cgi?id=38243&action=edit
error log

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #4 from Shun F <uni.roy@gmail.com> ---
I reproduced with "LogLevel mpm_event:trace6" and uploaded error log.
Can you please investigae.

/////////////////////////////
# systemctl start httpd
# /usr/bin/systemctl reload httpd.service
# /usr/bin/systemctl reload httpd.service
# /usr/bin/systemctl reload httpd.service
# /usr/bin/systemctl reload httpd.service
# /usr/bin/systemctl reload httpd.service
# systemctl status httpd

? httpd.service - The Apache HTTP Server
Loaded: loaded (/etc/systemd/system/httpd.service; enabled; vendor preset:
disabled)
Active: active (running) since ? 2022-04-11 19:03:14 JST; 37s ago
Process: 15865 ExecStop=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k
graceful-stop (code=exited, status=0/SUCCESS)
Process: 17327 ExecReload=/bin/kill -HUP $MAINPID (code=exited,
status=0/SUCCESS)
Process: 17325 ExecReload=/opt/app/httpd/httpd-2/bin/httpd -t (code=exited,
status=0/SUCCESS)
Process: 15929 ExecStart=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k start
(code=exited, status=0/SUCCESS)
Main PID: 15930 (httpd)
CGroup: /system.slice/httpd.service
tq15930 /opt/app/httpd/httpd-2/bin/httpd -k start
tq17329 /opt/app/httpd/httpd-2/bin/httpd -k start
tq17331 /opt/app/httpd/httpd-2/bin/httpd -k start
mq17336 /opt/app/httpd/httpd-2/bin/httpd -k start


33 minutes later...


# systemctl status httpd

? httpd.service - The Apache HTTP Server
Loaded: loaded (/etc/systemd/system/httpd.service; enabled; vendor preset:
disabled)
Active: active (running) since ? 2022-04-11 19:03:14 JST; 33min ago
Process: 15865 ExecStop=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k
graceful-stop (code=exited, status=0/SUCCESS)
Process: 17327 ExecReload=/bin/kill -HUP $MAINPID (code=exited,
status=0/SUCCESS)
Process: 17325 ExecReload=/opt/app/httpd/httpd-2/bin/httpd -t (code=exited,
status=0/SUCCESS)
Process: 15929 ExecStart=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k start
(code=exited, status=0/SUCCESS)
Main PID: 15930 (httpd)
CGroup: /system.slice/httpd.service
tq15930 /opt/app/httpd/httpd-2/bin/httpd -k start
mq17329 /opt/app/httpd/httpd-2/bin/httpd -k start
/////////////////////////////

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #5 from Yann Ylavic <ylavic.dev@gmail.com> ---
Created attachment 38244
--> https://bz.apache.org/bugzilla/attachment.cgi?id=38244&action=edit
Fix active/total daemons accounting on non-graceful restart

Could you please try this patch?

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #6 from Shun F <uni.roy@gmail.com> ---
(In reply to Yann Ylavic from comment #5)
> Created attachment 38244 [details]
> Fix active/total daemons accounting on non-graceful restart
>
> Could you please try this patch?

Thank you for your quick response.
What was wrong with 2.4.53?

I will try and report the result of the trial.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #7 from Yann Ylavic <ylavic.dev@gmail.com> ---
(In reply to Shun F from comment #6)
> What was wrong with 2.4.53?

The issue was with the non-accounting of children processes stopped by reloads
(ungraceful restarts). 2.4.53 only accounts for processes started and stopped
during runtime maintenance (namely perform_idle_server_maintenance() which
honors Min/MaxSpareThreads and restarts children exited after
MaxRequestsPerChild), while reloads also kill processes outside this mechanism
thus outside the accounting (note that graceful restarts don't use this
mechanism so are not affected by this issue).
At some point, with bad counters, httpd thinks it can't create new children
even though none is actually running (besides the main process).

The fix consists in updating the counters for started/stopped children in a
common function called by both runtime maintenance and reloads.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #8 from Shun F <uni.roy@gmail.com> ---
Created attachment 38245
--> https://bz.apache.org/bugzilla/attachment.cgi?id=38245&action=edit
error log with "LogLevel mpm_event:trace6" after patching

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #9 from Shun F <uni.roy@gmail.com> ---
The patch trial completed.
I tried to reproduce twice and the phenomenon didn't reproduce.

The fix will be included in the next httpd release (2.4.54)?

/////////////////////////////error log uploaded (as
error_log_after_patching0412)
systemctl start httpd

/usr/bin/systemctl reload httpd.service
/usr/bin/systemctl reload httpd.service
/usr/bin/systemctl reload httpd.service
/usr/bin/systemctl reload httpd.service
/usr/bin/systemctl reload httpd.service

systemctl status httpd

? httpd.service - The Apache HTTP Server
Loaded: loaded (/etc/systemd/system/httpd.service; enabled; vendor preset:
disabled)
Active: active (running) since ? 2022-04-12 18:00:36 JST; 58s ago
Process: 21597 ExecStop=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k
graceful-stop (code=exited, status=0/SUCCESS)
Process: 23190 ExecReload=/bin/kill -HUP $MAINPID (code=exited,
status=0/SUCCESS)
Process: 23188 ExecReload=/opt/app/httpd/httpd-2/bin/httpd -t (code=exited,
status=0/SUCCESS)
Process: 21713 ExecStart=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k start
(code=exited, status=0/SUCCESS)
Main PID: 21714 (httpd)
CGroup: /system.slice/httpd.service
tq21714 /opt/app/httpd/httpd-2/bin/httpd -k start
tq23192 /opt/app/httpd/httpd-2/bin/httpd -k start
tq23193 /opt/app/httpd/httpd-2/bin/httpd -k start
mq23248 /opt/app/httpd/httpd-2/bin/httpd -k start

About an hour later... 2 child processes still exists.

systemctl status httpd

? httpd.service - The Apache HTTP Server
Loaded: loaded (/etc/systemd/system/httpd.service; enabled; vendor preset:
disabled)
Active: active (running) since ? 2022-04-12 18:00:36 JST; 1h 2min ago
Process: 21597 ExecStop=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k
graceful-stop (code=exited, status=0/SUCCESS)
Process: 23190 ExecReload=/bin/kill -HUP $MAINPID (code=exited,
status=0/SUCCESS)
Process: 23188 ExecReload=/opt/app/httpd/httpd-2/bin/httpd -t (code=exited,
status=0/SUCCESS)
Process: 21713 ExecStart=/opt/app/httpd/httpd-2/bin/httpd $OPTIONS -k start
(code=exited, status=0/SUCCESS)
Main PID: 21714 (httpd)
CGroup: /system.slice/httpd.service
tq21714 /opt/app/httpd/httpd-2/bin/httpd -k start
tq23192 /opt/app/httpd/httpd-2/bin/httpd -k start
mq26723 /opt/app/httpd/httpd-2/bin/httpd -k start
/////////////////////////////

Thank you

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

Yann Ylavic <ylavic.dev@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Keywords| |FixedInTrunk

--- Comment #10 from Yann Ylavic <ylavic.dev@gmail.com> ---
Thanks for testing, checked in r1899777.

The change will be proposed for backport to 2.4, let's how it goes with others'
reviews..

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #11 from Shun F <uni.roy@gmail.com> ---
(In reply to Yann Ylavic from comment #7)
> (In reply to Shun F from comment #6)
> > What was wrong with 2.4.53?
>
> The issue was with the non-accounting of children processes stopped by
> reloads (ungraceful restarts). 2.4.53 only accounts for processes started
> and stopped during runtime maintenance (namely
> perform_idle_server_maintenance() which honors Min/MaxSpareThreads and
> restarts children exited after MaxRequestsPerChild), while reloads also kill
> processes outside this mechanism thus outside the accounting (note that
> graceful restarts don't use this mechanism so are not affected by this
> issue).
> At some point, with bad counters, httpd thinks it can't create new children
> even though none is actually running (besides the main process).
>
> The fix consists in updating the counters for started/stopped children in a
> common function called by both runtime maintenance and reloads.

Thank you for telling me.
I got the reason.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #12 from Shun F <uni.roy@gmail.com> ---
(In reply to Yann Ylavic from comment #10)
> Thanks for testing, checked in r1899777.
>
> The change will be proposed for backport to 2.4, let's how it goes with
> others' reviews..

Thanks so much. I got it.
I hope your fix will be included in the next release.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #13 from Shun F <uni.roy@gmail.com> ---
If MaxConnectionsPerChild is set to a non-zero value, The phenomenon seems to
reproduce.

I'm testing now.
I will report the results later.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #14 from Yann Ylavic <ylavic.dev@gmail.com> ---
There is a fix (follow up) to r1899777, namely r1899812.

A full patch for 2.4 including all the changes is available there:
https://patch-diff.githubusercontent.com/raw/apache/httpd/pull/313.diff

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #15 from Shun F <uni.roy@gmail.com> ---
(In reply to Yann Ylavic from comment #14)
> There is a fix (follow up) to r1899777, namely r1899812.
>
> A full patch for 2.4 including all the changes is available there:
> https://patch-diff.githubusercontent.com/raw/apache/httpd/pull/313.diff

Is r1899777 an incomplete fix?
I'd better test the r1899812, right?

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #16 from Shun F <uni.roy@gmail.com> ---
(In reply to Shun F from comment #13)
> If MaxConnectionsPerChild is set to a non-zero value, The phenomenon seems
> to reproduce.
>
> I'm testing now.
> I will report the results later.


If MaxConnectionsPerChild is 0, the phenomenon doesn't reproduce under
unpatched 2.4.53.

The reproducibility conditions are:

[1]Using httpd 2.4.53 (2.4.52 is probably the same)
[2]Using event MPM
[3]Set a non-zero value to MaxConnectionsPerChild
[4]Reloaded regularly, such as daily (e.g. by /etc/logrotate.d/httpd)

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #17 from Yann Ylavic <ylavic.dev@gmail.com> ---
> Is r1899777 an incomplete fix?
> I'd better test the r1899812, right?

Yes, r1899777 alone is not enough, you can test r1899777+r1899812 (both
patches) which is the same as 313.diff.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 66004] Child processes disappears one by one with no web access. [ In reply to ]
https://bz.apache.org/bugzilla/show_bug.cgi?id=66004

--- Comment #18 from Shun F <uni.roy@gmail.com> ---
(In reply to Yann Ylavic from comment #17)
> > Is r1899777 an incomplete fix?
> > I'd better test the r1899812, right?
>
> Yes, r1899777 alone is not enough, you can test r1899777+r1899812 (both
> patches) which is the same as 313.diff.

I've applied 313.diff to original 2.4.53 and the patch trial completed.
I've tried to reproduce twice and the phenomenon didn't reproduce.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org