Mailing List Archive

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

Szymon ?oga?a <szymek.655@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |szymek.655@gmail.com

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #19 from Szymon ?oga?a <szymek.655@gmail.com> ---
Hi, is there any news about the next release?

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #20 from Stefan Eissing <icing@apache.org> ---
I plan to make a release candidate on Monday, March 7th. Provided all goes
well, the release should come out on Thursday, March 10th.

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

NathanFoley <nfoley@tucasi.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |nfoley@tucasi.com

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

Luke-Jr <luke-jr+apachebugs@utopios.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |luke-jr+apachebugs@utopios.org

--- Comment #21 from Luke-Jr <luke-jr+apachebugs@utopios.org> ---
FWIW, this has also been happening to me every few days.

Looking at the changes in r1896505, I suspect it may not fix it, because if I
attach during the problem condition:

(gdb) print listensocks_disabled
$2 = 0

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #22 from Yann Ylavic <ylavic.dev@gmail.com> ---
(In reply to Luke-Jr from comment #21)
>
> Looking at the changes in r1896505, I suspect it may not fix it, because if
> I attach during the problem condition:
>
> (gdb) print listensocks_disabled
> $2 = 0

What does:
(gdb) thread apply all bt full
say?

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

Yann Ylavic <ylavic.dev@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED

--- Comment #23 from Yann Ylavic <ylavic.dev@gmail.com> ---
Backported to 2.4.x (r1897149), will be in the next release.

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #24 from Luke-Jr <luke-jr+apachebugs@utopios.org> ---
(In reply to Yann Ylavic from comment #22)
> (In reply to Luke-Jr from comment #21)
> >
> > Looking at the changes in r1896505, I suspect it may not fix it, because if
> > I attach during the problem condition:
> >
> > (gdb) print listensocks_disabled
> > $2 = 0
>
> What does:
> (gdb) thread apply all bt full
> say?

My website was down, so I had to restart it. But from memory, strace showed it
in a poll loop that included none of the listening sockets (identified using
lsof). A simple backtrace showed it inside an APR function, possibly the one
used for polling.

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #25 from Teodor Milkov <zimage@icdsoft.com> ---
Anybody still having this issue?

Yesterday I upgraded from 2.4.51 to 2.4.53 and started experiencing similar
symptoms. Once in a while httpd just stops accepting new connections. On a
fleet of about 1000 web servers it occurred about 8 times in 24 hours, so it is
quite rare and I don't know how to reproduce it. I'll try to strace it next
time.

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #26 from Yann Ylavic <ylavic.dev@gmail.com> ---
If it reproduces could you please:

$ gdb /path/to/httpd -p <pid>
(gdb) set logging on
(gdb) set logging file /tmp/httpd-backtrace.log
(gdb) thread apply all bt full

and attach "httpd-backtrace.log" here?
(<pid> is the process id of the httpd child in this non-responsive state)
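A non-interactive variant of these steps, sketched in Python as an illustration only: the helper names are hypothetical, and attaching requires ptrace permission on the child (in practice usually root). gdb's `-batch` mode runs the `-ex` command and exits, which detaches from the process; redirecting gdb's output into the log file sidesteps the `set logging` commands altogether.

```python
import subprocess
from pathlib import Path


def build_backtrace_cmd(httpd_path: str, pid: int) -> list[str]:
    """gdb command line that attaches to `pid`, prints every thread's
    backtrace with locals, then exits (which detaches from the process)."""
    return [
        "gdb",
        "-batch",                           # run the -ex commands, then quit
        "-ex", "thread apply all bt full",
        "-p", str(pid),                     # attach to the stuck child
        httpd_path,                         # binary to load symbols from
    ]


def capture_backtrace(httpd_path: str, pid: int, log_file: str) -> int:
    """Run gdb and write everything it prints to `log_file`.
    Returns gdb's exit status."""
    with Path(log_file).open("w") as out:
        result = subprocess.run(
            build_backtrace_cmd(httpd_path, pid),
            stdout=out,
            stderr=subprocess.STDOUT,   # keep gdb warnings in the same log
        )
    return result.returncode
```

For example, `capture_backtrace("/path/to/httpd", <pid>, "/tmp/httpd-backtrace.log")`, using the PID of the non-responsive child.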

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #27 from Teodor Milkov <zimage@icdsoft.com> ---
(In reply to Yann Ylavic from comment #26)
> If it reproduces could you please:
> ...

We were in a hurry to get something working, so we reverted to 2.4.51 +
mod_h2 2.0.2. We may try 2.4.53 again on a limited set of servers soon and be
ready to collect the debug information.

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #28 from Ruediger Pluem <rpluem@apache.org> ---
(In reply to Yann Ylavic from comment #26)
> If it reproduces could you please:
>
> $ gdb /path/to/httpd -p <pid>
> (gdb) set logging on
> (gdb) set logging file /tmp/httpd-backtrace.log
> (gdb) thread apply all bt full
>
> and attach "httpd-backtrace.log" here?
> (<pid> is the process id of the httpd child in this non-responsive state)

As it seems hard to reproduce, I guess it would make sense to "save" such a
process via

gcore <pid>

in a core dump first. This would enable us to ask for further debugging steps
later if the analysis of the stacktraces creates demand for it.
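A minimal Python sketch of that step, built on gdb's `gcore` script (helper names are hypothetical). The `-o prefix` option makes gcore write the dump to `prefix.<pid>`. Passing it explicitly keeps the file name predictable. gcore detaches afterwards, so the process keeps running (still stuck) and can be inspected further.

```python
import subprocess


def build_gcore_cmd(pid: int, prefix: str) -> list[str]:
    """gcore attaches, writes '<prefix>.<pid>' and detaches again."""
    return ["gcore", "-o", prefix, str(pid)]


def core_path(pid: int, prefix: str) -> str:
    """File name gcore composes from the prefix and the process id."""
    return f"{prefix}.{pid}"


def save_core(pid: int, prefix: str = "/tmp/httpd-core") -> str:
    """Dump the (still running) process to disk and return the dump's path."""
    subprocess.run(build_gcore_cmd(pid, prefix), check=True)
    return core_path(pid, prefix)
```

The dump can then be analysed offline with `gdb /path/to/httpd /tmp/httpd-core.<pid>`.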

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #29 from Teodor Milkov <zimage@icdsoft.com> ---
I finally managed to run a debug version and save a dump with gcore.

Here's how the process list usually looks on this server:

USER       PID %CPU %MEM     VSZ   RSS TTY STAT START   TIME COMMAND
root     50551  0.0  0.9  119520 57908 ?   Ss   Mar30   0:23 /apache/bin/httpd -k start
apache   55583  0.0  0.4  118344 26592 ?   S    12:15   0:00  \_ /apache/bin/httpd -k start
apache   55584  0.0  0.4  119520 26948 ?   S    12:15   0:00  \_ /apache/bin/httpd -k start
apache   55585  5.2  0.8 9374960 49764 ?   Sl   12:15   0:00  \_ /apache/bin/httpd -k start
apache   56594  1.3  0.6 9331520 42000 ?   Sl   12:15   0:00  \_ /apache/bin/httpd -k start

And here's how it looked when stuck:

USER       PID %CPU %MEM     VSZ   RSS TTY STAT START   TIME COMMAND
root     13105  0.0  0.9  119384 57524 ?   Ss   03:08   0:02 /apache/bin/httpd -k start
apache   28298  0.0  0.4  118216 27392 ?   S    06:00   0:00  \_ /apache/bin/httpd -k start
apache   28299  0.0  0.4  119376 26624 ?   S    06:00   0:00  \_ /apache/bin/httpd -k start

So normally there are:
1. the master (root) process,
2. two "thin" processes, which probably take care of CGI (we use suexec), and
3. two "fat" processes, which are the mpm_event workers.
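The distinction above can be turned into a crude health check. The following is a minimal Python sketch with several assumptions: it expects `ps aux`/`ps auxf`-style output as shown here, and it treats every non-root httpd process whose STAT contains `l` (multi-threaded, per the procps `ps` manual) as an mpm_event worker. The function names and the rule "master alive but no multi-threaded children means stuck" are hypothetical illustrations, not httpd behaviour.

```python
def count_httpd_processes(ps_output: str) -> dict[str, int]:
    """Count master, single-threaded ("thin") and multi-threaded ("fat")
    httpd processes in ps-aux-style output."""
    counts = {"master": 0, "thin": 0, "fat": 0}
    for line in ps_output.splitlines():
        fields = line.split(None, 10)          # 11 columns, COMMAND last
        if len(fields) < 11 or "httpd" not in fields[10]:
            continue                           # header or unrelated line
        user, stat = fields[0], fields[7]
        if user == "root":
            counts["master"] += 1
        elif "l" in stat:                      # 'l' = multi-threaded
            counts["fat"] += 1
        else:
            counts["thin"] += 1
    return counts


def looks_stuck(ps_output: str) -> bool:
    """Master still alive, but no multi-threaded worker left."""
    counts = count_httpd_processes(ps_output)
    return counts["master"] == 1 and counts["fat"] == 0
```

For example, it could be fed the output of `subprocess.run(["ps", "auxf"], capture_output=True, text=True).stdout` from a periodic monitoring job.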

Here's bt of the root process. Please let me know if I should do something
else:

gdb ./httpd-2.4.53-debug core.13105

(gdb) bt full
#0  0x000072a77d0b7a27 in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x7b4d48201f70) at ../sysdeps/unix/sysv/linux/select.c:41
        resultvar = 18446744073709551102
        sc_ret = <optimized out>
#1  0x000072a77d1d3535 in apr_sleep () from /usr/lib/x86_64-linux-gnu/libapr-1.so.0
No symbol table info available.
#2  0x00001e99ee736b41 in ap_wait_or_timeout (status=0x7b4d48202024, exitcode=0x7b4d48202020, ret=0x7b4d48202000, p=0x72a77d8c4028, s=0x72a77cd6cb48) at mpm_common.c:205
        rv = 70006
#3  0x00001e99ee81d8d9 in server_main_loop (remaining_children_to_start=0) at event.c:3026
        num_buckets = 1
        child_slot = 0
        exitwhy = APR_PROC_SIGNAL
        status = 11
        processed_status = 0
        pid = {pid = -1, in = 0x1e99ee769e1e <ap_log_command_line+327>, out = 0x1e99ee82b6dd, err = 0x72a778d487e8}
        i = 1
#4  0x00001e99ee81df29 in event_run (_pconf=0x72a77d8c4028, plog=0x72a77cd5f028, s=0x72a77cd6cb48) at event.c:3204
        num_buckets = 1
        remaining_children_to_start = 0
        i = 0
#5  0x00001e99ee735f4d in ap_run_mpm (pconf=0x72a77d8c4028, plog=0x72a77cd5f028, s=0x72a77cd6cb48) at mpm_common.c:95
        pHook = 0x72a77a9082d0
        n = 0
        rv = -1
#6  0x00001e99ee72c0cd in main (argc=3, argv=0x7b4d48202308) at main.c:841
        c = 0 '\000'
        showcompile = 0
        showdirectives = 0
        confname = 0x1e99ee822d43 "conf/httpd.conf"
        def_server_root = 0x1e99ee822d53 "/apache"
        temp_error_log = 0x0
        error = 0x0
        process = 0x72a77d8c6118
        pconf = 0x72a77d8c4028
        plog = 0x72a77cd5f028
        ptemp = 0x72a77cd57028
        pcommands = 0x72a77cd79028
        opt = 0x72a77cd79118
        rv = 0
        mod = 0x1e99ee86d9e8 <ap_prelinked_modules+328>
        opt_arg = 0x7b4d48202170 "\250\002\215}\247r"
        signal_server = 0x1e99ee77446e <ap_signal_server>
        rc = 0

Looking at this backtrace, I noticed it mentions signal 11 (exitwhy =
APR_PROC_SIGNAL, status = 11 in frame #3), and the error_log indeed contains
this:

[Wed Mar 30 06:04:37 2022] [notice] [pid 13105] mpm_unix.c(436): AH00052: child pid 28302 exit signal Segmentation fault (11)

Now looking across error logs of all our servers it's not unusual to see this
segfault on a daily basis. Apparently, 2.4.51 handles the child crashing
gracefully, while 2.4.53 gets stuck.

Why the child segfaults is a separate question, which I'm going to investigate
too, but IMO it'd be great if we could get back to the graceful behaviour of
versions before 2.4.53.
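For reference: to our understanding, APR derives the `exitwhy`/`status` pair seen in frame #3 from the raw `waitpid()` status of the reaped child, which here gives `APR_PROC_SIGNAL` plus signal number 11 (SIGSEGV). The following is a minimal Python sketch of the same decoding using the POSIX wait-status macros exposed in `os`. The raw values in the example assume the usual Linux encoding. The message text only imitates the AH00052 wording, and the function names are hypothetical.

```python
import os
import signal


def decode_wait_status(raw: int) -> tuple[str, int]:
    """Split a raw waitpid() status into (reason, code), similar in spirit
    to APR's exitwhy/exitcode pair."""
    if os.WIFEXITED(raw):
        return "exit", os.WEXITSTATUS(raw)
    if os.WIFSIGNALED(raw):
        reason = "signal+core" if os.WCOREDUMP(raw) else "signal"
        return reason, os.WTERMSIG(raw)
    return "other", raw                    # e.g. stopped, not terminated


def describe(raw: int, pid: int) -> str:
    """Human-readable line loosely modelled on httpd's AH00052 notice."""
    reason, code = decode_wait_status(raw)
    if reason.startswith("signal"):
        return f"child pid {pid} exit signal {signal.strsignal(code)} ({code})"
    return f"child pid {pid} exit status {code}"
```

On Linux with glibc, `describe(11, 28302)` should read like the error_log line quoted above.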

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #30 from Luke-Jr <luke-jr+apachebugs@utopios.org> ---
I can confirm my error_log also shows a segfault at least a few times on most
days.

Should we reopen this bug, or create a new one?

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #31 from Ruediger Pluem <rpluem@apache.org> ---
I think we need:

1. thread apply all bt full from the main process.
2. thread apply all bt full from one of the "fat" processes.
3. thread apply all bt full from one of the crashed processes.

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #32 from Teodor Milkov <zimage@icdsoft.com> ---
1. I believe the main process is not multi-threaded. Just in case, I ran
"thread apply all bt full" on the core file I have from the master process, and
it looks the same as "bt full".

2/3: The crashed process is actually the fat process. I don't have a core file,
because core files are disabled on my systems for security reasons. I'll see if
I can enable core files and collect one, but this may take time...

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

Romain Lapoux <manus@manusfreedom.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |manus@manusfreedom.com

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #33 from Teodor Milkov <zimage@icdsoft.com> ---
The crashing child turned out to be caused by a bad mod_security rule (at
least in one of the cases). It looks like the crash itself is unrelated, but
newer Apache does not recover from the situation.

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #34 from Ruediger Pluem <rpluem@apache.org> ---
(In reply to Teodor Milkov from comment #33)
> The crashing child turned out to be caused by a bad mod_security rule (at
> least in one of the cases). It looks like the crash itself is unrelated, but
> newer Apache does not recover from the situation.

How do these crashes happen? Directly after the child starts, or only when a
certain request is processed? We might face a situation here where one child
after another crashes due to this. Do you have an error log? It should mention
the crashed processes.

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #35 from Teodor Milkov <zimage@icdsoft.com> ---
(In reply to Ruediger Pluem from comment #34)
> (In reply to Teodor Milkov from comment #33)
> > The crashing child turned out to be caused by a bad mod_security rule (at
> > least in one of the cases). It looks like the crash itself is unrelated, but
> > newer Apache does not recover from the situation.
>
> How do these crashes happen? Directly after the child starts, or only when a
> certain request is processed? We might face a situation here where one child
> after another crashes due to this. Do you have an error log? It should
> mention the crashed processes.

The child crashes only when a certain request is made. The crashing processes
are indeed mentioned in the error_log. Sometimes the crashes are spread out; at
other times they come in quick succession, or even within the same second:

[Fri Apr 01 18:46:42 2022] [notice] [pid 22993] mpm_unix.c(436): AH00052: child pid 13864 exit signal Segmentation fault (11)
[Fri Apr 01 20:39:15 2022] [notice] [pid 22993] mpm_unix.c(436): AH00052: child pid 58297 exit signal Segmentation fault (11)
[Fri Apr 01 20:39:15 2022] [notice] [pid 22993] mpm_unix.c(436): AH00052: child pid 58299 exit signal Segmentation fault (11)
[Fri Apr 01 20:39:18 2022] [notice] [pid 22993] mpm_unix.c(436): AH00052: child pid 35244 exit signal Segmentation fault (11)
[Fri Apr 01 20:45:01 2022] [notice] [pid 22993] mpm_unix.c(436): AH00052: child pid 37311 exit signal Segmentation fault (11)
[Fri Apr 01 21:21:00 2022] [notice] [pid 22993] mpm_unix.c(436): AH00052: child pid 29877 exit signal Segmentation fault (11)
[Fri Apr 01 23:58:43 2022] [notice] [pid 22993] mpm_unix.c(436): AH00052: child pid 12423 exit signal Segmentation fault (11)
[Fri Apr 01 23:58:44 2022] [notice] [pid 22993] mpm_unix.c(436): AH00052: child pid 12421 exit signal Segmentation fault (11)
[Sat Apr 02 00:13:14 2022] [notice] [pid 22993] mpm_unix.c(436): AH00052: child pid 29774 exit signal Segmentation fault (11)
[Sat Apr 02 02:33:56 2022] [notice] [pid 22993] mpm_unix.c(436): AH00052: child pid 30997 exit signal Segmentation fault (11)

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #36 from Ruediger Pluem <rpluem@apache.org> ---
Thanks. There could be some delay (1 or 2 seconds) in restarting crashed
processes, but the server should not remain without child processes
"forever". As far as I understand, the crashed processes never get replaced
and you end up with just the parent process after some time?
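The expectation described here (a short delay, but never a permanently empty server) can be stated as an invariant. Below is a deliberately simplified Python model, not httpd's event.c. On each maintenance tick, it starts up to `spawn_rate` missing children, and it doubles that rate while children are still missing, capped at a hypothetical `max_rate`. All names and numbers are illustrative assumptions. The property worth checking is that a tick never starts zero children while the server is below its target.

```python
from dataclasses import dataclass


@dataclass
class Supervisor:
    target: int          # desired number of worker children
    max_rate: int = 8    # hypothetical cap on spawns per tick
    live: int = 0
    spawn_rate: int = 1

    def tick(self, crashed: int = 0) -> int:
        """One maintenance pass: reap `crashed` children, then respawn.
        Returns how many children were started in this pass."""
        self.live = max(0, self.live - crashed)
        missing = self.target - self.live
        if missing <= 0:
            self.spawn_rate = 1            # nothing to do, reset the rate
            return 0
        started = min(missing, self.spawn_rate)   # always >= 1 here
        self.live += started
        if self.live < self.target:        # still short: ramp up next time
            self.spawn_rate = min(self.spawn_rate * 2, self.max_rate)
        else:
            self.spawn_rate = 1
        return started
```

With `target=2`, two children crashing in the same second are replaced within two ticks. A server that, like the one reported here, stays at zero children indefinitely violates this invariant.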

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #37 from Teodor Milkov <zimage@icdsoft.com> ---
Exactly, yes, that's what we are seeing: just the parent process and no
workers for (at least) several minutes, until the alarms go off and the web
server is manually restarted.

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #38 from Ruediger Pluem <rpluem@apache.org> ---
Please use the core dump you captured of such a process and issue the following
gdb commands:

print *retained
print *(retained->idle_spawn_rate)
print *(retained->mpm)

[Bug 65769] Child processes fail to start
https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

--- Comment #39 from Ruediger Pluem <rpluem@apache.org> ---
Further gdb commands:

set $i=0
while ($i<server_limit)
print ap_scoreboard_image->parent[$i++]
end

set $i=0
while($i<server_limit)
set $j=0
while($j<threads_per_child)
print ap_scoreboard_image->servers[$i][$j++]
end
set $i=$i+1
end
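To run the commands from comments #38 and #39 against the saved core in one go, they can be written to a gdb command file and executed with `gdb -batch -x`. The following is a minimal Python sketch. The helper names and file paths are hypothetical, and the command list simply repeats the gdb requests above verbatim.

```python
import tempfile
from pathlib import Path

# gdb commands requested in this thread, kept verbatim (gdb script syntax).
GDB_COMMANDS = """\
print *retained
print *(retained->idle_spawn_rate)
print *(retained->mpm)
set $i=0
while ($i<server_limit)
print ap_scoreboard_image->parent[$i++]
end
set $i=0
while ($i<server_limit)
set $j=0
while ($j<threads_per_child)
print ap_scoreboard_image->servers[$i][$j++]
end
set $i=$i+1
end
"""


def write_script(text: str = GDB_COMMANDS) -> Path:
    """Store the commands in a temporary file for gdb's -x option."""
    with tempfile.NamedTemporaryFile("w", suffix=".gdb", delete=False) as f:
        f.write(text)
        return Path(f.name)


def build_core_cmd(httpd_path: str, core_path: str, script: Path) -> list[str]:
    """gdb loads the binary and the core, runs the script, then exits."""
    return ["gdb", "-batch", "-x", str(script), httpd_path, core_path]
```

For example, `build_core_cmd("./httpd-2.4.53-debug", "core.13105", write_script())` yields a command whose output can be redirected into a file and attached to the bug.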
