Mailing List Archive

Broken: apache/httpd#1831 (trunk - 243c5fa)
Build Update for apache/httpd
-------------------------------------

Build: #1831
Status: Broken

Duration: 21 mins and 33 secs
Commit: 243c5fa (trunk)
Author: Yann Ylavic
Message: mpm_{event,worker,prefork}: late stop of children processes on restart.

Change how the main process handles restarts, from:
0. <restart signal>
1. stop old generation of children processes (graceful or not)
2. reload new configuration
3. start new generation of children processes
to:
0. <restart signal>
1. reload new configuration
2. stop old generation of children processes (graceful or not)
3. start new generation of children processes

The delay between stop and start is now very short and does not depend on the
reload time (which can be quite long with many vhosts and/or complex setups
with regexps or whatever third party components to compile).

Also, while reloading, the old generation of children processes keeps accepting
and handling incoming connections until the new generation is up to take over.

* os/unix/unixd.c (sig_term, sig_restart):
Set AP_MPMQ_STOPPING only once.

* server/listen.c (ap_duplicate_listeners):
Use ap_log_error() the main server instead of ap_log_perror().

* server/mpm/{event,worker,prefork}/{event,worker,prefork}.c
({event,worker,prefork}_retained_data):
Save the generation pool pointer (gen_pool) and all the buckets here, they
won't be cleared before the reload like pconf so they need a persitent
storage accross restarts (i.e. retained->gen_pool).

* server/mpm/{event,worker,prefork}/{event,worker,prefork}.c
(perform_idle_server_maintenance, child_main, make_child):
Change usage of all_buckets (previously with global/static scope) to the new
retained->buckets array.



git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1892587 13f79535-47bb-0310-9956-ffa450edef68

View the changeset: https://github.com/apache/httpd/compare/9acfea84831e...243c5fad0a8d

View the full build log and details: https://app.travis-ci.com/github/apache/httpd/builds/236134359?utm_medium=notification&utm_source=email


--

You can unsubscribe from build emails from the apache/httpd repository going to https://app.travis-ci.com/account/preferences/unsubscribe?repository=16806660&utm_medium=notification&utm_source=email.
Or unsubscribe from *all* email updating your settings at https://app.travis-ci.com/account/preferences/unsubscribe?utm_medium=notification&utm_source=email.
Or configure specific recipients for build notifications in your .travis.yml file. See https://docs.travis-ci.com/user/notifications.
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
On Wed, Aug 25, 2021 at 1:14 AM Travis CI <builds@travis-ci.com> wrote:
>
> apache / httpd
>
> trunk
>
> Build #1831 was broken
> 21 mins and 33 secs
> Yann Ylavic
> 243c5fa CHANGESET ?
>
> mpm_{event,worker,prefork}: late stop of children processes on restart.

(unrelated to this change, Build #1819 failed the same earlier).

Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103

It seems that when exiting a stream can be destroyed while in
h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
when the mplx lock is released (thus stream->id faults).

Should the caller(s) of mst_check_data_for() pass stream->id (under
the lock) instead?
This would avoid the fault but that's still a potentially destroyed
stream being fifo'ed, though we are exiting so it might not matter..

A better fix maybe, Stefan?

Cheers;
Yann.
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
> Am 25.08.2021 um 13:53 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>
> On Wed, Aug 25, 2021 at 1:14 AM Travis CI <builds@travis-ci.com> wrote:
>>
>> apache / httpd
>>
>> trunk
>>
>> Build #1831 was broken
>> 21 mins and 33 secs
>> Yann Ylavic
>> 243c5fa CHANGESET ?
>>
>> mpm_{event,worker,prefork}: late stop of children processes on restart.
>
> (unrelated to this change, Build #1819 failed the same earlier).
>
> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
>
> It seems that when exiting a stream can be destroyed while in
> h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
> when the mplx lock is released (thus stream->id faults).
>
> Should the caller(s) of mst_check_data_for() pass stream->id (under
> the lock) instead?
> This would avoid the fault but that's still a potentially destroyed
> stream being fifo'ed, though we are exiting so it might not matter..
>
> A better fix maybe, Stefan?

I think the "stream->id" needs to be save before giving up the lock.
I'll make the change, but currently my test suite throws several
other errors in trunk. Something changed...

Btw. I am working on a real fix of this messy code in http2, using
a apr_pollset for monitoring connections and ongoing requests. Also
cleaning up other parts of the code, reducing complexity.

<https://github.com/icing/mod_h2/tree/icing/pipes>

I'll see what version of trunk still runs my tests and report back.

- Stefan

>
> Cheers;
> Yann.
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
> Am 25.08.2021 um 15:06 schrieb stefan@eissing.org:
>
>
>
>> Am 25.08.2021 um 13:53 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>>
>> On Wed, Aug 25, 2021 at 1:14 AM Travis CI <builds@travis-ci.com> wrote:
>>>
>>> apache / httpd
>>>
>>> trunk
>>>
>>> Build #1831 was broken
>>> 21 mins and 33 secs
>>> Yann Ylavic
>>> 243c5fa CHANGESET ?
>>>
>>> mpm_{event,worker,prefork}: late stop of children processes on restart.
>>
>> (unrelated to this change, Build #1819 failed the same earlier).
>>
>> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
>>
>> It seems that when exiting a stream can be destroyed while in
>> h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
>> when the mplx lock is released (thus stream->id faults).
>>
>> Should the caller(s) of mst_check_data_for() pass stream->id (under
>> the lock) instead?
>> This would avoid the fault but that's still a potentially destroyed
>> stream being fifo'ed, though we are exiting so it might not matter..
>>
>> A better fix maybe, Stefan?
>
> I think the "stream->id" needs to be save before giving up the lock.
> I'll make the change, but currently my test suite throws several
> other errors in trunk. Something changed...
>
> Btw. I am working on a real fix of this messy code in http2, using
> a apr_pollset for monitoring connections and ongoing requests. Also
> cleaning up other parts of the code, reducing complexity.
>
> <https://github.com/icing/mod_h2/tree/icing/pipes>
>
> I'll see what version of trunk still runs my tests and report back.


Ok, r1892586 still runs the http2 tests nicely. Seems that r1892587
broke things. Note that the http2 test suite does a lot of server
reloads.

Stefan

>
> - Stefan
>
>>
>> Cheers;
>> Yann.
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
> Am 25.08.2021 um 15:15 schrieb stefan@eissing.org:
>
>
>
>> Am 25.08.2021 um 15:06 schrieb stefan@eissing.org:
>>
>>
>>
>>> Am 25.08.2021 um 13:53 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>>>
>>> On Wed, Aug 25, 2021 at 1:14 AM Travis CI <builds@travis-ci.com> wrote:
>>>>
>>>> apache / httpd
>>>>
>>>> trunk
>>>>
>>>> Build #1831 was broken
>>>> 21 mins and 33 secs
>>>> Yann Ylavic
>>>> 243c5fa CHANGESET ?
>>>>
>>>> mpm_{event,worker,prefork}: late stop of children processes on restart.
>>>
>>> (unrelated to this change, Build #1819 failed the same earlier).
>>>
>>> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
>>>
>>> It seems that when exiting a stream can be destroyed while in
>>> h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
>>> when the mplx lock is released (thus stream->id faults).
>>>
>>> Should the caller(s) of mst_check_data_for() pass stream->id (under
>>> the lock) instead?
>>> This would avoid the fault but that's still a potentially destroyed
>>> stream being fifo'ed, though we are exiting so it might not matter..
>>>
>>> A better fix maybe, Stefan?
>>
>> I think the "stream->id" needs to be save before giving up the lock.
>> I'll make the change, but currently my test suite throws several
>> other errors in trunk. Something changed...
>>
>> Btw. I am working on a real fix of this messy code in http2, using
>> a apr_pollset for monitoring connections and ongoing requests. Also
>> cleaning up other parts of the code, reducing complexity.
>>
>> <https://github.com/icing/mod_h2/tree/icing/pipes>
>>
>> I'll see what version of trunk still runs my tests and report back.
>
>
> Ok, r1892586 still runs the http2 tests nicely. Seems that r1892587
> broke things. Note that the http2 test suite does a lot of server
> reloads.

Follow up: if I do a "time.sleep(2)" after each "apachectl" in my
test suite, it works fine.

Speculating: my test suite does commonly a GET after a reload to
see if the reloaded server is responding. And then fires requests
against the new configuration.

Now, if the server is always responding, the test sill get - sometimes -
send request to an old gen child which then answers incorrectly. So
I see 404 responses where a 200 should be as the config does not match.

In this new world of always responding servers, when is a test suite
supposed to know that now the server will respond with the newly laoded
config?

- Stefan

>
> Stefan
>
>>
>> - Stefan
>>
>>>
>>> Cheers;
>>> Yann.
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
On Wed, Aug 25, 2021 at 3:37 PM stefan@eissing.org <stefan@eissing.org> wrote:
>
> Follow up: if I do a "time.sleep(2)" after each "apachectl" in my
> test suite, it works fine.
>
> Speculating: my test suite does commonly a GET after a reload to
> see if the reloaded server is responding. And then fires requests
> against the new configuration.
>
> Now, if the server is always responding, the test sill get - sometimes -
> send request to an old gen child which then answers incorrectly. So
> I see 404 responses where a 200 should be as the config does not match.
>
> In this new world of always responding servers, when is a test suite
> supposed to know that now the server will respond with the newly laoded
> config?

Maybe we could have access to the generation number
(AP_MPMQ_GENERATION) in a variable like "MPM_GENERATION" which
mod_headers could use to set in a response header?
Or a test only module which does that specifically could be compiled
by the framework and LoadModule'd?

Cheers;
Yann.
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
> Am 25.08.2021 um 15:54 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>
> On Wed, Aug 25, 2021 at 3:37 PM stefan@eissing.org <stefan@eissing.org> wrote:
>>
>> Follow up: if I do a "time.sleep(2)" after each "apachectl" in my
>> test suite, it works fine.
>>
>> Speculating: my test suite does commonly a GET after a reload to
>> see if the reloaded server is responding. And then fires requests
>> against the new configuration.
>>
>> Now, if the server is always responding, the test sill get - sometimes -
>> send request to an old gen child which then answers incorrectly. So
>> I see 404 responses where a 200 should be as the config does not match.
>>
>> In this new world of always responding servers, when is a test suite
>> supposed to know that now the server will respond with the newly laoded
>> config?
>
> Maybe we could have access to the generation number
> (AP_MPMQ_GENERATION) in a variable like "MPM_GENERATION" which
> mod_headers could use to set in a response header?
> Or a test only module which does that specifically could be compiled
> by the framework and LoadModule'd?

I could add that in the test config as header to a certain location.

However, will all new connections get to new gen children once the
new number has been spotted?

>
> Cheers;
> Yann.
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
On Wed, Aug 25, 2021 at 3:56 PM stefan@eissing.org <stefan@eissing.org> wrote:
>
>
>
> > Am 25.08.2021 um 15:54 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
> >
> > On Wed, Aug 25, 2021 at 3:37 PM stefan@eissing.org <stefan@eissing.org> wrote:
> >>
> >> Follow up: if I do a "time.sleep(2)" after each "apachectl" in my
> >> test suite, it works fine.
> >>
> >> Speculating: my test suite does commonly a GET after a reload to
> >> see if the reloaded server is responding. And then fires requests
> >> against the new configuration.
> >>
> >> Now, if the server is always responding, the test sill get - sometimes -
> >> send request to an old gen child which then answers incorrectly. So
> >> I see 404 responses where a 200 should be as the config does not match.
> >>
> >> In this new world of always responding servers, when is a test suite
> >> supposed to know that now the server will respond with the newly laoded
> >> config?
> >
> > Maybe we could have access to the generation number
> > (AP_MPMQ_GENERATION) in a variable like "MPM_GENERATION" which
> > mod_headers could use to set in a response header?
> > Or a test only module which does that specifically could be compiled
> > by the framework and LoadModule'd?
>
> I could add that in the test config as header to a certain location.
>
> However, will all new connections get to new gen children once the
> new number has been spotted?

Yes, that's guaranteed by the MPM which increases the gen number just
before creating new children (after having asked the old ones to
stop), no old child will ever be created with a new generation number.
ap_mpm_query(AP_MPMQ_GENERATION, &generation) within a child will
always return the one that was forked.
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
> Am 25.08.2021 um 16:04 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>
> On Wed, Aug 25, 2021 at 3:56 PM stefan@eissing.org <stefan@eissing.org> wrote:
>>
>>
>>
>>> Am 25.08.2021 um 15:54 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>>>
>>> On Wed, Aug 25, 2021 at 3:37 PM stefan@eissing.org <stefan@eissing.org> wrote:
>>>>
>>>> Follow up: if I do a "time.sleep(2)" after each "apachectl" in my
>>>> test suite, it works fine.
>>>>
>>>> Speculating: my test suite does commonly a GET after a reload to
>>>> see if the reloaded server is responding. And then fires requests
>>>> against the new configuration.
>>>>
>>>> Now, if the server is always responding, the test sill get - sometimes -
>>>> send request to an old gen child which then answers incorrectly. So
>>>> I see 404 responses where a 200 should be as the config does not match.
>>>>
>>>> In this new world of always responding servers, when is a test suite
>>>> supposed to know that now the server will respond with the newly laoded
>>>> config?
>>>
>>> Maybe we could have access to the generation number
>>> (AP_MPMQ_GENERATION) in a variable like "MPM_GENERATION" which
>>> mod_headers could use to set in a response header?
>>> Or a test only module which does that specifically could be compiled
>>> by the framework and LoadModule'd?
>>
>> I could add that in the test config as header to a certain location.
>>
>> However, will all new connections get to new gen children once the
>> new number has been spotted?
>
> Yes, that's guaranteed by the MPM which increases the gen number just
> before creating new children (after having asked the old ones to
> stop), no old child will ever be created with a new generation number.
> ap_mpm_query(AP_MPMQ_GENERATION, &generation) within a child will
> always return the one that was forked.

Thanks, I'll give this a shot.

Joe: as long as that is not solved, the tests will not be reliable, I'm afraid.

- Stefan
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
On Wed, Aug 25, 2021 at 4:04 PM Yann Ylavic <ylavic.dev@gmail.com> wrote:
>
> On Wed, Aug 25, 2021 at 3:56 PM stefan@eissing.org <stefan@eissing.org> wrote:
> >
> > I could add that in the test config as header to a certain location.
> >
> > However, will all new connections get to new gen children once the
> > new number has been spotted?
>
> Yes, that's guaranteed by the MPM which increases the gen number just
> before creating new children (after having asked the old ones to
> stop), no old child will ever be created with a new generation number.
> ap_mpm_query(AP_MPMQ_GENERATION, &generation) within a child will
> always return the one that was forked.

Hmm, you asked for a different question actually, the new
*connections* done after the gen bump are not guaranteed to be handled
by a new gen child, because it may take some (short) time before the
old gen listeners stop listening.
There is no way to guarantee where connections will go in the first
times of a restart I'm afraid.
At least your test client can know whether it's handled by an old or
new gen and drop the connection in the former case..
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
> Am 25.08.2021 um 16:16 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>
> On Wed, Aug 25, 2021 at 4:04 PM Yann Ylavic <ylavic.dev@gmail.com> wrote:
>>
>> On Wed, Aug 25, 2021 at 3:56 PM stefan@eissing.org <stefan@eissing.org> wrote:
>>>
>>> I could add that in the test config as header to a certain location.
>>>
>>> However, will all new connections get to new gen children once the
>>> new number has been spotted?
>>
>> Yes, that's guaranteed by the MPM which increases the gen number just
>> before creating new children (after having asked the old ones to
>> stop), no old child will ever be created with a new generation number.
>> ap_mpm_query(AP_MPMQ_GENERATION, &generation) within a child will
>> always return the one that was forked.
>
> Hmm, you asked for a different question actually, the new
> *connections* done after the gen bump are not guaranteed to be handled
> by a new gen child, because it may take some (short) time before the
> old gen listeners stop listening.
> There is no way to guarantee where connections will go in the first
> times of a restart I'm afraid.
> At least your test client can know whether it's handled by an old or
> new gen and drop the connection in the former case..

Thanks for confirming what I suspected. Concurrency is a bitch.

Hmm, hate to add timed waits here, as who knows how Travis will
work with those. Will experiment a bit.
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
Ok, brute-forced it a bit. The test suite now always does
a stop/start cycle on config changes as reload now longer
works reliably. Takes a bit more time, but not overly so
on my machine.

Cheers,
Stefan

> Am 25.08.2021 um 16:22 schrieb stefan@eissing.org:
>
>
>
>> Am 25.08.2021 um 16:16 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>>
>> On Wed, Aug 25, 2021 at 4:04 PM Yann Ylavic <ylavic.dev@gmail.com> wrote:
>>>
>>> On Wed, Aug 25, 2021 at 3:56 PM stefan@eissing.org <stefan@eissing.org> wrote:
>>>>
>>>> I could add that in the test config as header to a certain location.
>>>>
>>>> However, will all new connections get to new gen children once the
>>>> new number has been spotted?
>>>
>>> Yes, that's guaranteed by the MPM which increases the gen number just
>>> before creating new children (after having asked the old ones to
>>> stop), no old child will ever be created with a new generation number.
>>> ap_mpm_query(AP_MPMQ_GENERATION, &generation) within a child will
>>> always return the one that was forked.
>>
>> Hmm, you asked for a different question actually, the new
>> *connections* done after the gen bump are not guaranteed to be handled
>> by a new gen child, because it may take some (short) time before the
>> old gen listeners stop listening.
>> There is no way to guarantee where connections will go in the first
>> times of a restart I'm afraid.
>> At least your test client can know whether it's handled by an old or
>> new gen and drop the connection in the former case..
>
> Thanks for confirming what I suspected. Concurrency is a bitch.
>
> Hmm, hate to add timed waits here, as who knows how Travis will
> work with those. Will experiment a bit.
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
It should be fixed now with r1892599, I believe.

> Am 25.08.2021 um 13:53 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>
> On Wed, Aug 25, 2021 at 1:14 AM Travis CI <builds@travis-ci.com> wrote:
>>
>> apache / httpd
>>
>> trunk
>>
>> Build #1831 was broken
>> 21 mins and 33 secs
>> Yann Ylavic
>> 243c5fa CHANGESET ?
>>
>> mpm_{event,worker,prefork}: late stop of children processes on restart.
>
> (unrelated to this change, Build #1819 failed the same earlier).
>
> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
>
> It seems that when exiting a stream can be destroyed while in
> h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
> when the mplx lock is released (thus stream->id faults).
>
> Should the caller(s) of mst_check_data_for() pass stream->id (under
> the lock) instead?
> This would avoid the fault but that's still a potentially destroyed
> stream being fifo'ed, though we are exiting so it might not matter..
>
> A better fix maybe, Stefan?
>
> Cheers;
> Yann.
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
Thanks, looks good, it probably needs a backport to 2.4.x since the
first travis failure was there (Build #1819).
Let's see how it works after several runs before maybe.

On Wed, Aug 25, 2021 at 5:02 PM stefan@eissing.org <stefan@eissing.org> wrote:
>
> It should be fixed now with r1892599, I believe.
>
> > Am 25.08.2021 um 13:53 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
> >
> > On Wed, Aug 25, 2021 at 1:14 AM Travis CI <builds@travis-ci.com> wrote:
> >>
> >> apache / httpd
> >>
> >> trunk
> >>
> >> Build #1831 was broken
> >> 21 mins and 33 secs
> >> Yann Ylavic
> >> 243c5fa CHANGESET ?
> >>
> >> mpm_{event,worker,prefork}: late stop of children processes on restart.
> >
> > (unrelated to this change, Build #1819 failed the same earlier).
> >
> > Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
> >
> > It seems that when exiting a stream can be destroyed while in
> > h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
> > when the mplx lock is released (thus stream->id faults).
> >
> > Should the caller(s) of mst_check_data_for() pass stream->id (under
> > the lock) instead?
> > This would avoid the fault but that's still a potentially destroyed
> > stream being fifo'ed, though we are exiting so it might not matter..
> >
> > A better fix maybe, Stefan?
> >
> > Cheers;
> > Yann.
>
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
> Am 25.08.2021 um 17:06 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>
> Thanks, looks good, it probably needs a backport to 2.4.x since the
> first travis failure was there (Build #1819).
> Let's see how it works after several runs before maybe.

+1

Let's see what Joe and Travis can do to wreck it.

> On Wed, Aug 25, 2021 at 5:02 PM stefan@eissing.org <stefan@eissing.org> wrote:
>>
>> It should be fixed now with r1892599, I believe.
>>
>>> Am 25.08.2021 um 13:53 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>>>
>>> On Wed, Aug 25, 2021 at 1:14 AM Travis CI <builds@travis-ci.com> wrote:
>>>>
>>>> apache / httpd
>>>>
>>>> trunk
>>>>
>>>> Build #1831 was broken
>>>> 21 mins and 33 secs
>>>> Yann Ylavic
>>>> 243c5fa CHANGESET ?
>>>>
>>>> mpm_{event,worker,prefork}: late stop of children processes on restart.
>>>
>>> (unrelated to this change, Build #1819 failed the same earlier).
>>>
>>> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
>>>
>>> It seems that when exiting a stream can be destroyed while in
>>> h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
>>> when the mplx lock is released (thus stream->id faults).
>>>
>>> Should the caller(s) of mst_check_data_for() pass stream->id (under
>>> the lock) instead?
>>> This would avoid the fault but that's still a potentially destroyed
>>> stream being fifo'ed, though we are exiting so it might not matter..
>>>
>>> A better fix maybe, Stefan?
>>>
>>> Cheers;
>>> Yann.
>>
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
On Wed, Aug 25, 2021 at 05:08:06PM +0200, stefan@eissing.org wrote:
>
>
> > Am 25.08.2021 um 17:06 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
> >
> > Thanks, looks good, it probably needs a backport to 2.4.x since the
> > first travis failure was there (Build #1819).
> > Let's see how it works after several runs before maybe.
>
> +1
>
> Let's see what Joe and Travis can do to wreck it.

Well I could accidentally cancel your builds, how's that for starters?

(I restarted the most recent one, was trying to cancel the older PR
build, but hit the wrong button on the wrong page, sorry!)

Nice job getting this one fixed.

Regards, Joe

p.s. did you forget the APLOGNO()?
p.p.s just kidding
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
> Am 25.08.2021 um 17:34 schrieb Joe Orton <jorton@redhat.com>:
>
> On Wed, Aug 25, 2021 at 05:08:06PM +0200, stefan@eissing.org wrote:
>>
>>
>>> Am 25.08.2021 um 17:06 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>>>
>>> Thanks, looks good, it probably needs a backport to 2.4.x since the
>>> first travis failure was there (Build #1819).
>>> Let's see how it works after several runs before maybe.
>>
>> +1
>>
>> Let's see what Joe and Travis can do to wreck it.
>
> Well I could accidentally cancel your builds, how's that for starters?
>
> (I restarted the most recent one, was trying to cancel the older PR
> build, but hit the wrong button on the wrong page, sorry!)
>
> Nice job getting this one fixed.
>
> Regards, Joe
>
> p.s. did you forget the APLOGNO()?
> p.p.s just kidding

I see that we are getting into the right spirit here! ;-)
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
> Am 25.08.2021 um 17:34 schrieb Joe Orton <jorton@redhat.com>:
>
> On Wed, Aug 25, 2021 at 05:08:06PM +0200, stefan@eissing.org wrote:
>>
>>
>>> Am 25.08.2021 um 17:06 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>>>
>>> Thanks, looks good, it probably needs a backport to 2.4.x since the
>>> first travis failure was there (Build #1819).
>>> Let's see how it works after several runs before maybe.
>>
>> +1
>>
>> Let's see what Joe and Travis can do to wreck it.
>
> Well I could accidentally cancel your builds, how's that for starters?
>
> (I restarted the most recent one, was trying to cancel the older PR
> build, but hit the wrong button on the wrong page, sorry!)
>
> Nice job getting this one fixed.
>
> Regards, Joe
>
> p.s. did you forget the APLOGNO()?
> p.p.s just kidding
>

https://app.travis-ci.com/github/apache/httpd

> assert env.apache_restart() == 0
3026E assert 1 == 0
3027E + where 1 = <bound method H2TestEnv.apache_restart of <h2_env.H2TestEnv object at 0x7ffb03366a60>>()
3028E + where <bound method H2TestEnv.apache_restart of <h2_env.H2TestEnv object at 0x7ffb03366a60>> = <h2_env.H2TestEnv object at 0x7ffb03366a60>.apache_restart
3029
3030test/modules/http2/test_002_curl_basics.py:11: AssertionError
3031---------------------------- Captured stderr setup -----------------------------
3032WARNING: exit 1, stdout: , stderr: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:40001
3033(98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:40001
3034AH03272: no listening sockets available, shutting down
3035

Looks like the stop/start cycle instead of reload is not timing safe. If httpd is no longer responding, it does not mean that the ports are available again for the new process. Reloads are just better here, as wait loops can be avoided. Hmm....thinking...
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
On Thu, Aug 26, 2021 at 09:51:07AM +0200, stefan@eissing.org wrote:
> https://app.travis-ci.com/github/apache/httpd
>
> > assert env.apache_restart() == 0
> 3026E assert 1 == 0
> 3027E + where 1 = <bound method H2TestEnv.apache_restart of <h2_env.H2TestEnv object at 0x7ffb03366a60>>()
> 3028E + where <bound method H2TestEnv.apache_restart of <h2_env.H2TestEnv object at 0x7ffb03366a60>> = <h2_env.H2TestEnv object at 0x7ffb03366a60>.apache_restart
> 3029
> 3030test/modules/http2/test_002_curl_basics.py:11: AssertionError
> 3031---------------------------- Captured stderr setup -----------------------------
> 3032WARNING: exit 1, stdout: , stderr: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:40001
> 3033(98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:40001
> 3034AH03272: no listening sockets available, shutting down
> 3035
>
> Looks like the stop/start cycle instead of reload is not timing safe.
> If httpd is no longer responding, it does not mean that the ports are
> available again for the new process. Reloads are just better here, as
> wait loops can be avoided. Hmm....thinking...

Yeah :( Next build I've pushed has r1892598 reverted and a sleep(2)
after running apachectl, that was sufficient for me when I was testing
locally before.

Regards, Joe
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
> Am 26.08.2021 um 10:16 schrieb Joe Orton <jorton@redhat.com>:
>
> On Thu, Aug 26, 2021 at 09:51:07AM +0200, stefan@eissing.org wrote:
>> https://app.travis-ci.com/github/apache/httpd
>>
>>> assert env.apache_restart() == 0
>> 3026E assert 1 == 0
>> 3027E + where 1 = <bound method H2TestEnv.apache_restart of <h2_env.H2TestEnv object at 0x7ffb03366a60>>()
>> 3028E + where <bound method H2TestEnv.apache_restart of <h2_env.H2TestEnv object at 0x7ffb03366a60>> = <h2_env.H2TestEnv object at 0x7ffb03366a60>.apache_restart
>> 3029
>> 3030test/modules/http2/test_002_curl_basics.py:11: AssertionError
>> 3031---------------------------- Captured stderr setup -----------------------------
>> 3032WARNING: exit 1, stdout: , stderr: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:40001
>> 3033(98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:40001
>> 3034AH03272: no listening sockets available, shutting down
>> 3035
>>
>> Looks like the stop/start cycle instead of reload is not timing safe.
>> If httpd is no longer responding, it does not mean that the ports are
>> available again for the new process. Reloads are just better here, as
>> wait loops can be avoided. Hmm....thinking...
>
> Yeah :( Next build I've pushed has r1892598 reverted and a sleep(2)
> after running apachectl, that was sufficient for me when I was testing
> locally before.

Can we add a configuration directive for this server behaviour? Like

AcceptOnGraceful on|off

So that people who do not want the server to accept new connections
during graceful reloads can influence this?

- Stefan
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
On 8/26/21 10:21 AM, stefan@eissing.org wrote:
>
>
>> Am 26.08.2021 um 10:16 schrieb Joe Orton <jorton@redhat.com>:
>>
>> On Thu, Aug 26, 2021 at 09:51:07AM +0200, stefan@eissing.org wrote:
>>> https://app.travis-ci.com/github/apache/httpd
>>>
>>>> assert env.apache_restart() == 0
>>> 3026E assert 1 == 0
>>> 3027E + where 1 = <bound method H2TestEnv.apache_restart of <h2_env.H2TestEnv object at 0x7ffb03366a60>>()
>>> 3028E + where <bound method H2TestEnv.apache_restart of <h2_env.H2TestEnv object at 0x7ffb03366a60>> = <h2_env.H2TestEnv object at 0x7ffb03366a60>.apache_restart
>>> 3029
>>> 3030test/modules/http2/test_002_curl_basics.py:11: AssertionError
>>> 3031---------------------------- Captured stderr setup -----------------------------
>>> 3032WARNING: exit 1, stdout: , stderr: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:40001
>>> 3033(98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:40001
>>> 3034AH03272: no listening sockets available, shutting down
>>> 3035
>>>
>>> Looks like the stop/start cycle instead of reload is not timing safe.
>>> If httpd is no longer responding, it does not mean that the ports are
>>> available again for the new process. Reloads are just better here, as
>>> wait loops can be avoided. Hmm....thinking...
>>
>> Yeah :( Next build I've pushed has r1892598 reverted and a sleep(2)
>> after running apachectl, that was sufficient for me when I was testing
>> locally before.
>
> Can we add a configuration directive for this server behaviour? Like
>
> AcceptOnGraceful on|off
>
> So that people who do not want the server to accept new connections
> during graceful reloads can influence this?

What about a graceful stop first and a start afterwards?

Regards

Rüdiger
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
> Am 26.08.2021 um 10:27 schrieb Ruediger Pluem <rpluem@apache.org>:
>
>
>
> On 8/26/21 10:21 AM, stefan@eissing.org wrote:
>>
>>
>>> Am 26.08.2021 um 10:16 schrieb Joe Orton <jorton@redhat.com>:
>>>
>>> On Thu, Aug 26, 2021 at 09:51:07AM +0200, stefan@eissing.org wrote:
>>>> https://app.travis-ci.com/github/apache/httpd
>>>>
>>>>> assert env.apache_restart() == 0
>>>> 3026E assert 1 == 0
>>>> 3027E + where 1 = <bound method H2TestEnv.apache_restart of <h2_env.H2TestEnv object at 0x7ffb03366a60>>()
>>>> 3028E + where <bound method H2TestEnv.apache_restart of <h2_env.H2TestEnv object at 0x7ffb03366a60>> = <h2_env.H2TestEnv object at 0x7ffb03366a60>.apache_restart
>>>> 3029
>>>> 3030test/modules/http2/test_002_curl_basics.py:11: AssertionError
>>>> 3031---------------------------- Captured stderr setup -----------------------------
>>>> 3032WARNING: exit 1, stdout: , stderr: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:40001
>>>> 3033(98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:40001
>>>> 3034AH03272: no listening sockets available, shutting down
>>>> 3035
>>>>
>>>> Looks like the stop/start cycle instead of reload is not timing safe.
>>>> If httpd is no longer responding, it does not mean that the ports are
>>>> available again for the new process. Reloads are just better here, as
>>>> wait loops can be avoided. Hmm....thinking...
>>>
>>> Yeah :( Next build I've pushed has r1892598 reverted and a sleep(2)
>>> after running apachectl, that was sufficient for me when I was testing
>>> locally before.
>>
>> Can we add a configuration directive for this server behaviour? Like
>>
>> AcceptOnGraceful on|off
>>
>> So that people who do not want the server to accept new connections
>> during graceful reloads can influence this?
>
> What about a graceful stop first and a start afterwards?

Just tested: "graceful-stop" seems to make it worse. Connections are
being served using the old configuration.

>
> Regards
>
> Rüdiger
>
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
Joe: with r1892615 I fixed a bug in waiting for the server to become
unresponsive after a stop. This *should* make timed waits unnecessary
in the http2 test suite.

- Stefan

> Am 26.08.2021 um 10:43 schrieb stefan@eissing.org:
>
>
>
>> Am 26.08.2021 um 10:27 schrieb Ruediger Pluem <rpluem@apache.org>:
>>
>>
>>
>> On 8/26/21 10:21 AM, stefan@eissing.org wrote:
>>>
>>>
>>>> Am 26.08.2021 um 10:16 schrieb Joe Orton <jorton@redhat.com>:
>>>>
>>>> On Thu, Aug 26, 2021 at 09:51:07AM +0200, stefan@eissing.org wrote:
>>>>> https://app.travis-ci.com/github/apache/httpd
>>>>>
>>>>>> assert env.apache_restart() == 0
>>>>> 3026E assert 1 == 0
>>>>> 3027E + where 1 = <bound method H2TestEnv.apache_restart of <h2_env.H2TestEnv object at 0x7ffb03366a60>>()
>>>>> 3028E + where <bound method H2TestEnv.apache_restart of <h2_env.H2TestEnv object at 0x7ffb03366a60>> = <h2_env.H2TestEnv object at 0x7ffb03366a60>.apache_restart
>>>>> 3029
>>>>> 3030test/modules/http2/test_002_curl_basics.py:11: AssertionError
>>>>> 3031---------------------------- Captured stderr setup -----------------------------
>>>>> 3032WARNING: exit 1, stdout: , stderr: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:40001
>>>>> 3033(98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:40001
>>>>> 3034AH03272: no listening sockets available, shutting down
>>>>> 3035
>>>>>
>>>>> Looks like the stop/start cycle instead of reload is not timing safe.
>>>>> If httpd is no longer responding, it does not mean that the ports are
>>>>> available again for the new process. Reloads are just better here, as
>>>>> wait loops can be avoided. Hmm....thinking...
>>>>
>>>> Yeah :( Next build I've pushed has r1892598 reverted and a sleep(2)
>>>> after running apachectl, that was sufficient for me when I was testing
>>>> locally before.
>>>
>>> Can we add a configuration directive for this server behaviour? Like
>>>
>>> AcceptOnGraceful on|off
>>>
>>> So that people who do not want the server to accept new connections
>>> during graceful reloads can influence this?
>>
>> What about a graceful stop first and a start afterwards?
>
> Just tested: "graceful-stop" seems to make it worse. Connections are
> being served using the old configuration.
>
>>
>> Regards
>>
>> Rüdiger
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
On Thu, Aug 26, 2021 at 01:11:16PM +0200, stefan@eissing.org wrote:
> Joe: with r1892615 I fixed a bug in waiting for the server to become
> unresponsive after a stop. This *should* make timed waits unnecessary
> in the http2 test suite.

Great stuff, thanks for all your work Stefan.

It worked locally and once in Travis, merged to trunk in r1892620.

Regards, Joe
Re: Broken: apache/httpd#1831 (trunk - 243c5fa) [ In reply to ]
> Am 26.08.2021 um 16:44 schrieb Joe Orton <jorton@redhat.com>:
>
> On Thu, Aug 26, 2021 at 01:11:16PM +0200, stefan@eissing.org wrote:
>> Joe: with r1892615 I fixed a bug in waiting for the server to become
>> unresponsive after a stop. This *should* make timed waits unnecessary
>> in the http2 test suite.
>
> Great stuff, thanks for all your work Stefan.
>
> It worked locally and once in Travis, merged to trunk in r1892620.

\o/ Now, when I break things, everyone will see it...wait!