
close connection for request, but continue
Dear mod_perl list,

I'd be grateful for any hints/insight :)

I'm trying to achieve the following: when there is an incoming request, I
want to set a time limit in which an answer should be delivered to the
client, no matter what.

However, since the work triggered by the initial request (there is another
request to another site involved) might take much longer than that time
limit, I want that work to finish properly, despite the fact that the
initial request has already been 'served'.

I first thought that using alarm and closing the connection would just
work, so my initial code was something like


---

#!/usr/bin/perl

use strict;
use warnings;

use Apache2::RequestUtil ();
use Apache2::RequestIO ();    # for $r->print
use Apache2::Connection ();
use APR::Socket ();           # for $socket->close
use Apache2::Const -compile => qw(:common :http);

local our $r = shift;

local our $t_start = time();
local $SIG{ALRM} = sub { _force_early_response( $t_start ) };

local our $it_took_too_long = 0;

our $alarm_time = 5; # seconds we allow for processing a response

alarm $alarm_time;

# start working for the request
#
# work done

alarm 0;
return Apache2::Const::OK;

sub _force_early_response {
    my ($t1) = @_;

    $it_took_too_long = 1;

    my $t2 = time();

    $r->assbackwards(1);
    my $response_text = _sorry_but_that_will_be_ready_only_later();

    $r->print( "HTTP/1.1 200 OK\n"
              ."Date: Wed, 20 Apr 2016 10:55:08 GMT\n"
              ."Server: Apache/2.2.31 (Amazon)\n"
              ."Content-Type: text/plain; charset=UTF-8\n" );

    $r->print( "\n$response_text" );

    my $c = $r->connection();
    my $socket = $c->client_socket;

    $socket->close();
    return;
}
---

That didn't work - the connection just was not closed.

A second attempt succeeded, in which I changed the alarm handler to

---
sub _force_early_response {
    my ($t1) = @_;

    $it_took_too_long = 1;

    my $t2 = time();

    $r->assbackwards(1);

    my $response_text = _sorry_but_that_will_be_ready_only_later();
    my $content_length = length( $response_text );

    # difference from the first attempt: the Connection: close and
    # Content-Length headers are added
    $r->print( "HTTP/1.1 200 OK\n"
              ."Date: Wed, 20 Apr 2016 10:55:08 GMT\n"
              ."Server: Apache/2.2.31 (Amazon)\n"
              ."Connection: close\n"
              ."Content-Type: text/plain; charset=UTF-8\n"
              ."Content-Length: $content_length\n" );

    $r->print( "\n$response_text" );
    return;
}
---

So while I now have something that seems to work OK and does exactly what
I wanted, I'm a bit unhappy with the code.

I did look around, but I wasn't able to find any mod_perl/library function
that would make the 'close connection and go on' code easier.

Especially for setting the status line, which I currently do with

$r->print( "HTTP/1.1 200 OK\n" )

I thought there would be some ready-made tools available for that.

I had actually hoped to get away without having to explicitly write out
'hand-crafted' headers, but so far I haven't been able to find anything
for that.

Maybe I just didn't look into the right places?

I'm running this on an updated Amazon Linux machine, with vanilla httpd
and mod_perl:

$ rpm -qi httpd
Name : httpd
Version : 2.2.31
Release : 1.7.amzn1

$ rpm -qi mod_perl
Name : mod_perl
Version : 2.0.7
Release : 7.27.amzn1

Many thanks in advance for any ideas, hints, comments.

Iosif Fettich
Re: close connection for request, but continue
On Thu, Apr 21, 2016 at 5:20 AM, Iosif Fettich <ifettich@netsoft.ro> wrote:

>
> I'm trying to achieve the following: when there is an incoming request, I
> want to set a time limit in which an answer should be delivered to the
> client, no matter what.
>
> However, since the work triggered by the initial request (there is another
> request to another site involved) might take much longer than that time
> limit, I want that work to finish properly, despite the fact that the
> initial request has already been 'served'.


TMTOWTDI, but the common way to do this is to add the long-running job to a
job queue, and then redirect the user to a page that periodically checks if
the job is done by using JavaScript requests.

If you don't have a job queue and don't want to add one just for this, you
could use a cleanup handler to run the slow stuff after disconnecting:
http://perl.apache.org/docs/2.0/user/handlers/http.html#PerlCleanupHandler
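
For example, something like this (untested sketch in the style of your
Registry script; do_slow_backend_work() is a made-up helper, but
push_handlers with PerlCleanupHandler is the documented mod_perl 2 API):

---
#!/usr/bin/perl

use strict;
use warnings;

use Apache2::RequestRec ();
use Apache2::RequestUtil ();
use Apache2::RequestIO ();
use Apache2::Const -compile => qw(OK);

my $r = shift;

# Send the quick answer to the client right away ...
$r->content_type('text/plain; charset=UTF-8');
$r->print("Accepted; the full result will be ready later.\n");

# ... and schedule the slow work for after the response phase,
# when the client is no longer waiting on this request.
$r->push_handlers( PerlCleanupHandler => \&do_slow_backend_work );

return Apache2::Const::OK;

sub do_slow_backend_work {
    my $r = shift;
    # the long-running backend request would go here
    return Apache2::Const::OK;
}
---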

That will tie up a mod_perl process though, so it's not a good way to go
for large sites.

- Perrin
Re: close connection for request, but continue
Hi Perrin,

>> I'm trying to achieve the following: when there is an incoming request, I
>> want to set a time limit in which an answer should be delivered to the
>> client, no matter what.
>>
>> However, since the work triggered by the initial request (there is another
>> request to another site involved) might take much longer than that time
>> limit, I want that work to finish properly, despite the fact that the
>> initial request has already been 'served'.
>
>
> TMTOWTDI, but the common way to do this is to add the long-running job to a
> job queue, and then redirect the user to a page that periodically checks if
> the job is done by using JavaScript requests.

It's not such a typical long-running job that I'm doing. It rather goes
like this: most of the time I can answer with what I have within the
acceptable answer time, but sometimes I have to make another request in
the background. That request, too, is usually served within acceptable
time; _sometimes_ it isn't, so only occasionally does it take longer.

The catch: let's say the backend service is pay-per-use, so I definitely
don't want to throw away a request that has been started. If I have
launched a request in the back, I want to get the results, even if the
initial requester was turned down in the meantime.

> If you don't have a job queue and don't want to add one just for this, you
> could use a cleanup handler to run the slow stuff after disconnecting:
> http://perl.apache.org/docs/2.0/user/handlers/http.html#PerlCleanupHandler

I'm afraid that won't fit, actually. It's not a typical cleanup I'm after
- I don't want to abandon the request I've already started just because
the original incoming request has been closed. The cleanup handler could
relaunch the slow backend request - but doing so I'd pay for it twice.

> That will tie up a mod_perl process though, so it's not a good way to go
> for large sites.

I'm aware of that, but that's less of a concern for now.

Many thanks,

Iosif Fettich
Re: close connection for request, but continue
On Thu, Apr 21, 2016 at 9:48 AM, Iosif Fettich <ifettich@netsoft.ro> wrote:

> I'm afraid that won't fit, actually. It's not a typical cleanup I'm after
> - I don't want to abandon the request I've already started just because
> the original incoming request has been closed. The cleanup handler could
> relaunch the slow backend request - but doing so I'd pay for it twice.


You don't have to. You can just return immediately, and do all the work in
the cleanup (or a job queue) while you let the client poll for status. It's
a little extra work for simple requests, but it means all requests are
handled the same and you never make extra requests to your expensive
backend.

If you're determined not to do polling from the client, your best bet is
probably to fork immediately and do the work in the fork, while you poll to
check if it's done in your original process. You'd have to write the
response to a database or somewhere else the original process can pick it
up from. But forking from mod_perl is a pain and easy to mess up, so I
recommend doing one of the other approaches.
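
A rough sketch of that fork-and-poll pattern (untested; all helper names
are invented, and see the mod_perl docs on forking for the details that
make this safe in practice):

---
use POSIX qw(setsid);

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: detach from Apache and do the expensive work.
    setsid();
    my $result = run_expensive_request();
    save_result( $result );    # somewhere the parent can read it
    POSIX::_exit(0);           # skip mod_perl cleanups in the child
}

# Parent: poll (with a deadline) until the child has stored the result.
my $deadline = time() + 5;
while ( time() < $deadline ) {
    last if result_ready();
    sleep 1;
}

my $answer = result_ready() ? fetch_result() : "still working on it...";
---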

- Perrin
Re: close connection for request, but continue
A job queue is also better because it prevents uncontrolled forking or
excessive numbers of "dead" web connections hanging around. It will simply
queue requests until resources are available. You may find that handling
several of these jobs in parallel eats up all your processor/memory
resources, whereas with queuing you can limit the number of processes
running in parallel. (And if your site gets bigger, you may be able to hand
off some of this work to a cluster of machines that handle the long-running
processing....)
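
As a tiny illustration, a DB-backed queue can be as simple as this
(untested sketch; the table layout and helper names are invented, and a
real multi-worker setup would need an atomic claim so two workers can't
grab the same job):

---
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:SQLite:dbname=jobs.db', '', '',
                        { RaiseError => 1, AutoCommit => 1 } );

# Web side: enqueue and return to the client immediately.
sub enqueue_job {
    my ($payload) = @_;
    $dbh->do( q{INSERT INTO jobs (payload, status) VALUES (?, 'queued')},
              undef, $payload );
    return $dbh->last_insert_id( undef, undef, 'jobs', 'id' );
}

# Worker side: running a fixed number of workers caps the parallelism.
sub claim_next_job {
    my $row = $dbh->selectrow_hashref(
        q{SELECT id, payload FROM jobs
          WHERE status = 'queued' ORDER BY id LIMIT 1} );
    return unless $row;
    $dbh->do( q{UPDATE jobs SET status = 'running' WHERE id = ?},
              undef, $row->{id} );
    return $row;
}
---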


On 4/21/2016 3:25 PM, Perrin Harkins wrote:
> On Thu, Apr 21, 2016 at 9:48 AM, Iosif Fettich <ifettich@netsoft.ro> wrote:
>
> I'm afraid that won't fit, actually. It's not a typical cleanup
> I'm after - I don't want to abandon the request I've already
> started just because the original incoming request has been
> closed. The cleanup handler could relaunch the slow backend
> request - but doing so I'd pay for it twice.
>
>
> You don't have to. You can just return immediately, and do all the
> work in the cleanup (or a job queue) while you let the client poll for
> status. It's a little extra work for simple requests, but it means all
> requests are handled the same and you never make extra requests to
> your expensive backend.
>
> If you're determined not to do polling from the client, your best bet
> is probably to fork immediately and do the work in the fork, while you
> poll to check if it's done in your original process. You'd have to
> write the response to a database or somewhere else the original
> process can pick it up from. But forking from mod_perl is a pain and
> easy to mess up, so I recommend doing one of the other approaches.
>
> - Perrin




--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
Re: close connection for request, but continue
On 21.04.2016 11:20, Iosif Fettich wrote:
> Dear mod_perl list,
>
> I'd be grateful for any hints/insight :)
>
> I'm trying to achieve the following: when there is an incoming request, I want to set a
> time limit in which an answer should be delivered to the client, no matter what.
>
> However, since the work triggered by the initial request (there is another request to
> another site involved) might take much longer than that time limit, I want that work to
> finish properly, despite the fact that the initial request has already been 'served'.
>

[...]

In agreement with Perrin, and to expand a bit:

To go back some 20 years: the original design of HTTP and webservers was not really
made for client requests that take a long time to process.
Browsers, when they make a request to a server, will wait for a response for a maximum of
about 5 minutes, and if by then they have not received one, they will close the
connection and display an error like "this server appears to be busy, and does not respond".
And since the connection is now closed, whenever the server would in the end try to send
back a response, it would find no connection to send it on, and it would abort the request
processing at that point and write some error message to the error log.

But you seem to already know all that, which is probably why you are sending a response to
the browser no matter what, before this timeout occurs.

However, the way in which you are currently doing this is kind of a "perversion" of the
protocol, because
- you are sending a response to the browser saying that everything is ok (so for the
browser this request is terminated and it can go on with the next one (and/or close the
connection))
- but on the other hand, the request-processing process under Apache is still running, for
this request and this client.
And if that request-processing process now, for whatever reason, had something to send to
the client (for example, some error), it would find the connection gone and be unable to
do so.

(And the fact that what you are doing is not a natural thing to do is also the reason why
you are not finding any standard module, interface or API for it.)

The "canonical" way to do this, would be something like
- the client sends the request to the server
- the server allocates a process (or thread or whatever) to process this request
- this request-processing process "delegates" this browser request to some other,
independent-of-the-webserver process, which can take as long as necessary to fulfill the
(background part of) the request
- the request-processing process does not wait for the response or the exit of that
independent process, but returns a response right away to the client browser (such as
"Thank you for your request. It is being handled by our back-office. You will receive an
email when it's done.")
- and then, as far as the webserver is concerned, this client request is finished
(cleanly), and the request-processing process can be re-allocated to some other incoming
request

Optionally, you could provide a way for the client to periodically enquire as to the
advancement status of his request.
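
That enquiry step can be a small status handler that the client
re-requests until the job is done. An untested sketch, again in Registry
style (lookup_job_status() is a made-up helper backed by whatever the job
store is):

---
#!/usr/bin/perl

use strict;
use warnings;

use Apache2::RequestRec ();
use Apache2::RequestIO ();
use Apache2::Const -compile => qw(OK);

my $r = shift;

# e.g. the client polls /status?id=42 until the job is done
my ($id) = ( $r->args // '' ) =~ /id=(\w+)/;
$id //= '?';

my $status = lookup_job_status($id);   # e.g. queued / running / done

$r->content_type('text/plain; charset=UTF-8');
$r->print("Job $id is: $status\n");

return Apache2::Const::OK;
---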

The tricky bit is to have the Apache request-processing process in which you originally are

- either itself start a totally independent secondary process that will go off and fulfill
the long-running part of the request. This is tricky to do right, and it is easy to
overwhelm your server.

- or (probably simpler) just pass this request to an already-running, independent server
process which will do this long-running part.
This is what Perrin refers to as a "job queue" system.
You can develop such a "job queue" system yourself, or you can use a ready-made one.
There are such things within the Apache projects, or, if you want Perl, you may find some
on CPAN (see POE, for example).

I would guess that this is all a bit more complicated than what you envisioned initially,
but that's the case with many such things.
Re: close connection for request, but continue
Iosif:
You will need:

[] a background state storage location (a database table with a unique row
ID, or a directory with a unique ID which points to the state file).
[] Your user-facing request page accepts the request, schedules the work,
and responds with a page which auto-refreshes against the GUID and reports
the status of the background request.
[] Your user-facing status page auto-refreshes to itself while the job is
in motion (sketched below).
[] Your user-facing status page auto-refreshes to a "you're done; your
results are here / have been mailed to you".
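
For the auto-refreshing pages, a plain meta refresh is enough. An untested
sketch (all names are illustrative):

---
# Build the status page HTML; $guid, $done and $results_url come from
# the state storage described above.
sub status_page_html {
    my ( $guid, $done, $results_url ) = @_;

    if ( !$done ) {
        return <<"HTML";
<html>
  <head><meta http-equiv="refresh" content="15"></head>
  <body>Job $guid is still running; this page refreshes in 15 seconds.</body>
</html>
HTML
    }

    return qq{<html><body>You're done; your results are }
         . qq{<a href="$results_url">here</a>.</body></html>};
}
---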

Observations:
[] My corporate business users can follow along with a 15 second
auto-refresh (as long as the page clearly indicates an auto-refresh in 15
seconds). Count-down timers are probably better.
[] My technical users close the pop-up tab after the first request (not
caring for the intermediate status pages and knowing that the result has
been accomplished or mailed to them).

Asides:
[] Some of our backend jobs take a long time (lots of data to grind
through); these tend towards email status.
[] The database-based queue view (assuming you're internally facing only)
allows your support teams to observe queued jobs (things which will
happen in the future), active jobs (things running right now on some
machine), completed jobs (jobs which succeeded), and failed jobs (jobs
which did not succeed).

Hopefully these implementation specifics and operational observations
assist you as you take André's excellent summary and put it all to work.
--
Frotz
EMAN
Cisco Systems, Inc.

On 2016/04/21, 07:36, "André Warnier (tomcat)" <aw@ice-sa.com> wrote:

>On 21.04.2016 11:20, Iosif Fettich wrote:
>> Dear mod_perl list,
>>
>> I'd be grateful for any hints/insight :)
>>
>> I'm trying to achieve the following: when there is an incoming request,
>> I want to set a time limit in which an answer should be delivered to
>> the client, no matter what.
>>
>> However, since the work triggered by the initial request (there is
>> another request to another site involved) might take much longer than
>> that time limit, I want that work to finish properly, despite the fact
>> that the initial request has already been 'served'.
>>
>
>The "canonical" way to do this, would be something like
>- the client sends the request to the server
>- the server allocates a process (or thread or whatever) to process this
>request
>- this request-processing process "delegates" this browser request to
>some other,
>independent-of-the-webserver process, which can take as long as necessary
>to fulfill the
>(background part of) the request
>- the request-processing process does not wait for the response or the
>exit of that
>independent process, but returns a response right away to the client
>browser (such as
>"Thank you for your request. It is being handled by our back-office. You
>will receive an
>email when it's done.".)
>- and then, as far as the webserver is concerned, this client request is
>finished
>(cleanly), and the request-processing process can be re-allocated to some
>other incoming
>request
>
>Optionally, you could provide a way for the client to periodically
>enquire as to the
>advancement status of his request.