Mailing List Archive

multiple concurrent CGI [newbie]
I have never used mod_perl, but I think it can do what I want. I need a
CGI program that (1) always stays in memory, (2) can process several CGI
clients at the same time, and (3) can check the Apache server for new
CGI requests without blocking.

I read most of Stas Bekman's "mod_perl Guide" and I have Ford's "refcard"
but I can't see how to do it. Would someone please point me in the right
direction?

The design is...

Apache
|
V
perl program "PT"
|
V
Java program

The perl program, PT, and the Java program must never exit. The PT
sends requests to Java and gets results from Java on a socket. The PT uses
an ID number to match each result to the corresponding request. The Java
program uses threads to handle many requests concurrently, but each request
might take several seconds.

Obviously, the PT program must *not* process only one request at a time. I
want PT to be able to have many "open" CGI requests which can be stored in
a perl array (or hash table). The PT main loop should poll the socket from
Java (for results) and poll the Apache (for new requests).
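
Here is a rough sketch of the Java-facing half of that main loop, just
to show what I mean (the port number, the tab-separated wire format,
and the callback-in-a-hash scheme are all invented for illustration):

#!/usr/bin/perl
# Sketch of the PT main loop, Java side only: results coming back from
# the Java program are matched to pending requests by ID number.
use strict;
use IO::Socket::INET;
use IO::Select;

my $java = IO::Socket::INET->new(PeerAddr => 'localhost:9000')
    or die "cannot reach Java backend: $!";
$java->autoflush(1);
my $sel = IO::Select->new($java);

my %pending;                  # id => callback holding the request state
my $next_id = 0;

sub submit {
    my ($text, $callback) = @_;
    my $id = ++$next_id;
    $pending{$id} = $callback;
    print $java "$id\t$text\n";       # one "id<TAB>request" per line
}

while (1) {
    # Poll the Java socket with a timeout instead of blocking forever.
    for my $fh ($sel->can_read(0.25)) {
        defined(my $line = <$fh>) or die "Java backend closed";
        chomp $line;
        my ($id, $result) = split /\t/, $line, 2;
        my $cb = delete $pending{$id} or next;    # unknown ID: skip it
        $cb->($result);
    }
    # ...and here I would want to poll Apache for new requests, which
    # is the part I cannot find an API for.
}

It is the Apache-facing half of the loop that I cannot figure out.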

Does the

$r = Apache->request();

block until a request arrives? Is there any way to set a timeout so that
it will not block?

Can I call

$r1 = Apache->request();
$r2 = Apache->request();
$r3 = Apache->request();

and then handle them in parallel, e.g.

$r1->print("Hello world");
$r2->print("Hello how are you");
$r3->print("Hello this is a test");

Once again, thanks for the help. If this is covered in a document
someplace, just give me a pointer.

John Henckel henckel@iname.com
Zumbro Falls, Minnesota, USA (507) 753-2216

http://geocities.com/jdhenckel/
Re: multiple concurrent CGI [newbie]
Hi there,

On Wed, 23 Feb 2000, John Henckel wrote:

> Each CGI request is long running, so it would be foolish to have one
> child process per request. So I am wondering if one child can do
> multiple requests simultaneously. Or does the mod_perl architecture
> require each perl child process to complete one request before it
> can start another one?

Not the mod_perl architecture, the Apache one. But basically yes,
absolutely. Have a look at pages 56-62 of the Eagle Book.

The mod_perl stuff is just a way of messing about with the way Apache
services the request, in a language that some people have grown to
love, but it doesn't change the overall approach to doing it. You can
do it all with C (I like that better), or if you were perverse enough
you could do it in Fortran I suppose. But all the same, it would be
one request, one child. There's talk of support for threads in Apache
2.0 but that's on the bleeding edge and I don't know if it will change
this or not. Somebody on the List will know (RFC!). Then maybe you'd
have to worry about your operating system too.

> The perl code does some fancy pattern matching stuff before
> forwarding the request to Java. I am trying to use perl for what it
> does best and Java for what it does best.

Hmmm. There must be another way to do it. How about sending the
request to another server, so the child can forget about it? You'd
run a second copy of Apache on the same machine, one that isn't
mod_perl enabled. It does the Java stuff and replies to the client
directly, without going back through the mod_perl server.
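
Something like this hypothetical mod_perl 1.x handler would do the
hand-off on the front-end (the back-end hostname and the munge()
routine are made up; your fancy pattern matching would go there):

# Front-end handler: do the cheap Perl work, then bounce the client
# to the heavy back-end server so this child is freed immediately.
package Apache::HandOff;
use strict;
use Apache::Constants qw(REDIRECT);

sub handler {
    my $r = shift;
    my $munged = munge($r->uri);    # your pattern matching here
    $r->header_out(Location => "http://backend.example.com:8081$munged");
    return REDIRECT;
}

sub munge { my $uri = shift; return $uri }   # placeholder

1;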

Alternatively, don't use mod_perl at all. Just use a plain Apache
server and CGI. Or use a regex library and sweat it in a C handler.

Can mod_rewrite do anything for you? I don't know what you want to
do, but if it's just a matter of rewriting URIs then it's worth a
look, it can do anything like that. If you're brave you could hack it
about to get it to do something else, or use it as a model for your
own stuff.
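
For instance, a couple of lines like these in httpd.conf (paths and
port made up, and you need mod_proxy compiled in for the [P] flag)
would shove matching URIs straight at a back-end server:

RewriteEngine On
RewriteRule   ^/pt/(.+)$  http://localhost:8081/java/$1  [P,L]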

73,
Ged.
Re: multiple concurrent CGI [newbie]
Thanks for your help. I am slowly coming to terms with the fact that Perl
is not multithreaded and so it will never be able to scale the way I need
it to. It is only a couple thousand lines of code. I will have to rewrite
it as a servlet in Java or else a C program using the FastCGI protocol.

Many people think FCGI can't handle multiple concurrent requests;
however, that is only a perl limitation. The C-FCGI interface, with
multithreading, can process hundreds of CGI requests simultaneously in
ONE process with ONE socket to the Apache server.

Once again, thanks for educating me about mod_perl!!

At 09:24 AM 2/24/00 +0000, you wrote:
>Hi there,
>
>On Wed, 23 Feb 2000, John Henckel wrote:
>
> > Each CGI request is long running, so it would be foolish to have one
> > child process per request. So I am wondering if one child can do
> > multiple requests simultaneously. Or does the mod_perl architecture
> > require each perl child process to complete one request before it
> > can start another one?
>
>Not the mod_perl architecture, the Apache one. But basically yes,
>absolutely. Have a look at pages 56-62 of the Eagle Book.
>
>... [trimmed]



John Henckel henckel@iname.com
Zumbro Falls, Minnesota, USA (507) 753-2216

http://geocities.com/jdhenckel/
Re: multiple concurrent CGI [newbie]
On Thu, 24 Feb 2000, John Henckel wrote:

> Thanks for your help. I am slowly coming to terms with the fact that Perl
> is not multithreaded and so it will never be able to scale the way I need
> it to. It is only a couple thousand lines of code. I will have to rewrite

This may quickly get off-topic but let me ask why !multithreaded == no
performance?

Maybe I'm confused but I don't see the connection between the two.


-dave


/*==================
www.urth.org
We await the New Sun
==================*/
Re: multiple concurrent CGI [newbie]
Hi there,

On Thu, 24 Feb 2000, John Henckel wrote:

> Thanks for your help. I am slowly coming to terms with the fact
> that Perl is not multithreaded and so it will never be able to scale
> the way I need it to. It is only a couple thousand lines of code.
> I will have to rewrite it as a servlet in Java or else a C program
> using the FastCGI protocol.

Have a word with the Perl developers if you're going to do something
like that, I'm sure they'd be keen to hear about it. There is an effort
to support threads in Perl, but it's early days.

When you've finished, make sure to announce it on the mod_perl list?

73,
Ged.
Re: multiple concurrent CGI [newbie]
Hi there,

On Thu, 24 Feb 2000, Autarch wrote:

> This may quickly get off-topic but let me ask why !multithreaded ==
> no performance?

No, it's still kinda on-topic. It's the old "mod_perl children
swallowing all your resources" syndrome.

Mr. Henckel was asking if he could have mod_perl children serving
requests which would each be delayed three or four seconds by some
processing on another server. This would have been as bad as asking
the Apache/mod_perl server to serve all your big images. The machine
hosting the heavy server would have collapsed under a large load.

If a single multi-threaded Apache child process could have handled it
then there would have been no problem, but Apache don't work that way
on Unix. Yet.

It seems to me that there are several ways around it, and I think John
wants to try the most challenging one, because it is there...

73,
Ged.
Re: multiple concurrent CGI [newbie]
On Thu, 24 Feb 2000, G.W. Haywood wrote:

> Mr. Henckel was asking if he could have mod_perl children serving
> requests which would each be delayed three or four seconds by some
> processing on another server. This would have been as bad as asking
> the Apache/mod_perl server to serve all your big images. The machine
> hosting the heavy server would have collapsed under a large load.

Why would it collapse? No matter what, it's going to have to wait 3-4
seconds to respond to the client. Threading won't change that.

> If a single multi-threaded Apache child process could have handled it
> then there would have been no problem, but Apache don't work that way

This would be different how?

And particularly on something like Linux, where threads and processes
aren't all that much different, how would it be better to have many
threads vs. many processes?


-dave


/*==================
www.urth.org
We await the New Sun
==================*/
Re: multiple concurrent CGI [newbie]
On Thu, 24 Feb 2000, Autarch wrote:

> On Thu, 24 Feb 2000, G.W. Haywood wrote:
>
> > Mr. Henckel was asking if he could have mod_perl children serving
> > requests which would each be delayed three or four seconds by some
> > processing on another server. This would have been as bad as asking
> > the Apache/mod_perl server to serve all your big images. The machine
> > hosting the heavy server would have collapsed under a large load.
>
> Why would it collapse? No matter what, it's going to have to wait 3-4
> seconds to respond to the client. Threading won't change that.

It will change the amount of memory used, though. With threads you only
have to store the execution context (think that's what it's called),
whereas with processes you copy the whole process. Shared libraries and
copy-on-write sorts of features reduce this problem a little, but they
only go so far (and with copy-on-write it seems like the more time
passes, the less is shared between processes), so you find yourself
restarting processes more often. Threads are at least theoretically
more desirable.

> > If a single multi-threaded Apache child process could have handled it
> > then there would have been no problem, but Apache don't work that way
>
> This would be different how?

See above.

> And particularly on something like Linux, where threads and processes
> aren't all that much different, how would it be better to have many
> threads vs. many processes?
>
> -dave

In Linux threads show up in ps as separate processes. This doesn't mean
they use the same amount of memory. From a CPU usage perspective using
threads as opposed to processes doesn't let you run more things at once,
but the memory savings do. :)

- Bill
Re: multiple concurrent CGI [newbie]
The design I tried to implement is this...

Apache --> Perl PT program --> Java program

To service each client request, the PT does some AI string comparison
stuff, and passes the request to Java via a localhost socket. The Java
program looks at many databases and might take 3-5 seconds per request.
The PT overhead is minimal.

Options
a. PT is plain old CGI
b. PT uses CGI::Fast
c. PT uses mod_perl
d. rewrite PT in C with C-FCGI

Suppose my linux machine has 256M memory and I get 10 client requests per
second. Each request takes ~4 sec, so on average there will be 40
simultaneous requests.

With (a) each request is a new process which takes about 8 Meg of
memory. 8 x 40 = 320 Meg. TOO MUCH.

With (b) since Sven Verdoolaege's FCGI Perl module does not allow
concurrent requests (although the FastCGI specification DOES allow
it!) the memory req'd is the same as (a).

With (c) each request is a new process which takes about 4 Meg of
memory. 4 x 40 = 160 Meg. Better than (a), but still not very scalable.

With (d) I only need ONE process running the C program. It might use 2M of
memory total. It can handle hundreds of concurrent requests. It might use
two or three threads to manage the IPC with Apache and Java.

Based on my understanding, the choice is clear. It is too bad that Sven
crippled the FCGI protocol in his library. On the other hand, it would
still be difficult to stick with perl, because I would need the two extra
threads to manage the IPC efficiently.

e. use a Java servlet

This might be even better than (d). I will investigate.



At 10:14 AM 2/24/00 -0600, Autarch wrote:
>On Thu, 24 Feb 2000, G.W. Haywood wrote:
>
> > Mr. Henckel was asking if he could have mod_perl children serving
> > requests which would each be delayed three or four seconds by some
> > processing on another server. This would have been as bad as asking
> > the Apache/mod_perl server to serve all your big images. The machine
> > hosting the heavy server would have collapsed under a large load.
>
>Why would it collapse? No matter what, it's going to have to wait 3-4
>seconds to respond to the client. Threading won't change that.
>
> > If a single multi-threaded Apache child process could have handled it
> > then there would have been no problem, but Apache don't work that way
>
>This would be different how?
>
>And particularly on something like Linux, where threads and processes
>aren't all that much different, how would it be better to have many
>threads vs. many processes?
>
>
>-dave
>
>
>/*==================
>www.urth.org
>We await the New Sun
>==================*/


John Henckel henckel@iname.com
Zumbro Falls, Minnesota, USA (507) 753-2216

http://geocities.com/jdhenckel/
Re: multiple concurrent CGI [newbie]
According to John Henckel:
> The design I tried to implement is this...
>
> Apache --> Perl PT program --> Java program
>
> To service each client request, the PT does some AI string comparison
> stuff, and passes the request to Java via localhost socket. The Java looks
> at many databases and might take 3-5 seconds per request. The PT overhead
> is minimal.
>
> Options
> a. PT is plain old CGI
> b. PT uses CGI::Fast
> c. PT uses mod_perl
> d. rewrite PT in C with C-FCGI
>
> Suppose my linux machine has 256M memory and I get 10 client requests per
> second. Each request takes ~4 sec, so on average there will be 40
> simultaneous requests.
>
> With (a) each request is a new process which takes about 8 Meg of
> memory. 8 x 40 = 320 Meg. TOO MUCH.

Most of this should be shared since you really are just running
one copy of perl. However a fork/exec of perl 10 times a second
is a load for most machines and you really want to deal with peak
loads, not just averages.

> With (c) each request is a new process which takes about 4 Meg of
> memory. 4 x 40 = 160 Meg. Better than (a), but still not very scalable.

Even more memory is shared among the processes here if the requests
are mostly running the same script and you pre-load it in the
parent.
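
The usual idiom is a startup file pulled in by the parent, so the
compiled script is shared copy-on-write by every child (paths here are
made up):

# In httpd.conf:
#   PerlRequire /usr/local/apache/conf/startup.pl

# startup.pl: compile the registry script once, in the parent,
# before any children are forked.
use strict;
use Apache::RegistryLoader ();

Apache::RegistryLoader->new->handler(
    '/cgi-bin/pt',                    # URI
    '/usr/local/apache/cgi-bin/pt',   # filename
);

1;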

> With (d) I only need ONE process running the C program. It might use 2M of
> memory total. It can handle hundreds of concurrent requests. It might use
> two or three threads to manage the IPC with Apache and Java.

Are you only munging the URL in this process? If so, you can probably
do it in a non-mod_perl apache using mod_rewrite. Or, run squid
as the front end using a perl script as the redirector. One squid
process and one perl process will handle everything.
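
A squid redirector is just a program that reads request lines on stdin
and writes replacement URLs (or blank lines) on stdout, so the perl
script can be tiny. A sketch, with a made-up rewrite rule:

#!/usr/bin/perl
# Pointed to by "redirect_program" in squid.conf. Each stdin line is:
#   URL client_ip/fqdn ident method
use strict;
$| = 1;    # squid needs unbuffered replies

while (<STDIN>) {
    my ($url, $client, $ident, $method) = split;
    if ($url =~ m{^http://www\.example\.com/pt/(.+)}) {
        print "http://backend.example.com:8081/java/$1\n";
    } else {
        print "\n";    # blank line = leave the URL alone
    }
}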

> e. use a Java servlet
>
> This might be even better than (d). I will investigate.

If the back end java code is your own, it would make sense to
just add the servlet wrapper and whatever additional logic you
need before handing off to the existing classes. This would
reduce the overall processing.

Les Mikesell
les@mcs.com
Re: multiple concurrent CGI [newbie]
Modperlers...

On Thu, Feb 24, 2000 at 10:14:29AM -0600, Autarch wrote:
> On Thu, 24 Feb 2000, G.W. Haywood wrote:
>
> > Mr. Henckel was asking if he could have mod_perl children serving
> > requests which would each be delayed three or four seconds by some
> > processing on another server. This would have been as bad as asking
> > the Apache/mod_perl server to serve all your big images. The machine
> > hosting the heavy server would have collapsed under a large load.
>
> Why would it collapse? No matter what, it's going to have to wait 3-4
> seconds to respond to the client. Threading won't change that.

To answer the collapsing question you have to understand how the
requests are being served... it's a "blocking" engine. Basically you
get a request in, and the thread that's answering the browser sits
there waiting until it gets the full response out to the browser.
Pretty inefficient, and impossible to add any sort of delay without
collapsing the server.

There is of course a solution. Set up a "processing" thread of your
own... send the request into the mod_perl engine, ask the "processing"
thread (or process if you prefer :>) if it's done processing... send
the answer if so; if not, send a "Please wait... still processing"
page, or something similar with an auto-refresh meta tag set.
Basically it's a no-brainer... named pipes + a pthreaded application
with one thread that handles communication with the mod_perl script
(adding requests onto the stack and giving the "answers" back), one or
two actual "processing" threads, and a module API for the mod_perl
program to access it. No problem... and I guarantee it will be faster
than the canned Java solution. Just write that other application in C;
if you want sample source code, email me.
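
The mod_perl side of that is only a few lines. A sketch (PTQueue here
is a made-up module standing in for the named-pipe API: submit()
returns a ticket, result() returns undef until the answer is ready):

# Apache::Registry script implementing the "please wait" pattern.
use strict;
use CGI ();
use PTQueue ();   # hypothetical wrapper around the named pipe

my $q      = CGI->new;
my $ticket = $q->param('ticket') || PTQueue::submit($q->param('query'));

if (defined(my $answer = PTQueue::result($ticket))) {
    print $q->header, $answer;
} else {
    # Not done yet: tell the browser to come back in a couple of seconds.
    print $q->header,
          qq{<html><head><meta http-equiv="refresh" },
          qq{content="2;url=?ticket=$ticket"></head>},
          qq{<body>Please wait... still processing.</body></html>};
}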

<rant> Threading is pretty much evil. It forces more overhead due to
thread context switching... actually this is the source of much of the
Apache performance "issues". Basically in an httpd server designed to
take advantage of the newer POSIX designs with regard to rt queues,
etc. you only need one thread for each processor. You challenge this?
Check out phhttpd. Threads are not good for performance. What is good
for performance is having fewer threads handle more clients
CONCURRENTLY. Not one at a time... that's just wasteful. I think the
IBM camp is responsible for this current misconception with their OS/2
Warp thing years ago... threads != good. (Unless you have tons of
money and have one of those machines that has special memory in the
processor to get rid of context switching problems)</rant>

>
> > If a single multi-threaded Apache child process could have handled it
> > then there would have been no problem, but Apache don't work that way
>
> This would be different how?
>
> And particularly on something like Linux, where threads and processes
> aren't all that much different, how would it be better to have many
> threads vs. many processes?

This has nothing to do with threads vs. processes. It has everything
to do with the things that will probably be changed about the Apache
core API when version 2 is released. There is information about this
at www.kegel.com/c10k.html, check it out. You have to understand how
Apache works internally before you can accurately answer this
question, but you could employ the fix I outlined above for this
particular application. (Though this is sort of like putting a bandaid
on a hemorrhage)

Have fun...
Shane.

(BTW: mod_perl is a VERY VERY fast CGI engine. Want to go faster...
code it by hand in C and use one of the high-speed engines as your
base. Take your pick: phhttpd, thttpd, khttpd, and some others. I've
done just that, and on my measly box that handles 800 static requests
in Apache, it was able to service 2000 clients with dynamic content. I
also did a quasi-dynamic, pre-fetched setup, and it was able to
service 2500 clients with a CORBA object communication setup.
<these are all per-second numbers, without any keep-alives either,
running on an AMD K6-2 450 with 196 megs ram, while an X session and a
ton of other apps were running.>)



>
>
> -dave
>
>
> /*==================
> www.urth.org
> We await the New Sun
> ==================*/
>
Re: multiple concurrent CGI [newbie]
On Thu, Feb 24, 2000 at 09:13:41AM -0600, John Henckel wrote:
> Thanks for your help. I am slowly coming to terms with the fact that Perl
> is not multithreaded and so it will never be able to scale the way I need
> it to. It is only a couple thousand lines of code. I will have to rewrite
> it as a servlet in Java or else a C program using the FastCGI protocol.
>
> Many people think FCGI can't handle multiple concurrent requests;
> however, that is only a perl limitation. The C-FCGI interface, with
> multithreading, can process hundreds of CGI requests simultaneously
> in ONE process with ONE socket to the Apache server.
>
> Once again, thanks for educating me about mod_perl!!
>

Arg...! Okay, let's back up. Multithreading is not a good idea. What
you're talking about is a process which takes 4 seconds to complete...
what are we talking about, remote system communication? You have a
process which takes 4 seconds to finish; it doesn't matter what you
use as your design... it still takes 4 seconds. If you have 100
requests for a 4 second process on one processor... it's going to take
400 seconds. (Actually, due to context switching, probably more like
500)

Multi-threading ADDs to, not subtracts from, the load. But you see it
all depends on what sort of architecture you're going to use too. If
you're going to use a single processor machine, multithreading is
going to slow the whole process down. If you're using a computer that
has 100s of processors, then clearly multithreading is the approach to
take.

The perl "limitation" of which is you speak is not a limitation per
se. Its a limitation if your only using one instance of the perl
interpretor. But you are free to use more than one instance of that
interpretor. (perldoc perlembed) However, it makes zero sense to
have more interpretors than processors... due the context switching
issue.

The best way to solve the problem is to have one "processing" thread
per processor. (I.e. the thread that does the work on the request that
is supposed to take 4 seconds) That thread can be directly written in
C, or you can write some code for a perl interpreter to process. No
big deal... just keep in mind, if you start opening up more processing
threads than processors you have, and start cramming requests down its
throat faster than once per four seconds, the thing is going to tumble
like dominos. The best thing you could possibly do is this (see the
sketch below):

- Set up an engine of your own to handle this 4 second long process.
- Initiate as many threads as you have processors.
- Start a queue of requests that can grow however long.
- Hand out the 4 second long requests to the queue.
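
A minimal sketch of that shape, using Perl's threads and Thread::Queue
modules purely for illustration (the same design works in C with
pthreads and a mutex-protected queue; the 4 second job is stubbed out
with sleep):

#!/usr/bin/perl
use strict;
use threads;
use Thread::Queue;

my $NPROC = 2;                       # one worker per processor
my $jobs  = Thread::Queue->new;

sub work_on {                        # stand-in for the 4 second job
    my $job = shift;
    sleep 4;
    return "done: $job";
}

my @workers = map {
    threads->create(sub {
        while (defined(my $job = $jobs->dequeue)) {
            print work_on($job), "\n";
        }
    });
} 1 .. $NPROC;

# Requests pile up in the queue however fast they arrive; only $NPROC
# of them are ever being worked on at once, so nothing thrashes.
$jobs->enqueue("request $_") for 1 .. 10;
$jobs->enqueue(undef) for @workers;  # one undef per worker = shut down
$_->join for @workers;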

This design will keep things from crashing. Even better would be to
have a series of computers that grab requests from one another. I.e.
set up a central thread written in C, then have it act as a queue.
Processing threads grab new requests from that queue and deliver them.
This is of course all predicated on your running a process that takes
4 seconds.

If this is about remote communication, which I have no idea why it
wouldn't be unless you're doing some strange number crunching, then
what you want to do is read up on select() and poll(). Or if you want
to have it work really well, read up on rt signal queues, and run on
the 2.3.x linux kernel, or some other unix variant.
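
In Perl, select() multiplexing is wrapped up in IO::Select: one
process, one loop, any number of clients, and nobody blocks anybody
else. A sketch (the echo is just a stand-in for real work):

#!/usr/bin/perl
use strict;
use IO::Socket::INET;
use IO::Select;

my $listen = IO::Socket::INET->new(
    LocalPort => 9000, Listen => 64, ReuseAddr => 1,
) or die "listen: $!";
my $sel = IO::Select->new($listen);

while (1) {
    for my $fh ($sel->can_read) {    # wait until *something* is ready
        if ($fh == $listen) {
            my $client = $listen->accept;    # new connection
            $sel->add($client);
        } else {
            my $n = sysread $fh, my $buf, 4096;
            if ($n) {
                syswrite $fh, $buf;          # echo back; do work here
            } else {
                $sel->remove($fh);           # EOF or error: drop client
                close $fh;
            }
        }
    }
}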

I'll reiterate... having more threads than processors in ANY language
is a bad idea, if it can be avoided. Which it can, if you use the
newer programming stuff. Moving to Java will not solve your problems;
it will create 100x worse ones in terms of performance. Moving to C
will not solve your problems. Moving to Perl will not solve your
problem. Investigating your problem clearly will help eventually solve
your problem. (BTW: No CGI request should take 4 seconds to process,
unless you are querying a database on the other side of the planet, or
it's some sort of mathematics lab searching for prime numbers or
something. I might sound like I'm joking... but I most certainly am
not.)

Have fun :),
Shane.
(Sorry to get crabby... it's just annoying when people think a
"particular" technology will solve everything. No... multithreading
will not solve world hunger :>. In fact, if you're convinced it will,
and start up too many processes, then your kernel will spend 100% of
its time trying to figure out which process to run, and swapping
context information in and out of the processor. Then ADM won't send
its food shipments to various countries on time. Chalk it up to
lessons learned from Mindcraft! :>)