Mailing List Archive

It's just magic, and I hate it ...
Following is a piece of Python that is making me pull out my hair. The idea
was to have a long-running process started by the user (via browser), and
have the results trickle back in without timing out the request. My strategy
was to define a per-connection file to hold results, fork off a child to
generate results and put them into the file, and have the parent return a
redirect to the results file. That way everyone is satisfied.

It works if I run it from the command line as myself. It works if I run it
from the command line as user 'WWW' group 'WWW' which is how the server will
see it. The results file is created, the redirect is printed and a shell
prompt returns, and the child has forked off and in fact updates the file
every so often until it finishes at which time it replaces the contents with
the simulated results.

When run from the browser, however, the redirect doesn't happen until the
child finishes. To recap, this is exactly the situation I need to avoid.

Please help.

+Mitchell



#! /usr/bin/env python

import sys
import cgi
import time
import os
import string
import socket

form = cgi.FieldStorage()
if form.has_key("title"):
    hostname = socket.gethostbyaddr(socket.gethostname())[0]
    uniq = hex(os.getpid())[2:] + hex(int(time.time()))[2:]
    addr = "http://%s/~mgm/results/%s.html" % (hostname, uniq)
    fn = "/home/mgm/public_html/results/%s.html" % uniq

    # Generate the results file
    open(fn, "w").write("""<HTML>
<HEAD>
<META HTTP-EQUIV=REFRESH CONTENT=5>
<TITLE>Results for [%s]</TITLE>
</HEAD>
<BODY>
<P>Beginning search at %s:
</BODY>
</HTML>
""" % (form["title"].value, time.asctime(time.localtime(time.time()))))

    # Parent returns a redirect, child pretends to work for a while
    # accumulating results
    if os.fork() == 0:
        for i in (1, 2, 3):
            txt = open(fn, 'r').readlines()
            txt.insert(len(txt)-2, "<P>Still searching\n")
            open(fn, 'w').write(string.join(txt, ''))
            time.sleep(10)
        open(fn, 'w').write("""<HTML>
<HEAD>
<TITLE>Finished</TITLE>
</HEAD>
<BODY>
<H1>Finished</H1>
<P>If you had results, they'd be here
</BODY>
</HTML>
""")
    else:
        print "Status: 302 Moved"
        print "Location:", addr
else:
    print """Content-type: text/html

<HTML>
<HEAD>
<TITLE>Sample Form</TITLE>
</HEAD>
<BODY>
<P>
<H1>Sample Form</H1>
<FORM METHOD="POST">
Title: <INPUT TYPE="TEXT" NAME="title"><P>
<INPUT TYPE="SUBMIT" NAME="submit">
</FORM>
</BODY>
</HTML>"""



--
Mitchell Morris

A man said his credit card was stolen but he decided not to
report it because the thief was spending less than his wife
did.
It's just magic, and I hate it ... [ In reply to ]
* Mitchell Morris
|
| When run from the browser, however, the redirect doesn't happen
| until the child finishes. To recap, this is exactly the situation I
| need to avoid.

Try doing a sys.stdout.close() at the point where you want the
redirect to happen.

--Lars M.
It's just magic, and I hate it ... [ In reply to ]
In article <14190.35530.772547.166720@buffalo.fnal.gov>, Charles G Waldman
wrote:
>
>Try running Python in unbuffered mode - with a "-u" commandline flag,
>or by setting the PYTHONUNBUFFERED environment variable to a non-empty
>string.

Are you saying that this worked for you? Or was this a guess?

I ask because it makes no difference when I try it.


--
Mitchell Morris

We think the new [Nintendo] system will use floppy disks and will be powered
by a steam engine.
-- Next Generation Online
It's just magic, and I hate it ... [ In reply to ]
Mitchell Morris <mmorris@mindspring.com> asked:
> Are you [Charles G Waldman] saying that this worked for you? Or
> was this a guess?

I'm going to guess that that was a guess because I was
guessing the same thing... I guess.

What you described is a classic indication of buffered output.
There are three places I can think of where your output could
be buffered:
1) in python
stdout can be buffered, unbuffered, or line buffered, depending
on the situation, e.g., whether output goes to a tty or a pipe.
That's why you would want to start python with -u: run from the
command line, the program writes to a tty, but under the web
server it writes to a pipe. To be on the safe side, try calling
sys.stdout.flush().

2) in the web server. From the Apache FAQ
http://www.apache.org/docs/misc/FAQ.html#nph-scripts
that's likely not the case because versions 1.3 and later don't
do an intermediate buffer, but you didn't say what web server
you are using nor the version number. That documentation may
describe what to do.

3) in the client. I don't recall if you have to give a
special header to specify that the data should be read and parsed
straight up. I've never done this so I don't know what to do
next.
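
For point 1, a minimal sketch of an explicit flush, written in present-day Python 3 rather than the Python of this thread. The helper name send_redirect and the example.invalid URL are made up for illustration; only write() and flush() on the stream are standard. Flushing only rules out Python's own buffer; it does nothing about buffering or open connections elsewhere.

```python
import sys


def send_redirect(location, out=None):
    """Write CGI redirect headers to `out` and flush them right away."""
    if out is None:
        out = sys.stdout
    out.write("Status: 302 Moved\n")
    out.write("Location: %s\n" % location)
    out.write("\n")  # a blank line ends the CGI header block
    out.flush()      # push the headers out of Python's buffer now
```

Calling send_redirect("http://example.invalid/results/abc.html") from a CGI script would emit the two headers and the terminating blank line on stdout before any later work starts.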

Andrew
dalke@acm.org
It's just magic, and I hate it ... [ In reply to ]
[Charles G Waldman]
> Try running Python in unbuffered mode - with a "-u" commandline flag,
> or by setting the PYTHONUNBUFFERED environment variable to a non-empty
> string.

[Mitchell Morris]
> Are you saying that this worked for you? Or was this a guess?
>
> I ask because it makes no difference when I try it.

Something like this popped up a while back, which made me <wink> write FAQ

4.87. Why doesn't closing sys.stdout (stdin, stderr) really close it?

This solved the problem for the person who had it then. So read the FAQ and
try what it says. If that doesn't work, I'll try to find the original
thread in DejaNews.

charles'-suggestion-was-worth-a-try-but-you're-getting-buffered-
somewhere-else-ly y'rs - tim
It's just magic, and I hate it ... [ In reply to ]
In article <37703621.B29B72CB@bioreason.com>, Andrew Dalke wrote:
[snip]
>
> 2) in the web server. From the Apache FAQ
>http://www.apache.org/docs/misc/FAQ.html#nph-scripts
> that's likely not the case because versions 1.3 and later don't
>do an intermediate buffer, but you didn't say what web server
>you are using nor the version number. That documentation may
>describe what to do.
>
[snip]
>
> Andrew
> dalke@acm.org

Aha! A clue! It's an old Apache, v1.2.6, as it turns out. You may have saved
my few remaining shreds of sanity!

Thanks for the pointer,
+Mitchell


--
Mitchell Morris

My wife and I married for better or worse ... I couldn't do better and she
couldn't do worse
It's just magic, and I hate it ... [ In reply to ]
As much as I hate to admit I'm an idiot, I got caught by the simplest of
things. When the parent forks, the child inherits all of the parent's open
file descriptors ... which includes sys.stdin and sys.stdout. Since these two (at
least) are attached to the socket from the browser, the browser doesn't
follow the redirect until *ALL* the fds close.
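
That behaviour can be reproduced without a web server. The sketch below is in present-day Python 3 and assumes a Unix system with os.fork; the function name seconds_until_eof is made up for illustration. A pipe stands in for the connection to the server: the reader sees end-of-file only once every copy of the write end is closed, including the copy the forked child inherited.

```python
import os
import time


def seconds_until_eof(child_closes_write_end):
    """Fork a child that lingers for one second; time how long the reader waits for EOF."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it inherited both ends of the pipe from the parent.
        os.close(r)
        if child_closes_write_end:
            os.close(w)      # drop the inherited write end at the OS level
        time.sleep(1.0)      # pretend to do long-running work
        os._exit(0)
    # Parent: close its own write end, then wait for EOF on the read end.
    os.close(w)
    start = time.monotonic()
    while os.read(r, 1024):
        pass                 # no data is ever written; read returns b"" at EOF
    elapsed = time.monotonic() - start
    os.close(r)
    os.waitpid(pid, 0)
    return elapsed
```

With child_closes_write_end set to True the reader gets EOF almost at once; with False it waits roughly the full second, until the child exits and the kernel closes the descriptor for it.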

My new question is how do I close all the connections to the browser after
the fork? The obvious:
sys.stdin.close()
sys.stdout.close()
sys.stderr.close()
didn't get them all, since the browser continues to wait. Is there a
mechanism to determine all opened files?
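
On that last question, one crude but portable approach is to probe descriptor numbers with os.fstat, which raises OSError for a number that is not open. A sketch in present-day Python 3; the name open_fds and the default limit of 1024 are assumptions for illustration, not anything from the standard library.

```python
import os


def open_fds(limit=1024):
    """Return the descriptor numbers below `limit` that are currently open."""
    found = []
    for fd in range(limit):
        try:
            os.fstat(fd)     # raises OSError if fd is not an open descriptor
        except OSError:
            continue
        found.append(fd)
    return found
```

Run in the child right after the fork, this would list every descriptor the child inherited, which shows what is still holding the connection open.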




To recap, here is the offending source

#! /usr/bin/env python

import sys
import cgi
import time
import os
import string
import socket

form = cgi.FieldStorage()
if form.has_key("title"):
    hostname = socket.gethostbyaddr(socket.gethostname())[0]
    uniq = hex(os.getpid())[2:] + hex(int(time.time()))[2:]
    addr = "http://%s/~mgm/results/%s.html" % (hostname, uniq)
    fn = "/home/mgm/public_html/results/%s.html" % uniq

    # Generate the results file
    open(fn, "w").write("""<HTML>
<HEAD>
<META HTTP-EQUIV=REFRESH CONTENT=5>
<TITLE>Results for [%s]</TITLE>
</HEAD>
<BODY>
<P>Beginning search at %s:
</BODY>
</HTML>
""" % (form["title"].value, time.asctime(time.localtime(time.time()))))

    # Parent returns a redirect, child pretends to work for a while
    # accumulating results
    if os.fork() == 0:
        sys.stdin.close()
        sys.stdout.close()
        sys.stderr.close()
        for i in (1, 2, 3):
            txt = open(fn, 'r').readlines()
            txt.insert(len(txt)-2, "<P>Still searching\n")
            open(fn, 'w').write(string.join(txt, ''))
            time.sleep(10)
        open(fn, 'w').write("""<HTML>
<HEAD>
<TITLE>Finished</TITLE>
</HEAD>
<BODY>
<H1>Finished</H1>
<P>If you had results, they'd be here
</BODY>
</HTML>
""")
    else:
        print "Status: 302 Moved"
        print "Location:", addr
else:
    print """Content-type: text/html

<HTML>
<HEAD>
<TITLE>Sample Form</TITLE>
</HEAD>
<BODY>
<P>
<H1>Sample Form</H1>
<FORM METHOD="POST">
Title: <INPUT TYPE="TEXT" NAME="title"><P>
<INPUT TYPE="SUBMIT" NAME="submit">
</FORM>
</BODY>
</HTML>"""


--
Mitchell Morris

If at first you don't succeed, see if the loser gets anything.
It's just magic, and I hate it ... [ In reply to ]
In article <slrn7n2aj0.n0c.mgm@unpkhswm04.bscc.bls.com>,
mmorris@mindspring.com wrote:
[snip]

In a very polite private e-mail, Tim Peters pointed out that this is a FAQ,
number 4.87 in fact. That will teach me to not check *ALL* the docs before
opening my trap.

Thanks to everyone ... I appreciate your time and efforts on my behalf,
+Mitchell
It's just magic, and I hate it ... [ In reply to ]
Mitchell Morris (mgm@unpkhswm04.bscc.bls.com) wrote:
: As much as I hate to admit I'm an idiot, I got caught by the simplest of
: things. When the parent forks, the child inherits all of the parent's open
: file descriptors ... which includes sys.stdin and sys.stdout. Since these two (at
: least) are attached to the socket from the browser, the browser doesn't
: follow the redirect until *ALL* the fds close.
:
: My new question is how do I close all the connections to the browser after
: the fork? The obvious:
: sys.stdin.close()
: sys.stdout.close()
: sys.stderr.close()
: didn't get them all, since the browser continues to wait. Is there a
: mechanism to determine all opened files?

I had the same problem a while ago, which resulted in FAQ #4.87:

4.87. Why doesn't closing sys.stdout (stdin, stderr)
really close it?

Python file objects are a high-level layer of abstraction on top of C
streams, which in turn are a medium-level layer of abstraction on top
of (among other things) low-level C file descriptors.

For most file objects f you create in Python via the builtin "open"
function, f.close() marks the Python file object as being closed from
Python's point of view, and also arranges to close the underlying C
stream. This happens automatically too, in f's destructor, when f
becomes garbage.

But stdin, stdout and stderr are treated specially by Python, because
of the special status also given to them by C: doing

sys.stdout.close() # ditto for stdin and stderr

marks the Python-level file object as being closed, but does not close
the associated C stream (provided sys.stdout is still bound to its default
value, which is the stream C also calls "stdout").

To close the underlying C stream for one of these three, you should
first be sure that's what you really want to do (e.g., you may confuse
the heck out of extension modules trying to do I/O). If it is, use
os.close:

os.close(0) # close C's stdin stream
os.close(1) # close C's stdout stream
os.close(2) # close C's stderr stream
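
Applied to the forking script in this thread, the FAQ's advice might look like the following sketch. It is present-day Python 3, assumes a Unix system, and assumes the connection to the server is held only through descriptors 0, 1 and 2; close_std_fds and spawn_detached are made-up names, not part of any library.

```python
import os


def close_std_fds():
    """Close descriptors 0, 1 and 2 at the OS level, skipping any already closed."""
    for fd in (0, 1, 2):
        try:
            os.close(fd)
        except OSError:
            pass  # descriptor was not open; nothing to release


def spawn_detached(work):
    """Fork; the child closes the inherited standard descriptors, runs work(), and exits.

    Returns the child's pid to the parent.
    """
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            close_std_fds()
            work()
            status = 0
        finally:
            os._exit(status)  # leave without running the parent's cleanup code
    return pid
```

The parent would call spawn_detached with the long-running function, print the redirect headers, and exit. Flushing sys.stdout before the fork keeps the child from inheriting a copy of any unflushed output.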


--
-=-=-=-=-=-=-=-=
Clarence Gardner
AvTel Communications
Software Products and Services Division
clarence@avtel.com
It's just magic, and I hate it ... [ In reply to ]
Mitchell Morris (mmorris@mindspring.com) wrote:
: In article <slrn7n2aj0.n0c.mgm@unpkhswm04.bscc.bls.com>,
: mmorris@mindspring.com wrote:
: [snip]
:
: In a very polite private e-mail, Tim Peters pointed out that this is a FAQ,
: number 4.87 in fact. That will teach me to not check *ALL* the docs before
: opening my trap.
:
: Thanks to everyone ... I appreciate your time and efforts on my behalf,
: +Mitchell

I wonder if it will teach me to read the entire thread before following
up to a message :)

--
-=-=-=-=-=-=-=-=
Clarence Gardner
AvTel Communications
Software Products and Services Division
clarence@avtel.com