Mailing List Archive

waste of resources ?
Hi All

In my program I always get an OSError number 11 (Resource temporarily
not available). It reads in lots of files using child processes
communicating with the parent. Each child also uses popen to get data
from a launched 'gunzip' process. After ~1000 processed files the program
raises OSError[11]. I'm pretty sure that all the child processes close
their pipes to the parent and that popen is also closed. The parent
stores the file descriptors connected to the children in a list but uses
the list's 'remove' method to delete descriptors that are finished (those
which the parent has already read from).

However, I'd like to make sure it's not the number of open
file descriptors that's causing the error. How can I examine the cause of
the error? I mean, is there a Python method or variable that tells you the
number of currently open file descriptors, or better, one that lists all the
resources a program is using, including its children?

Hm, maybe I'll have a hard night of debugging :-(

thanks,

Arne

--
Arne Mueller
Biomolecular Modelling Laboratory
Imperial Cancer Research Fund
44 Lincoln's Inn Fields
London WC2A 3PX, U.K.
phone : +44-(0)171 2693405 | Fax : +44-(0)171 269 3258
email : a.mueller@icrf.icnet.uk | http://www.icnet.uk/bmm/
waste of resources ? [ In reply to ]
Arne Mueller wrote:
>
> Hi All
>
> In my program I always get an OSError number 11 (Resource temporarily
> not available). It reads in lots of files using child processes
> communicating with the parent. Each child also uses popen to get data
> from a launched 'gunzip' process. After ~1000 processed files the program
> raises OSError[11]. I'm pretty sure that all the child processes close
> their pipes to the parent and that popen is also closed. The parent
> stores the file descriptors connected to the children in a list but uses
> the list's 'remove' method to delete descriptors that are finished (those
> which the parent has already read from).
>
> However, I'd like to make sure it's not the number of open
> file descriptors that's causing the error. How can I examine the cause of
> the error? I mean, is there a Python method or variable that tells you the
> number of currently open file descriptors, or better, one that lists all the
> resources a program is using, including its children?

I think I found the source of error:

I'm running Python 1.5.2b2 on an SGI Power Challenge, IRIX64 version 6.5.
Every time a child process is created via fork(), it really does its job
but never dies; instead the number of zombie processes (as reported by
the 'top' program) steadily increases, though I call sys.exit(0)
in the child. Any idea how to exit a process without creating a
zombie?

thanks a lot,

Arne
waste of resources ? [ In reply to ]
Arne Mueller <a.mueller@icrf.icnet.uk> writes:

> I'm running Python 1.5.2b2 on an SGI Power Challenge, IRIX64 version 6.5.
> Every time a child process is created via fork(), it really does its
> job but never dies; instead the number of zombie processes (as
> reported by the 'top' program) steadily increases, though I call
> sys.exit(0) in the child. Any idea how to exit a process
> without creating a zombie?

The parent should wait() for its children. See a Unix programming
manual for more details.
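For what it's worth, here is a minimal sketch of that pattern (written in present-day Python syntax rather than the 1.5-era syntax used elsewhere in this thread; the helper name run_child_and_reap is just for illustration):

```python
import os

def run_child_and_reap():
    """Fork a child, let it exit, and reap it so no zombie is left behind."""
    pid = os.fork()
    if pid == 0:
        # Child: do the work, then exit immediately, skipping cleanup handlers.
        os._exit(7)
    # Parent: waitpid() blocks until this child exits, then reaps it.
    reaped_pid, status = os.waitpid(pid, 0)
    return reaped_pid == pid, os.WEXITSTATUS(status)
```

Once waitpid() has returned, the child's process-table entry is gone, so no zombie accumulates.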
waste of resources ? [ In reply to ]
Arne Mueller wrote:
> I'm running Python 1.5.2b2 on an SGI Power Challenge, IRIX64 version 6.5.
> Every time a child process is created via fork(), it really does its job
> but never dies; instead the number of zombie processes (as reported by
> the 'top' program) steadily increases, though I call sys.exit(0)
> in the child. Any idea how to exit a process without creating a
> zombie?
>
> thanks a lot,
>
> Arne

Yes. The parent process must execute an os.wait() or os.waitpid() in
order for the child process's return code to be sent back to the parent
program. Only after that does the child process (by then a zombie)
actually get eliminated.

Hope this helps.

--
Dr. Gary Herron <gherron@aw.sgi.com>
206-287-5616
Alias | Wavefront
1218 3rd Ave, Suite 800, Seattle WA 98101
waste of resources ? [ In reply to ]
Arne Mueller wrote:

>
> I'm running Python 1.5.2b2 on an SGI Power Challenge, IRIX64 version 6.5.
> Every time a child process is created via fork(), it really does its job
> but never dies; instead the number of zombie processes (as reported by
> the 'top' program) steadily increases, though I call sys.exit(0)
> in the child. Any idea how to exit a process without creating a
> zombie?

On SysV or Unix98 systems you may set the disposition of SIGCHLD to
SIG_IGN (i.e., ignore death-of-a-child signals). The watertight method
is to catch SIGCHLD and have the handler call waitpid(...) with WNOHANG
in a loop.
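A sketch of the watertight method (in modern Python syntax; the names reaped and spawn_and_wait_reaped are mine, purely for illustration):

```python
import errno
import os
import signal
import time

reaped = []

def sigchld_handler(signum, frame):
    # One SIGCHLD delivery may stand for several exited children,
    # so reap in a loop with WNOHANG until nothing is left.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except OSError as e:
            if e.errno == errno.ECHILD:   # no children at all
                break
            raise
        if pid == 0:                      # children exist, none has exited yet
            break
        reaped.append(pid)

signal.signal(signal.SIGCHLD, sigchld_handler)

def spawn_and_wait_reaped(timeout=5.0):
    """Fork a child that exits at once; return True when the handler
    has reaped it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    deadline = time.time() + timeout
    while pid not in reaped and time.time() < deadline:
        time.sleep(0.05)
    return pid in reaped
```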

--
Bryan Van de Ven
Applied Research Labs
University of Texas, Austin
waste of resources ? [ In reply to ]
Bryan VanDeVen <bryanv@arlut.utexas.edu> writes:

> On SysV or Unix98 systems you may set the disposition of SIGCHLD to
> SIG_IGN (i.e., ignore death-of-a-child signals). The watertight
> method is to catch SIGCHLD and have the handler call waitpid(...)
> with WNOHANG in a loop.

Except that when a signal arrives during an IO operation, the operation
will be canceled due to the EINTR brain-damage, and Python will throw an
exception. So if you expect your program to accept SIGCHLD signals at
random intervals, you'd better encase all your IO operations in a loop
like this:

while 1:
    try:
        read(...) # or whatever
    except (IOError, os.error), detail:
        if detail.args[0] == errno.EINTR:
            continue
        else:
            raise

This problem with signals has been my long-standing gripe with Python.
waste of resources ? [ In reply to ]
>>>>> "Hrvoje" == Hrvoje Niksic <hniksic@srce.hr> writes:

Hrvoje> Except that when a signal arrives during an IO operation,
Hrvoje> the operation will be canceled due to the EINTR brain-damage,
Hrvoje> and Python will throw an exception. So if you expect your
Hrvoje> program to accept SIGCHLD signals at random intervals,
Hrvoje> you'd better encase all your IO operations in a loop like
Hrvoje> this:

<snip>

Hrvoje> This problem with signals has been my long-standing gripe
Hrvoje> with Python.

If I recall correctly, Perl has just about the exact same problem.
Is there a good way to fix it?

--
Brought to you by the letters Z and I and the number 1.
"Well, I think Perl should run faster than C. :-)"
Debian GNU/Linux maintainer of Gimp and GTK+ -- http://www.debian.org/
waste of resources ? [ In reply to ]
Ben Gertzfield <che@debian.org> writes:

> If I recall correctly, Perl has just about the exact same problem.
> Is there a good way to fix it?

I don't know about Perl, but a good fix would be to have the
interpreter do the hard work for us. If a system call fails and errno
is EINTR, then the signal handler should be called, *and* the syscall
should be restarted.
waste of resources ? [ In reply to ]
On 7 Jun 1999, Hrvoje Niksic wrote:

<about syscalls throwing EINTR and aborting>
> I don't know about Perl, but a good fix would be to have the
> interpreter do the hard work for us. If a system call fails and errno
> is EINTR, then the signal handler should be called, *and* the syscall
> should be restarted.

Ummmmm......

You can do the hard-work yourself, but once and for all:

def myread(file):
    while 1:
        try:
            return file.read()
        except (IOError, os.error), detail:
            if detail.args[0] != errno.EINTR: raise

And just call myread(foo) instead of foo.read().
Where's the catch-22?
--
Moshe Zadka <mzadka@geocities.com>.
QOTD: My own exit is more likely to be horizontal then perpendicular.
waste of resources ? [ In reply to ]
>>>>> "HN" == Hrvoje Niksic <hniksic@srce.hr> writes:

| except (IOError, os.error), detail:

Or you could just catch EnvironmentError. :)
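For example (the helper name is mine; note that in Python 3, IOError, OSError and EnvironmentError were later unified into one class, so a single except clause covers both families of failure):

```python
def read_or_default(path, default=""):
    """Return the contents of path, or default if it can't be read.

    EnvironmentError is the common base of IOError and OSError,
    so one clause catches either kind of failure.
    """
    try:
        with open(path) as f:
            return f.read()
    except EnvironmentError:
        return default
```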

-Barry
waste of resources ? [ In reply to ]
Gary Herron <gherron@aw.sgi.com> writes:

> Arne Mueller wrote:
> > I'm running Python 1.5.2b2 on an SGI Power Challenge, IRIX64 version 6.5.
> > Every time a child process is created via fork(), it really does its job
> > but never dies; instead the number of zombie processes (as reported by
> > the 'top' program) steadily increases, though I call sys.exit(0)
> > in the child. Any idea how to exit a process without creating a
> > zombie?
> >
> > thanks a lot,
> >
> > Arne
>
> Yes. The parent process must execute an os.wait() or os.waitpid() in
> order for the child process's return code to be sent back to the parent
> program. Only after that does the child process (by then a zombie)
> actually get eliminated.
>
> Hope this helps.
>
> --
> Dr. Gary Herron <gherron@aw.sgi.com>
> 206-287-5616
> Alias | Wavefront
> 1218 3rd Ave, Suite 800, Seattle WA 98101

Another thing to note is that when a child process dies the parent
will be sent a SIGCHLD signal (I think this also applies if the child
process was paused with SIGSTOP). If you just want to ignore the
return code of the child process, use code like this:


import signal
import os

def sigchild_handler(signum, frame):
    os.wait()

signal.signal(signal.SIGCHLD, sigchild_handler)


This will run os.wait for each child process that exits. You will
need to catch the return value of os.wait and inspect it if you care
what the process returned or whether it was killed by a signal.

--
Tim Evans
waste of resources ? [ In reply to ]
"Barry A. Warsaw" <bwarsaw@cnri.reston.va.us> writes:

> >>>>> "HN" == Hrvoje Niksic <hniksic@srce.hr> writes:
>
> | except (IOError, os.error), detail:
>
> Or you could just catch EnvironmentError. :)

It probably didn't exist in Python 1.5.1, under which I was writing
the code in question.
waste of resources ? [ In reply to ]
Hi All,


>
> Another thing to note is that when a child process dies the parent
> will be sent a SIGCHLD signal (I think this also applies if the child
> process was paused with SIGSTOP). If you just want to ignore the
> return code of the child process, use code like this:
>
> import signal
> import os
>
> def sigchild_handler(signum, frame):
>     os.wait()
>
> signal.signal(signal.SIGCHLD, sigchild_handler)
>
> This will run os.wait for each child process that exits. You will
> need to catch the return value of os.wait and inspect it if you care
> what the process returned or whether it was killed by a signal.
>

Hm, that doesn't work properly with my program. The number of zombie
processes stays steady at about 20, which means there are 20 unreaped
dead children.

Arne
waste of resources ? [ In reply to ]
>>>>> "Hrvoje" == Hrvoje Niksic <hniksic@srce.hr> writes:

Hrvoje> It probably didn't exist in Python 1.5.1, under which I
Hrvoje> was writing the code in question.

Oops, yes, EnvironmentError showed up in 1.5.2a1

-Barry
waste of resources ? [ In reply to ]
Arne Mueller <a.mueller@icrf.icnet.uk> writes:


[snip]

> > def sigchild_handler(signum, frame):
> >     os.wait()

[snip]

> Hm, that doesn't work properly with my program. The number of zombie
> processes stays steady at about 20, which means there are 20 unreaped
> dead children.

Unix doesn't guarantee you'll get exactly one signal delivered per
event. To be sure to get all the children, you need to loop over the
processes with waitpid().

I think something like this should do it:

----------
def sigchld_handler(signum, frame):
    print "Checking for processes..."
    try:
        while 1:
            pid, sts = os.waitpid(-1, os.WNOHANG)
            if pid == 0:
                break
            print "Process %d ended with status %04x." % (pid, sts)
    except OSError, detail:
        if detail.args[0] != errno.ECHILD:
            raise
        else:
            print "No more children."
    else:
        print "Some children yet to finish."
----------

I can't actually see any way to keep count well enough and avoid race
conditions, to get rid of the try/except. Sometimes I wish Python had
POSIX signals, but not badly enough to actually write it.

--
Carey Evans http://home.clear.net.nz/pages/c.evans/

"I'm not a god. I've just been misquoted."
waste of resources ? [ In reply to ]
Carey Evans wrote:
>
> Arne Mueller <a.mueller@icrf.icnet.uk> writes:
>
> [snip]
>
> > > def sigchild_handler(signum, frame):
> > >     os.wait()
>
> [snip]
>
> > Hm, that doesn't work properly with my program. The number of zombie
> > processes stays steady at about 20, which means there are 20 unreaped
> > dead children.
>
> Unix doesn't guarantee you'll get exactly one signal delivered per
> event. To be sure to get all the children, you need to loop over the
> processes with waitpid().
>
> I think something like this should do it:
>
> ----------
> def sigchld_handler(signum, frame):
>     print "Checking for processes..."
>     try:
>         while 1:
>             pid, sts = os.waitpid(-1, os.WNOHANG)
>             if pid == 0:
>                 break
>             print "Process %d ended with status %04x." % (pid, sts)
>     except OSError, detail:
>         if detail.args[0] != errno.ECHILD:
>             raise
>         else:
>             print "No more children."
>     else:
>         print "Some children yet to finish."
> ----------

Hm, that's similar to what I do in my program, but I always have as many
zombies as forked processes; here's a program fragment:

cpu = 4
used = 0
for fn in files:

    if cpu:
        if used < cpu:
            r, w = os.pipe()
            pid = os.fork()
            used = used + 1
            if pid: # parent
                os.close(w)
                chfds.append(r)
                chpids[r] = pid # dict. with filedescriptor as key
                # fork children for available processors
                if used < cpu and fn != last: continue
            else: # child
                os.close(r)
                param['files'] = [fn]
                # some application specific stuff
                msa = ProcessMSA(param)
                msa.of = None
                msa.dbkeys = {}
                msa.param = {}
                # pickle object and send it to parent
                p = mpickle.dumps(msa)
                r, w, o = select.select([], [w], [])
                os.write(w[0], p)
                os._exit(0)

    # parent reads from ready file descriptors
    while used >= cpu or (fn == last and used):
        r, w, o = select.select(chfds, [], [])
        for i in r:
            msa = mpickle.loads(readPickleStream(i))
            os.close(i)
            stderr.write('number %d\n' % len(stats.ids.keys()))
            stats.add(msa) # do something with the received object
            used = used - 1 # adjust number of used processors
            chfds.remove(i)
            os.waitpid(chpids[i], os.WNOHANG)
            del chpids[i]

Why does that code produce zombies? Do I have to send a signal to the
child to tell it to die? I mean, maybe the child exits after the parent
has already passed the os.waitpid?

Thanks for discussion,

Arne
waste of resources ? [ In reply to ]
Arne Mueller wrote:
>
> chfds.remove(i)
> os.waitpid(chpids[i], os.WNOHANG)
> del chpids[i]

If you're explicitly waiting for a particular child,
*don't* use WNOHANG, because you *want* to block until
that child has exited -- which it may not have done
yet, even though it has written its data.

> I mean, maybe the child exits after the parent
> has already passed the os.waitpid?

Other way around -- the parent can reach the waitpid
before the child has exited.
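A sketch of that ordering (modern Python syntax; the function name is illustrative): the parent reads the child's result from the pipe first, and only then does a blocking waitpid, which can't hang for long because by then the child has written its data and is exiting:

```python
import os

def run_worker_and_collect():
    """Fork a worker that writes a result down a pipe; the parent reads
    the result first, then does a *blocking* waitpid to reap the child."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send the result and exit.
        os.close(r)
        os.write(w, b"result")
        os.close(w)
        os._exit(0)
    # Parent: read until EOF (child closed its end of the pipe).
    os.close(w)
    chunks = []
    while True:
        data = os.read(r, 4096)
        if not data:
            break
        chunks.append(data)
    os.close(r)
    # Blocking wait -- no WNOHANG -- so the child is always reaped,
    # even if it hasn't quite finished exiting yet.
    _, status = os.waitpid(pid, 0)
    return b"".join(chunks), os.WEXITSTATUS(status)
```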

Greg
waste of resources ? [ In reply to ]
On Tue, 8 Jun 1999 01:03:03 +0300, Moshe Zadka wrote:
> On 7 Jun 1999, Hrvoje Niksic wrote:
> <about syscalls throwing EINTR and aborting>
> > I don't know about Perl, but a good fix would be to have the
> > interpreter do the hard work for us. If a system call fails and errno
> > is EINTR, then the signal handler should be called, *and* the syscall
> > should be restarted.
>

I agree with this too.

> Ummmmm......
>
> You can do the hard-work yourself, but once and for all:
>

Yes, but this is boring ;)

> def myread(file):
>     while 1:
>         try:
>             return file.read()
>         except (IOError, os.error), detail:
>             if detail.args[0] != errno.EINTR: raise
>
> And just call myread(foo) instead of foo.read().
> Where's the catch-22?

IMHO a better approach is a function like this:

def temp_failure_retry(func, *args):
    while 1:
        try:
            return apply(func, args)
        except (IOError, os.error), detail:
            if detail.args[0] != errno.EINTR: raise

(the strange function name comes from the similar C macro used in the
GNU libc ;)

An example of the use of this function with open:

fd = temp_failure_retry(open, "pippo", "w")

I haven't tried it, but AFAIK it should work.

Bye,

--
d i e g o
--
To reply remove the numbers and the `x' from my address
--
Sorry for my bad English!
waste of resources ? [ In reply to ]
In article <37606565.5A93BCE9@compaq.com>,
Greg Ewing <greg.ewing@compaq.com> wrote:
>Arne Mueller wrote:
>>
>> chfds.remove(i)
>> os.waitpid(chpids[i], os.WNOHANG)
>> del chpids[i]
>
>If you're explicitly waiting for a particular child, *don't* use
>WNOHANG, because you *want* to block until that child has exited --
>which it may not have done yet, even though it has written its data.

I can't speak to Arne's situation, but I'm about to rewrite a chunk of
code to fork off a specific single child process precisely because I
*don't* want the parent to block while this code executes, but I still
want to reap the child later (and perform some final cleanup).
--
--- Aahz (@netcom.com)

Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het

"That doesn't stop me from wanting to work all three of them over with
the clue stick for while, with no safewords allowed." --abostick
waste of resources ? [ In reply to ]
Aahz Maruch wrote:
>
> I can't speak to Arne's situation, but I'm about to rewrite a chunk of
> code to fork off a specific single child process precisely because I
> *don't* want the parent to block while this code executes, but I still
> want to reap the child later (and perform some final cleanup).

If you have some way of knowing when the child has finished
doing its thing (such as having received results from it),
you can do a blocking wait for that particular pid at that
point, because you know the child has either exited already
or is about to do so very soon.

If you have no idea when the child is expected to exit, your
only portable option is a SIGCHLD handler which loops doing non-blocking
waits until no more children are reaped. The loop is necessary
because if more than one child exits before the handler is
entered, you won't get any extra signals.

Greg