Mailing List Archive

Do subprocess.PIPE and subprocess.STDOUT at the same time
Hi @all,
I'm running a program which is still in development with subprocess.run (Python version 3.10), and I need to capture the program's output in a Python variable. The program itself runs for about 2 minutes, but it can also freeze when new bugs appear.

For production I run the program with stdout=subprocess.PIPE and can fetch the output later. For just testing whether the program works, I run it with stdout=subprocess.STDOUT and see all program output on the console, but my program afterwards crashes since nothing is captured in the Python variable. So I think I need the functionality of subprocess.PIPE and subprocess.STDOUT at the same time.

What I tried until now:
1. Poll the output and use Popen instead:

import subprocess

# Start the subprocess
process = subprocess.Popen(['./test.sh'],
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)

captured_output = b''
process_running = True
while process_running:
    process_running = (process.poll() is None)  # poll() returns None while running
    for pipe in [process.stdout, process.stderr]:
        while line := pipe.readline():
            print(line)
            captured_output += line

print(captured_output)
return_code = process.returncode

=> But this is discouraged by the Python docs, since polling this way is prone to deadlocks. Instead, they propose using the communicate() function.

2. Use communicate() with timeout.
=> This doesn't work at all: when the timeout occurs, a TimeoutExpired exception is raised and communicate() doesn't return any output at all.

3. Use threading instead
=> To get something as simple and universal as subprocess, you would more or less have to reimplement subprocess with threads, as is done in subprocess.py. Just for debug output, the effort is much too high.

#######################################################
Do you have further ideas for implementing such a behavior?
Do you think a feature request is warranted, or am I missing something obvious?

Thank you in advance for your suggestions,
Horst.
Re: Do subprocess.PIPE and subprocess.STDOUT at the same time
On 5/9/23 12:13, Horst Koiner wrote:
> Hi @all,
> ...
> For just testing whether the program works, I run it with stdout=subprocess.STDOUT and see all program output on the console, but my program afterwards crashes since nothing is captured in the Python variable. So I think I need the functionality of subprocess.PIPE and subprocess.STDOUT at the same time.

I'm not sure you quite understood what subprocess.STDOUT is for. If you
say nothing, stdout is not captured. STDOUT is used as a value for
stderr to mean "send it to the same place as stdout", which is useful
if you set stdout to something unusual: then you don't have to retype
it if you want stderr going to the same place. The subprocess module,
afaik, doesn't even have a case for stdout=STDOUT.
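
For illustration, a minimal sketch of that intended usage -- merging
stderr into the captured stdout; './test.sh' stands in for the program
under test:

import subprocess

# stderr=subprocess.STDOUT merges the child's stderr into its stdout,
# so a single pipe captures both streams interleaved.
result = subprocess.run(['./test.sh'],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT)
print(result.stdout)  # bytes: stdout and stderr together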

> ...
> 2. Use communicate() with timeout.
> => This doesn't work at all: when the timeout occurs, a TimeoutExpired exception is raised and communicate() doesn't return any output at all.

Well, sure ... if you set timeout, then you need to be prepared to catch
the TimeoutExpired exception and deal with it. That should be entirely
normal.
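
The pattern from the subprocess docs, sketched here with the thread's
'./test.sh': kill the child on timeout, then call communicate() again
to collect whatever output was produced before the freeze.

import subprocess

proc = subprocess.Popen(['./test.sh'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
try:
    # the timeout value is illustrative; the program normally runs ~2 minutes
    outs, errs = proc.communicate(timeout=150)
except subprocess.TimeoutExpired:
    proc.kill()                      # the program froze; stop it
    outs, errs = proc.communicate()  # collect output written so far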

>
> 3. Use threading instead
> => To get something as simple and universal as subprocess, you would more or less have to reimplement subprocess with threads, as is done in subprocess.py. Just for debug output, the effort is much too high.

Not sure I get what this is asking/suggesting. If you don't want to
block waiting for the subprocess to finish, you can use asyncio -
that's been fully implemented:

https://docs.python.org/3/library/asyncio-subprocess.html



Re: Do subprocess.PIPE and subprocess.STDOUT at the same time
On 5/9/2023 2:13 PM, Horst Koiner wrote:
> ...
> Do you have further ideas for implementing such a behavior?
> Do you think a feature request is warranted, or am I missing something obvious?

I'm not sure if this exactly fits your situation, but if you use
subprocess with pipes, you can often get a deadlock because the stdout
(or stderr, I suppose) pipe has a small capacity and fills up quickly
(at least on Windows), then it blocks until it is emptied by a read.
But if you aren't polling, you don't know there is something to read so
the pipe never gets emptied. And if you don't read it before the pipe
has filled up, you may lose data.

I solved that by running communicate() on a separate thread. Let
communicate() block the thread until the process has completed, then
have the thread send the result back to the main program. Of course,
this won't work if your process doesn't end, since you won't get
results until the process ends.
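
A minimal sketch of that approach; the command './test.sh' is from the
original post, and the queue handoff is just one way to send the result
back:

import queue
import subprocess
import threading

def run_and_capture(cmd, results):
    # communicate() blocks this worker thread, not the main program.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    outs, errs = proc.communicate()
    results.put((proc.returncode, outs, errs))

results = queue.Queue()
worker = threading.Thread(target=run_and_capture,
                          args=(['./test.sh'], results))
worker.start()
# ... the main program stays responsive; fetch the result once the
# process has ended:
returncode, outs, errs = results.get()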

Re: Do subprocess.PIPE and subprocess.STDOUT at the same time
Horst Koiner wrote:
> Hi @all,
> ...
> 1. Poll the output and use Popen instead:
> ...

I agree with Thomas Passin, but I solved it in a different way: I made
the readline() calls non-blocking, even if I believe his idea is
better than mine:

os.set_blocking(process.stdout.fileno(), False)
os.set_blocking(process.stderr.fileno(), False)
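
For context, a sketch of how the non-blocking pipes could fit into the
polling loop from the original post (POSIX only; os.read() on a
non-blocking pipe raises BlockingIOError when no data is available yet):

import os
import subprocess
import time

process = subprocess.Popen(['./test.sh'],
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)
os.set_blocking(process.stdout.fileno(), False)
os.set_blocking(process.stderr.fileno(), False)

captured_output = b''
while process.poll() is None:
    for pipe in (process.stdout, process.stderr):
        try:
            chunk = os.read(pipe.fileno(), 4096)
        except BlockingIOError:
            continue               # nothing to read on this pipe right now
        if chunk:
            captured_output += chunk
            print(chunk.decode(errors='replace'), end='')
    time.sleep(0.1)                # avoid a busy loop

# Drain whatever is still buffered after the process exits.
for pipe in (process.stdout, process.stderr):
    while True:
        try:
            chunk = os.read(pipe.fileno(), 4096)
        except BlockingIOError:
            break
        if not chunk:              # EOF: the write end is closed
            break
        captured_output += chunk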

Re: Do subprocess.PIPE and subprocess.STDOUT at the same time
On 5/9/23, Thomas Passin <list1@tompassin.net> wrote:
>
> I'm not sure if this exactly fits your situation, but if you use
> subprocess with pipes, you can often get a deadlock because the stdout
> (or stderr, I suppose) pipe has a small capacity and fills up quickly
> (at least on Windows),

The pipe size is relatively small on Windows only because
subprocess.Popen uses the default pipe size when it calls WinAPI
CreatePipe(). The default size is 4 KiB, which actually should be big
enough for most cases. If some other pipe size is passed, the value is
"advisory", meaning that it has to be within the allowed range (but
there's no practical limit on the size) and that it gets rounded up to
an allocation boundary (e.g. a multiple of the system's virtual-memory
page size). For example, here's a 256 MiB pipe:

>>> hr, hw = _winapi.CreatePipe(None, 256*1024*1024)
>>> _winapi.WriteFile(hw, b'a' * (256*1024*1024))
(268435456, 0)
>>> data = _winapi.ReadFile(hr, 256*1024*1024)[0]
>>> len(data) == 256*1024*1024
True

> then it blocks until it is emptied by a read.
> But if you aren't polling, you don't know there is something to read so
> the pipe never gets emptied. And if you don't read it before the pipe
> has filled up, you may lose data.

If there's just one pipe, then there's no potential for deadlock, and
no potential to lose data. If there's a timeout, however, then
communicate() still has to use I/O polling or a thread to avoid
blocking indefinitely in order to honor the timeout.

Note that there's a bug in subprocess on Windows. Popen._communicate()
should create a new thread for each pipe. However, it actually calls
stdin.write() on the current thread, which could block and ignore the
specified timeout. For example, in the following case the timeout of 5
seconds is ignored:

>>> cmd = 'python -c "import time; time.sleep(20)"'
>>> t0 = time.time(); p = subprocess.Popen(cmd, stdin=subprocess.PIPE)
>>> r = p.communicate(b'a'*4097, timeout=5); t1 = time.time() - t0
>>> t1
20.2162926197052

There's a potential for deadlock when two or more pipes are accessed
synchronously by two threads (e.g. one thread in each process). For
example, reading from one of the pipes blocks one of the threads
because the pipe is empty, while at the same time writing to the other
pipe blocks the other thread because the pipe is full. However, there
will be no deadlock if at least one of the threads always polls the
pipes to ensure that they're ready (i.e. data is available to be read,
or at least PIPE_BUF bytes can be written without blocking), which is
how communicate() is implemented on POSIX. Alternatively, one of the
processes can use a separate thread for each pipe, which is how
communicate() is implemented on Windows.
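
For the curious, a rough sketch of that POSIX-style polling approach
using the selectors module, multiplexing reads over both pipes so that
neither can fill up unnoticed ('./test.sh' as in the original post):

import selectors
import subprocess

proc = subprocess.Popen(['./test.sh'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

sel = selectors.DefaultSelector()
sel.register(proc.stdout, selectors.EVENT_READ)
sel.register(proc.stderr, selectors.EVENT_READ)

captured = b''
while sel.get_map():                    # until both pipes hit EOF
    for key, _ in sel.select():
        data = key.fileobj.read1(4096)  # pipe is ready, so this won't block
        if data:
            captured += data
        else:                           # EOF on this pipe
            sel.unregister(key.fileobj)
proc.wait()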

Note that there are problems with the naive implementation of the
reader threads on Windows, in particular if a pipe handle leaks to
descendants of the child process, which prevents the pipe from
closing. A better implementation on Windows would use named pipes
opened in asynchronous mode on the parent side and synchronous mode on
the child side. Just implement a loop that handles I/O completion
using events, APCs, or an I/O completion port.
Re: Do subprocess.PIPE and subprocess.STDOUT at the same time
Horst Koiner wrote at 2023-5-9 11:13 -0700:
> ...
>For production I run the program with stdout=subprocess.PIPE and can fetch the output later. For just testing whether the program works, I run it with stdout=subprocess.STDOUT and see all program output on the console, but my program afterwards crashes since nothing is captured in the Python variable. So I think I need the functionality of subprocess.PIPE and subprocess.STDOUT at the same time.

You might want to implement the functionality of the *nix program
`tee` in Python.
`tee` reads from one file and writes the data to several files,
i.e. it duplicates one input file to several output files.

Python's `tee` would likely be implemented with a separate thread.

For your case, the input file could be the subprocess's pipe,
and the output files `sys.stdout` and a pipe created by you,
used by your application in place of the subprocess's pipe.
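
A minimal sketch of such a `tee`, assuming a thread and an in-memory
buffer in place of a second pipe:

import io
import subprocess
import sys
import threading

def tee(infile, *outfiles):
    # Copy everything from infile to each outfile until EOF.
    def run():
        for line in iter(infile.readline, b''):
            for f in outfiles:
                f.write(line)
                f.flush()
        infile.close()
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t

proc = subprocess.Popen(['./test.sh'], stdout=subprocess.PIPE)
captured = io.BytesIO()
worker = tee(proc.stdout, sys.stdout.buffer, captured)
proc.wait()
worker.join()
print(captured.getvalue())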
Re: Do subprocess.PIPE and subprocess.STDOUT at the same time
On 5/10/23 12:51, Dieter Maurer wrote:
> Horst Koiner wrote at 2023-5-9 11:13 -0700:
>> ...
>
> You might want to implement the functionality of the *nix program
> `tee` in Python.
> ...

Should you choose to go this route, there are multiple efforts floating
around on the internet, worth a look. I don't know which are good and
which aren't. I went looking once to see if there was something to
replace a homegrown function that wasn't reliable - I ended up solving
that particular problem a different way, so I didn't use any of the
tees.


Re: Do subprocess.PIPE and subprocess.STDOUT at the same time
Using asyncio for this is a good possibility I was not aware of.

My best try with asyncio was:
import asyncio

async def run_command():
    # Create subprocess
    process = await asyncio.create_subprocess_exec(
        './test.sh',
        stdout=asyncio.subprocess.PIPE,   # Redirect stdout to a pipe
        stderr=asyncio.subprocess.PIPE    # Redirect stderr to a pipe
    )

    # Read stdout and stderr asynchronously
    captured_output = b''
    async for line in process.stdout:
        print(line.decode().strip())
        captured_output += line
    async for line in process.stderr:
        print(line.decode().strip())
        captured_output += line

    await process.wait()
    print(captured_output)


# Run the asyncio event loop
asyncio.run(run_command())
########################################

This fulfills all my requirements. A nice-to-have would be if captured_output did not have to be built up with +='s but could instead be obtained with a final seek(0) and read() of process.stdout. But I didn't find any way to rewind the stream so that I can read the whole output again.
Another question is whether this solution is deadlock-proof.

Thank you all for the very valuable input so far!

Greetings,
Horst
Re: Do subprocess.PIPE and subprocess.STDOUT at the same time
On Sat, 13 May 2023 at 07:21, Horst Koiner <koinerhorst6@gmail.com> wrote:
> Using asyncio for this is a good possibility I was not aware of.
> ...
> Another question is whether this solution is deadlock-proof.

No it's not, but the best part is, it's really close! Asynchronous
I/O is perfect for this: you need to wait for any of three events
(data on stdout, data on stderr, or process termination). So here it
is as three tasks:

captured_output = b""
async def collect_output(stream):
    global captured_output
    async for line in stream:
        print(line.decode().strip())
        captured_output += line

(You can play around with other ways of scoping this, I'm just using a
global for simplicity)

Inside run_command, you can then spawn three independent tasks and
await them simultaneously. Once all three finish, you have your
captured output, and the process will have terminated.
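
For instance, a sketch of run_command using asyncio.gather (reusing
collect_output from above and './test.sh' from the earlier posts):

async def run_command():
    process = await asyncio.create_subprocess_exec(
        './test.sh',
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # Three concurrent tasks: drain stdout, drain stderr, wait for exit.
    await asyncio.gather(
        collect_output(process.stdout),
        collect_output(process.stderr),
        process.wait(),
    )
    print(captured_output)

asyncio.run(run_command())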

This would then be guaranteed deadlock-proof (and, if you needed to
feed data on stdin, you could do that with a fourth task and it'd
still be deadlock-proof, even if it's more than one pipe buffer of
input), since all the pipes are being managed concurrently.

Even cooler? You can scale this up to multiple processes by calling
run_command more than once as separate tasks!

ChrisA