Mailing List Archive

First two bytes of 'stdout' are lost
I am trying to use StringIO to capture stdout, in code that looks like this:

import sys
from io import StringIO
old_stdout = sys.stdout
sys.stdout = mystdout = StringIO()
print("patate")
mystdout.seek(0)
sys.stdout = old_stdout
print(mystdout.read())

Well, it is not exactly like this, since this snippet works properly on its own.
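For reference, here is a minimal pure-Python sketch of the same capture using the standard library's `contextlib.redirect_stdout` and `StringIO.getvalue()`. It is an alternative to the swap-and-seek pattern above, not the poster's actual code (which is driven from C++). Because `getvalue()` ignores the stream position, no `seek(0)` is needed.

```python
import contextlib
import io

# Capture everything printed inside the with-block.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print("patate")

# sys.stdout is restored automatically on leaving the block.
# getvalue() returns the whole buffer regardless of the current
# position, so no seek(0) is needed before reading it back.
captured = buffer.getvalue()
print(repr(captured))  # 'patate\n'
```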

This code is actually run from C++ using the C Python API.
This worked quite well, so the code was right at some point. But now,
two things changed:
- Now using Python 3.11.7 instead of 3.7.12
- Now using only the Python limited C API

And it seems that now, mystdout.read() always misses the first two
characters that have been written to stdout.

My first idea was something related to the BOM being improperly
truncated at some point, but I am manipulating UTF-8, so the BOM would
be 3 bytes, not 2.
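The `codecs` module exposes the byte-order marks directly, which makes the byte counts easy to check. This sketch only prints sizes. For comparison, the UTF-16 BOM is 2 bytes, and plain `"utf-8"` never emits a BOM (only `"utf-8-sig"` does).

```python
import codecs

# Byte-order mark sizes, as defined by the codecs module.
bom_sizes = {
    "utf-8": len(codecs.BOM_UTF8),       # 3 bytes
    "utf-16": len(codecs.BOM_UTF16_LE),  # 2 bytes (BE variant is the same size)
    "utf-32": len(codecs.BOM_UTF32_LE),  # 4 bytes
}
print(bom_sizes)

plain = "patate".encode("utf-8")          # 6 bytes, no BOM
with_sig = "patate".encode("utf-8-sig")   # 3-byte BOM + 6 bytes
utf16 = "patate".encode("utf-16")         # 2-byte BOM + 12 bytes
print(len(plain), len(with_sig), len(utf16))  # 6 9 14
```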

I ruled out a bug in the C++ code that extracts the string from the
Python variable, since printing the content of mystdout to the real
stdout from Python also misses the first two characters.

Hopefully someone has a clue about what could have changed in Python
to make this stop working compared to Python 3.7?
--
https://mail.python.org/mailman/listinfo/python-list
Re: First two bytes of 'stdout' are lost
Partly answering myself:

For some reason, I now have to call mystdout.seek(0) right after
mystdout has been created, and this solves the issue.

No idea why, though.
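Here is one mechanism that produces exactly this symptom. It illustrates `StringIO` position semantics and is not a diagnosis of the C++-embedded case. `read()` starts at the current position, while `getvalue()` always returns the whole buffer. So if the position is off by 2 when reading, `read()` skips exactly 2 characters.

```python
from io import StringIO

buf = StringIO()
buf.write("patate\n")

# After a write, the position sits at the end of the text,
# so read() has nothing left to return.
after_write = buf.read()   # ''

# getvalue() ignores the position and returns the whole buffer.
whole = buf.getvalue()     # 'patate\n'

# With the position 2 characters in, read() drops the first 2.
buf.seek(2)
skipped = buf.read()       # 'tate\n'

# Rewinding to the start gives the full text again.
buf.seek(0)
full = buf.read()          # 'patate\n'
print(repr(after_write), repr(whole), repr(skipped), repr(full))
```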

On Thu, 11 Apr 2024 at 14:42, Olivier B.
<perso.olivier.barthelemy@gmail.com> wrote:
> And it seems that now, mystdout.read() always misses the first two
> characters that have been written to stdout.
Re: First two bytes of 'stdout' are lost
On 4/11/2024 8:42 AM, Olivier B. via Python-list wrote:
> And it seems that now, mystdout.read() always misses the first two
> characters that have been written to stdout.

I've not used the C API, so just for fun I asked ChatGPT about this and
it suggested that a flush after writing to StringIO might do it. It
suggested using a custom class for this purpose:

from io import StringIO

class MyStringIO(StringIO):
    def write(self, s):
        # Override write method to ensure all characters are written correctly
        n = super().write(s)
        self.flush()
        return n  # keep write()'s usual return value (characters written)

You would use it like this:

sys.stdout = mystdout = MyStringIO()

I haven't tested it but it seems reasonable, although I would have
naively expected to lose bytes from the end, not the beginning.
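One quick way to test the suggestion in plain Python (no C++ embedding) is to capture the same print with both classes and compare the results. As far as I know, `StringIO` keeps no separate write buffer, so `flush()` has nothing to push out and both results should match. If they do, the flush alone is unlikely to explain lost leading characters. The `capture()` helper is hypothetical, written only for this comparison.

```python
import sys
from io import StringIO


class MyStringIO(StringIO):
    # The subclass suggested above: flush after every write and
    # return the character count, as write() normally does.
    def write(self, s):
        n = super().write(s)
        self.flush()
        return n


def capture(stream_factory):
    """Hypothetical helper: print one line into a fresh stream and return it."""
    old_stdout = sys.stdout
    sys.stdout = stream = stream_factory()
    try:
        print("patate")
    finally:
        sys.stdout = old_stdout
    return stream.getvalue()


plain = capture(StringIO)
flushed = capture(MyStringIO)
print(repr(plain), repr(flushed), plain == flushed)
```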
Re: First two bytes of 'stdout' are lost
On 11Apr2024 14:42, Olivier B. <perso.olivier.barthelemy@gmail.com> wrote:
>I am trying to use StringIO to capture stdout, in code that looks like this:
>
>import sys
>from io import StringIO
>old_stdout = sys.stdout
>sys.stdout = mystdout = StringIO()
>print( "patate")
>mystdout.seek(0)
>sys.stdout = old_stdout
>print(mystdout.read())
>
>Well, it is not exactly like this, since this works properly

Aye, I just tried that. All good.

>This code is actually run from C++ using the C Python API.
>This worked quite well, so the code was right at some point. But now,
>two things changed:
> - Now using python 3.11.7 instead of 3.7.12
> - Now using only the python limited C API

Maybe you should post the code then: the exact Python code and the exact
C++ code.

>And it seems that now, mystdout.read() always misses the first two
>characters that have been written to stdout.
>
>My first ideas was something related to the BOM improperly truncated
>at some point, but i am manipulating UTF-8, so the bom would be 3
>bytes, not 2.

I didn't think UTF-8 needed a BOM. Someone will doubtless correct me.

However, does the `mystdout.read()` code _know_ you're using UTF-8? I
have the vague impression that e.g. some Windows systems default to UTF-16
of some flavour, possibly _with_ a BOM.

I'm suggesting that you rigorously check that the bytes->text bits know
what text encoding they're using. If you've left an encoding out
anywhere, put it in explicitly.
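Here is a minimal sketch of that advice on the Python side. It assumes the captured output is eventually needed as bytes, for example to hand back to C++. Write through an `io.TextIOWrapper` with an explicit encoding, then decode with the same one. UTF-8 and `newline="\n"` (which disables newline translation) are choices made for this example.

```python
import io
import sys

# Capture stdout as bytes, with the text encoding spelled out instead
# of relying on a platform or locale default.
raw = io.BytesIO()
wrapper = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

old_stdout = sys.stdout
sys.stdout = wrapper
try:
    print("patate")
finally:
    sys.stdout = old_stdout
wrapper.flush()  # TextIOWrapper buffers internally; push the text into raw

data = raw.getvalue()         # b'patate\n': plain UTF-8, no BOM
text = data.decode("utf-8")   # decode with the same explicit encoding
print(data, repr(text))
```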

>Hopefully someone has a clue on what would have changed in Python for
>this to stop working compared to python 3.7?

None at all, alas. My experience with the Python C API is very limited.