Mailing List Archive

[issue5110] Printing Unicode chars from the interpreter in a non-UTF8 terminal (Py3)
New submission from Ezio Melotti <ezio.melotti@gmail.com>:

In Py2.x
>>> u'\u2620'
outputs u'\u2620' whereas
>>> print u'\u2620'
raises an error.

Instead, in Py3.x, both
>>> '\u2620'
and
>>> print('\u2620')
raise an error if the terminal doesn't use an encoding able to display
the character (e.g. the Windows terminal used for these examples).

This is caused by the new string representation defined in PEP3138[1].

Consider also the following example:
Py2:
>>> [u'\u2620']
[u'\u2620']
Py3:
>>> ['\u2620']
UnicodeEncodeError: 'charmap' codec can't encode character '\u2620' in
position 9: character maps to <undefined>

This means that there is no way to print lists (or other objects) that
contain characters that can't be encoded.
Two possible workarounds are:
1) encode all the elements of the list, but that is not practical;
2) use ascii(), but it adds extra "" around the output and escapes
backslashes and apostrophes (and it won't be possible to use _[0] in the
next line).
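
As a rough illustration of the two workarounds (the list contents and
variable names are made up for this example), the sketch below shows what
each one produces, and why the result of ascii() looks noisy when it is
echoed back at the prompt:

```python
items = ['\u2620', "it's"]

# Workaround 1: encode every element by hand. The result is a list of
# bytes objects, not strings, which is why it is not very practical.
encoded = [s.encode('ascii', 'backslashreplace') for s in items]

# Workaround 2: ascii() returns the repr with non-ASCII characters escaped.
escaped = ascii(items)

# At the prompt, ">>> ascii(items)" displays repr(escaped): the string gains
# an outer pair of quotes and its backslashes and apostrophes are escaped.
echoed = repr(escaped)

print(encoded)
print(escaped)
print(echoed)
```

print(ascii(items)) avoids the extra quoting, but it does not go through
the display hook, so "_" is not set.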

Also note that in Py3
>>> ['\ud800']
['\ud800']
>>> _[0]
'\ud800'
works, because U+D800 belongs to the category "Cs (Other, Surrogate)"
and it is escaped[2].
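
A small check of that difference, as a sketch only (the variable names are
illustrative): repr() keeps characters that str.isprintable() reports as
printable and escapes the others, which include surrogates.

```python
import unicodedata

skull = '\u2620'
surrogate = '\ud800'

for ch in (skull, surrogate):
    # ascii() keeps this print itself safe on any terminal.
    print(ascii(ch), unicodedata.category(ch), ch.isprintable(), len(repr(ch)))
```

The repr of the skull is three characters long (the character plus two
quotes), so it has to be encoded by the terminal; the repr of the surrogate
is already pure ASCII.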

The best solution is probably to change the default error-handler of the
Python3 interactive interpreter to 'backslashreplace' in order to avoid
this behavior, but I don't know if it's possible only for ">>> foo" and
not for ">>> print(foo)" (print() should still raise an error as it does
in Py2).
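
To make the difference between the error handlers concrete, here is a
minimal sketch. It assumes cp437 as a stand-in for the Windows console code
page (the report only shows that a 'charmap' codec was in use), and the
sample text is invented:

```python
text = 'caf\xe9 \u2620'

# The default handler is 'strict': the first unencodable character raises.
try:
    text.encode('cp437')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'backslashreplace' keeps encodable characters and escapes the rest.
escaped = text.encode('cp437', 'backslashreplace')

print(strict_failed)
print(escaped)
```

Only the character that the code page cannot represent is turned into an
escape; the accented letter is still encoded normally.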

This proposal has already been rejected in PEP3138[3], but there are
no links to the discussion that led to this decision.

I think this should be rediscussed and possibly changed, because, even
if I can't see the "listOfJapaneseStrings"[4], I would still rather see a
sequence of escaped chars than a UnicodeEncodeError.

[1]: http://www.python.org/dev/peps/pep-3138/
[2]: http://www.python.org/dev/peps/pep-3138/#specification
[3]: http://www.python.org/dev/peps/pep-3138/#rejected-proposals
[4]: http://www.python.org/dev/peps/pep-3138/#motivation

----------
components: Unicode
messages: 80820
nosy: ezio.melotti
severity: normal
status: open
title: Printing Unicode chars from the interpreter in a non-UTF8 terminal (Py3)
type: behavior
versions: Python 3.0

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue5110>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
STINNER Victor <victor.stinner@haypocalc.com> added the comment:

To be clear, this issue only affects the interpreter.

> 2) use ascii(), but it adds extra "" around the output

It doesn't add extra "" if you replace repr() with ascii() in the
interpreter code (sys.displayhook)?

> The best solution is probably to change the default error-handler
> of the Python3 interactive interpreter to 'backslashreplace'
> in order to avoid this behavior, (...)

Hum, it implies that sys.stdout has a different behaviour in the
interpreter and when running a script. We can expect many bug reports
from newbies: "the example works in the terminal/IDLE, but not in my
script, HELP!". So I prefer ascii().

----------
nosy: +haypo

STINNER Victor <victor.stinner@haypocalc.com> added the comment:

You can change the display hook with a site.py script (which has
to be in sys.path):
---------
import sys

def hook(message):
    # ascii() escapes non-ASCII characters, so the output is plain ASCII
    print(ascii(message))

sys.displayhook = hook
---------

Example (run python in an empty environment to get ASCII charset):
---------
$ env -i PYTHONPATH=$PWD ./python
Python 3.1a0 (py3k:69105M, Jan 30 2009, 10:36:27)
>>> import sys
>>> sys.stdout.encoding
'ANSI_X3.4-1968'
>>> "\xe9"
'\xe9'
>>> print("\xe9")
Traceback (most recent call last):
(...)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' (...)
---------

Ezio Melotti <ezio.melotti@gmail.com> added the comment:

This seems to solve the problem, but apparently the interactive "_"
doesn't work anymore.

STINNER Victor <victor.stinner@haypocalc.com> added the comment:

Oh yeah, the original sys.displayhook uses a special hack for the _ global
variable:
---------
import sys
import builtins

def hook(message):
    if message is None:
        return
    # the default hook stores the last displayed value in builtins._
    builtins._ = message
    print(ascii(message))

sys.displayhook = hook
---------
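
The hook can be tried without installing it or changing the terminal. The
following is only a sketch: the in-memory "fake_terminal" is an assumption
that stands in for an ASCII-only console with the strict error handler.

```python
import builtins
import contextlib
import io

def hook(value):
    """Display hook sketch: remember the value in `_`, print its ascii() form."""
    if value is None:
        return
    builtins._ = value
    print(ascii(value))

# Stand-in for an ASCII-only terminal (assumption: strict error handler).
raw = io.BytesIO()
fake_terminal = io.TextIOWrapper(raw, encoding='ascii', errors='strict',
                                 newline='\n')

with contextlib.redirect_stdout(fake_terminal):
    hook(['\u2620'])
    fake_terminal.flush()

shown = raw.getvalue()
print(shown)
```

The bytes that reach the fake terminal are the escaped form, and
builtins._ still holds the original list, so "_[0]" keeps working.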

STINNER Victor <victor.stinner@haypocalc.com> added the comment:

Here is a patch to use ascii() directly in sys_displayhook() (with a
unit test!).

----------
keywords: +patch
Added file: http://bugs.python.org/file12894/display_hook_ascii.patch

Changes by Giampaolo Rodola' <billiejoex@users.sourceforge.net>:


----------
nosy: +giampaolo.rodola

Martin v. Löwis <martin@v.loewis.de> added the comment:

Victor, I'm not sure whether you are proposing that
display_hook_ascii.patch be included in Python. IIUC, this patch
breaks PEP3138, so it clearly must be rejected.

Overall, I fail to see the bug in this report. Python 3.0 works as
designed as shown here.

----------
nosy: +loewis

Ezio Melotti <ezio.melotti@gmail.com> added the comment:

This seems to fix the problem:
------------------------------
import sys
import builtins

def hook(message):
    if message is None:
        return
    builtins._ = message
    try:
        print(repr(message))
    except UnicodeEncodeError:
        # fall back to escapes only when the terminal can't encode the repr
        print(ascii(message))

sys.displayhook = hook
------------------------------
Just to clarify:
* The current Py3 behavior works fine in UTF8 terminals
* It doesn't work on non-UTF8 terminals if they can't encode the chars
(they raise an error)
* It only affects the interactive interpreter
* This new patch escapes the chars instead of raising an error only on
non-UTF8 terminals and only when printed as ">>> foo" (without print())
and leaves the other behaviors unchanged
* This is related to Py3 only

Apparently the patch provided by Victor always escapes the non-ASCII
chars. This new hook function prints the Unicode chars if possible and
escapes them if not. On a UTF8 terminal the behavior is unchanged; on a
non-UTF8 terminal all the chars that cannot be encoded will now be escaped.

This only changes the behavior of ">>> foo", so it cannot lead to
confusion ("It works in the interpreter but not in the script"). In a
script one can't write "foo" alone but has to write "print(foo)", and the
behavior of "print(foo)" is the same in both the interpreter and the
scripts (with the patch applied):

>>> ['\u2620']
['\u2620']
>>> print(['\u2620'])
UnicodeEncodeError: 'charmap' codec can't encode character '\u2620' in
position 2: character maps to <undefined>
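
The same contrast can be reproduced without a terminal. This is a sketch
under stated assumptions: the helper names shown_by_hook and shown_by_print
are hypothetical, and cp437 is assumed as the non-UTF8 terminal encoding.

```python
def shown_by_hook(value, encoding):
    """Bytes a repr-then-ascii display hook would send to the terminal."""
    try:
        return repr(value).encode(encoding)
    except UnicodeEncodeError:
        return ascii(value).encode(encoding)

def shown_by_print(value, encoding):
    """Bytes print() would send: str(value) with the strict error handler."""
    return str(value).encode(encoding)

value = ['\u2620']
print(shown_by_hook(value, 'utf-8'))   # real character, UTF-8 encoded
print(shown_by_hook(value, 'cp437'))   # escaped fallback
try:
    shown_by_print(value, 'cp437')
except UnicodeEncodeError as exc:
    print('print() would fail:', exc.reason)
```

On the UTF-8 side nothing changes; only the case that used to raise at the
prompt is replaced by the escaped form, while print() keeps raising.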

I think that PEP3138 didn't consider this issue. Its purpose is to
have a better output (Unicode chars instead of escaped chars), but it
only works with UTF8 terminals; on non-UTF8 terminals the output is
worse (UnicodeEncodeError instead of escaped chars).

This is an improvement and I can't see any negative side-effect.

Attached is a txt file with more examples, on Py2 and Py3, on
Windows (non-UTF8 terminal) and Linux (UTF8 terminal), with and without
my patch.

Added file: http://bugs.python.org/file12900/issue5110.txt

STINNER Victor <victor.stinner@haypocalc.com> added the comment:

> Victor, I'm not sure whether you are proposing that
> display_hook_ascii.patch is included into Python. IIUC, this patch
> breaks PEP3138, so it clearly must be rejected.
>
> Overall, I fail to see the bug in this report. Python 3.0 works as
> designed as shown here.

The idea is to avoid a Unicode error (by replacing unprintable characters
with their hexadecimal code) when the display hook tries to display a
message which is not printable in the terminal charset.

It's just to make the Python3 interpreter a little bit more "user friendly"
on Windows.

Problem: using different (encoding) rules for the display hook and for
print() may confuse new users (Why does ">>> chr(...)" work whereas ">>>
print(chr(...))" fails?).
