Mailing List Archive

[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header
New submission from STINNER Victor <victor.stinner@haypocalc.com>:

PyTraceBack_Print() doesn't take care of the "# coding: xxx" header of
a Python script. It calls _Py_DisplaySourceLine() which opens the file
as a byte stream (and not an unicode characters stream). Because of
this problem, the traceback maybe truncated or invalid. Example (write
it into a file and execute the file):
----
from sys import executable
from os import execvpe
filename = "pouet.py"
out = open(filename, "wb")
out.write(b"""# -*- coding: GBK -*-
print("--asc\xA1\xA7i--")
raise Exception("--asc\xA1\xA7i--")""")
out.close()
execvpe(executable, [executable, filename], None)
----

This issue depends on issue2384 (line number).

Note: Python 2.6 may also has the problem but it doesn't parse "#
coding: GBK". So it's a different problem (issue?).

----------
messages: 73851
nosy: haypo
severity: normal
status: open
title: PyTraceBack_Print() doesn't respect # coding: xxx header
versions: Python 3.0

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
STINNER Victor <victor.stinner@haypocalc.com> added the comment:

Here is a new version of _Py_DisplaySourceLine() using
PyTokenizer_FindEncoding() to read the coding header, and
PyFile_FromFd() to create an "unicode-awake" file. The code could be
optimized, but it least it displays correctly the file line ;-)

The code is based on the original _Py_DisplaySourceLine() and
call_find_module() (import.c).

Notes:
* The code is young and new, it might be delayed until python 3.1
* Some functions may raise new exceptions (eg. MemoryError). I don't
know how Python will react if an exception is raised during
PyTraceBack_Print() ?
* The return code is 0 for success, but is it -1 or 1 for an error? It
looks like error is "result != 0", so both -1 and 1 should be valid. I
used -1.

----------
keywords: +patch
Added file: http://bugs.python.org/file11611/traceback_unicode.patch

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Changes by STINNER Victor <victor.stinner@haypocalc.com>:


Removed file: http://bugs.python.org/file11611/traceback_unicode.patch

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
STINNER Victor <victor.stinner@haypocalc.com> added the comment:

(oops, first patch included an useless whitespace change)

Added file: http://bugs.python.org/file11612/traceback_unicode.patch

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
STINNER Victor <victor.stinner@haypocalc.com> added the comment:

Ooops, my first version introduces a regression: if file open fails,
the traceback printing was stopped. Here is a new version of my patch
to support #coding: header in _Py_DisplaySourceLine(). It doesn't
print the line of file open fails, but continue to display the end of
the traceback.

But print still stops on PyFile_WriteObject() or PyFile_WriteString().
If PyFile fails, I guess that next print will also fails. (it's also
the current behaviour of PyTraceBack_Print).

Example:
----
Python 3.0rc1+ (py3k:66643M, Sep 27 2008, 17:11:51)
>>> raise Exception('err')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception: err
----

The line is not displayed (why? no idea), but the exception
("Exception: err") is still displayed.

Added file: http://bugs.python.org/file11633/traceback_unicode-2.patch

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:

> The line is not displayed (why? no idea)
Well, it's difficult to re-open <stdin>...

----------
nosy: +amaury.forgeotdarc

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:

The patch goes in the good direction, here are some remarks:

- The two calls to open() are missing the O_BINARY flag, necessary on
Windows to avoid newline translation. This may be important for some codecs.

- the GIL should be released around calls to open(); searching the whole
sys.path can be long.

Besides this, the "char* filename" is relevant only on utf8-encoded
filesystems, and will not work on windows with non-ascii filenames. But
this is another issue.

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
STINNER Victor <victor.stinner@haypocalc.com> added the comment:

Le Tuesday 07 October 2008 15:06:01 Amaury Forgeot d'Arc, vous avez écrit :
> - The two calls to open() are missing the O_BINARY flag, necessary on
> Windows to avoid newline translation. This may be important for some
> codecs.

Oops, I always forget Windows specific options :-/

> - the GIL should be released around calls to open(); searching the whole
> sys.path can be long.

Ok, O_BINARY and GIL fixed.

I just realized that sys.path is considered as a bytes string (PyBytes_Check)
whereas Python3 uses an unicode string!

['', '/home/haypo/prog/python-ptrace', (...)]
>>> sys.path.append(b'bytes'); sys.path
['', '/home/haypo/prog/python-ptrace', '(...)', b'bytes']

Oops, so it's possible to mix strings and bytes. But the normal case is a list
of unicode strings. Another fix is needed :-/

I don't know how to get into "if (fd < 0)" (code using sys.path). Any clue?

> Besides this, the "char* filename" is relevant only on utf8-encoded
> filesystems, and will not work on windows with non-ascii filenames. But
> this is another issue.

I don't know the Windows API enough to answer.

Doesn't Python provide "high level" functions to open a file?

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:

> Ok, O_BINARY and GIL fixed.
which patch? ;-)

> I just realized that sys.path is considered as a bytes string
> (PyBytes_Check) whereas Python3 uses an unicode string!
>
> ['', '/home/haypo/prog/python-ptrace', (...)]
> >>> sys.path.append(b'bytes'); sys.path
> ['', '/home/haypo/prog/python-ptrace', '(...)', b'bytes']
>
> Oops, so it's possible to mix strings and bytes. But the normal case
is a list
> of unicode strings. Another fix is needed :-/

You could do like the import function, which converts unicode with the
fs encoding. See the code in import.c, below "PyList_GetItem(path, i)"

> I don't know how to get into "if (fd < 0)" (code using sys.path).
> Any clue?

The .pyc stores the full path of the .py, at compilation time. If the
directory is moved, or accessed with another name (via mounts or soft
links), the traceback displays the filename as stored in the .pyc file,
but the content can still be displayed. It does not work every time,
though, for example when the directory is shared between Windows & Unix.

> > Besides this, the "char* filename" is relevant only on utf8-encoded
> > filesystems, and will not work on windows with non-ascii filenames.
> > But this is another issue.
>
> I don't know the Windows API enough to answer.

Windows interprets the char* variables with its system-set "mbcs"
encoding. On my machine, it is equivalent to the cp1252 encoding (almost
equivalent latin-1). Since the 'filename' comes from a call to
_PyUnicode_AsString(tb->tb_frame->f_code->co_filename), it will be wrong
with any accented letter.

> Doesn't Python provide "high level" functions to open a file?
io.open()?

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Changes by STINNER Victor <victor.stinner@haypocalc.com>:


Added file: http://bugs.python.org/file11732/traceback_unicode-3.patch

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:

I still have remarks about traceback_unicode-3.patch, that I did not see
before:
- (a minor thing) PyMem_FREE(found_encoding) could appear only once,
just after PyFile_FromFd.
- I feel it dangerous to call the PyUnicode_AS_UNICODE() macro without
checking that PyFile_GetLine() actually returned a PyUnicode object.
- If the "truncated" variable is NULL, an exception has been set; it is
necessary to exit the function.

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
STINNER Victor <victor.stinner@haypocalc.com> added the comment:

Thanks for your remarks amaury. I improved my patch:
- PyMem_FREE(found_encoding) is called just after PyFile_FromFd()
- Create static subfunction _Py_FindSourceFile(): find a file in
sys.path
- Consider that sys.path contains only unicode (and not bytes) string
- Clear exception on "truncated = PyUnicode_FromUnicode(p, len);"
error, but continue the execution
- I added PyUnicode_check(lineobj)
- Replace open(filename) by open(namebuf) (while searching the full
path of the file) <= fix a regression introduced by my patch

Should I stop on the first error instead of using PyErr_Clear()? I
would like to display the traceback even if an function failed.

Sum up of the patch version 4:
- use open(O_RDONLY | O_BINARY) + PyFile_FromFd() instead of fopen()
- open the file in unicode mode using the Python encoding found in
the "#coding:..." header
- consider sys.path as a list of unicode strings (and not of bytes
strings)

I used _PyUnicode_AsStringAndSize() to convert unicode to string to be
consistent with tb_printinternal(). As you noticed, it uses UTF-8
which should is on Windows :-/ I propose to open a new issue when this
one will be closed :-)

Added file: http://bugs.python.org/file11736/traceback_unicode-4.patch

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:

One last thing: there is a path where lineobj is not freed (when PyUnicode_Check(lineobj) is false); I suggest to move
Py_XDECREF(lineobj) just before the final return statement.
Reference counting is fun ;-)

> Should I stop on the first error instead of using PyErr_Clear()?
> I would like to display the traceback even if an function failed.
You are right. Common failures should clear the error and return 0.
In your patch, there is one remaining place, the call to PyFile_GetLine().

More fun will arise when my Windows terminal (encoding=cp1252) will try
to display Chinese characters. Let's pretend this is yet another issue.

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Changes by STINNER Victor <victor.stinner@haypocalc.com>:


Removed file: http://bugs.python.org/file11612/traceback_unicode.patch

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Changes by STINNER Victor <victor.stinner@haypocalc.com>:


Removed file: http://bugs.python.org/file11633/traceback_unicode-2.patch

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Changes by STINNER Victor <victor.stinner@haypocalc.com>:


Removed file: http://bugs.python.org/file11732/traceback_unicode-3.patch

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
STINNER Victor <victor.stinner@haypocalc.com> added the comment:

> More fun will arise when my Windows terminal (encoding=cp1252)
> will try to display Chinese characters. Let's pretend this is
> yet another issue.

I tried the patch using a script with unicode characters (character
not representable in ISO-8859-1 like polish characters ł and Ł).

Result in an UTF-8 terminal (my default locale):
Traceback (most recent call last):
File "unicode.py", line 2, in <module>
raise ValueError("unicode: Łł")
ValueError: unicode: Łł
=> correct

Result in an ISO-8859-1 terminal (I changed the encoding in my Konsole
configuration):
Traceback (most recent call last):
File "unicode.py", line 2, in <module>
raise ValueError("unicode: \u0141\u0142")
ValueError: unicode: \u0141\u0142
=> correct

Why does it work? It's because PyErr_Display() uses sys.stderr instead
of sys.stdout and sys.stderr uses a different unicode error mechanism:
>>> import sys
>>> sys.stdout.errors
'strict'
>>> sys.stderr.errors
'backslashreplace'

Which is a great idea :-)

You can try on Windows using the new attached file unicode.py.

Added file: http://bugs.python.org/file11746/unicode.py

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
STINNER Victor <victor.stinner@haypocalc.com> added the comment:

@amaury: Oops, yes, I introduced a refleak in the version 4 with the
PyUnicode_Check(). Instead of just moved Py_(X)RECREF(lineobj);, I
could not not resist to refactor the code to remove one more
indentation level (I prefer if (...) return; instead of if (...) {
very long block; }).

Changes in version 5:
- rename 'namebuf' buffer to 'buf', it's used for the filename and to
display the indentation space (strcpy(buf, ' ');).
- move Py_DECREF(fob); at the end of the GetLine loop
- return on lineobj error

I think that the new version is easier to read than the current code
because they are few indentation and no more local variables (if (...)
{ local var; ... })

Added file: http://bugs.python.org/file11747/traceback_unicode-5.patch

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Changes by STINNER Victor <victor.stinner@haypocalc.com>:


Removed file: http://bugs.python.org/file11736/traceback_unicode-4.patch

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Changes by STINNER Victor <victor.stinner@haypocalc.com>:


Removed file: http://bugs.python.org/file11747/traceback_unicode-5.patch

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Changes by STINNER Victor <victor.stinner@haypocalc.com>:


Added file: http://bugs.python.org/file11748/traceback_unicode-5.patch

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:

The code is indeed easier to follow.
I don't have any more remark, thanks to you perseverance!

Now, is there some unit test we could provide? #2384 depends on this
issue, it should be easy to extract a small test case.

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
STINNER Victor <victor.stinner@haypocalc.com> added the comment:

My patch for #2384 contains a testcase which require #3975 and #2384
to be fixed (you have to apply both patches to test it).

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue3975] PyTraceBack_Print() doesn't respect # coding: xxx header [ In reply to ]
Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:

Committed r66867, together with #2384. Thanks for your perseverance ;-)

----------
resolution: -> fixed
status: open -> closed

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue3975>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com