Mailing List Archive

[issue5108] Invalid UTF-8 ("%s") length in PyUnicode_FromFormatV()
New submission from STINNER Victor <victor.stinner@haypocalc.com>:

PyUnicode_FromFormatV() does not correctly compute the Unicode length of
a UTF-8 string. Commit r57837 "Change %s argument for
PyUnicode_FromFormat to be UTF-8. Fixes #1070." introduced the bug. To
compute the length, the function uses its own complex code to measure
the UTF-8 string, whereas PyUnicode_DecodeUTF8(p, strlen(p), "replace")
+ Py_UNICODE_COPY() is used to copy the string. The two results can
disagree. The problem may come from the error handling ("replace").
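
To illustrate how such a mismatch can arise, here is a minimal Python
sketch. It is not the C code from PyUnicode_FromFormatV():
naive_utf8_length() is a hypothetical hand-rolled counter that assumes
one character per non-continuation byte, and it is compared with the
length obtained by actually decoding with the "replace" error handler
on a current CPython 3.

```python
def naive_utf8_length(data: bytes) -> int:
    """Hypothetical counter: one character per non-continuation byte.

    Continuation bytes have the bit pattern 10xxxxxx, so they are skipped.
    This is only correct when the input is valid UTF-8.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def decoded_length(data: bytes) -> int:
    """Length of the text actually produced by decoding with "replace"."""
    return len(data.decode("utf-8", "replace"))


valid = "h\u00e9llo".encode("utf-8")   # well-formed UTF-8
invalid = b"abc\x80\x80"               # two stray continuation bytes

print(naive_utf8_length(valid), decoded_length(valid))      # 5 5
print(naive_utf8_length(invalid), decoded_length(invalid))  # 3 5
```

For the invalid input, each stray byte becomes one U+FFFD replacement
character, so the decoded text is longer than the naive count. A buffer
sized from the smaller number would be too short for the copy.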

Valgrind shows that the error occurs at Py_UNICODE_COPY(): Invalid
write of size 1. Since it's only one byte, Python does not always
crash.

To reproduce the crash, use PyUnicode_FromFormatV() or any function
that calls it: PyUnicode_FromFormat(), PyErr_Format(), ...

Example 1:

import grp

x=["\uDBE7\u8C99", "\u9C31\uF8DC\u3EC5\u1804\u629D\uE748\u68C8\uCF74\u9E63\uF647\uBF7A\uED63"]
x=str(x)
grp.getgrnam(x)

Example 2:

import unicodedata
x = "\\udbe7\u8c99', '\u9c31\\uf8dc\u3ec5\u1804\u629d\\ue748\u68c8\ucf74\u9e63\\uf647\ubf7a\\ued63"
unicodedata.lookup(x)

I wrote a patch reusing PyUnicode_DecodeUTF8(p, strlen(p), "replace")
+ PyUnicode_GET_SIZE() to get the real length of the converted UTF-8
string.
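
The idea behind this patch can be sketched in Python as follows. This is
only an illustration under stated assumptions, not the patch itself:
format_percent_s() is a hypothetical helper that handles nothing but
"%s" placeholders (no "%%" or other conversions), decodes each argument
first, and sizes a fixed buffer from the decoded lengths before copying.

```python
def format_percent_s(fmt: str, *args: bytes) -> str:
    """Expand "%s" placeholders, sizing the buffer from decoded text.

    Hypothetical helper: only "%s" is supported, and every argument is
    treated as a UTF-8 byte string decoded with the "replace" handler.
    """
    # Decode first, so the length used for sizing is the real one.
    decoded = [a.decode("utf-8", "replace") for a in args]
    parts = fmt.split("%s")
    if len(parts) - 1 != len(decoded):
        raise ValueError("number of %s placeholders and arguments differ")

    # Pass 1: compute the exact output length.
    total = sum(len(p) for p in parts) + sum(len(d) for d in decoded)
    buf = [""] * total  # fixed-size buffer, one slot per character

    # Pass 2: copy literal text and decoded arguments into the buffer.
    pos = 0
    for i, part in enumerate(parts):
        for ch in part:
            buf[pos] = ch
            pos += 1
        if i < len(decoded):
            for ch in decoded[i]:
                buf[pos] = ch
                pos += 1

    assert pos == total  # the sizing pass and the copy pass agree
    return "".join(buf)


print(format_percent_s("name not found: %s", b"ab\x80\x80"))
```

Because both passes work on the same decoded strings, the size
computation and the copy cannot disagree, at the cost of decoding each
argument before the buffer is allocated.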

A better patch would reuse the code that converts UTF-8 to Unicode with
the "replace" error handling.

----------
components: Unicode
messages: 80814
nosy: haypo
severity: normal
status: open
title: Invalid UTF-8 ("%s") length in PyUnicode_FromFormatV()
versions: Python 3.0, Python 3.1

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue5108>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue5108] Invalid UTF-8 ("%s") length in PyUnicode_FromFormatV()
Changes by STINNER Victor <victor.stinner@haypocalc.com>:


----------
keywords: +patch
Added file: http://bugs.python.org/file12892/unicode_format.patch
