[issue691291] codecs.open(filename, 'U', 'UTF-16') corrupts text
And Clover <and@doxdesk.com> added the comment:

> The problem is that codecs.open() forces binary mode on the underlying
> file object, and this defeats the U mode.

Actually the problem is it doesn't defeat it!

The function is documented to force binary, but it actually only does
"mode = mode + 'b'", which can leave you with a mode of 'rUb'. This mode
should be invalid, but in practice the 'U' wins out: newline translation
is applied to the raw bytes before decoding, which causes the expected
problems for UTF-16 and some East Asian codecs.
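
To illustrate the kind of corruption I mean, here is a minimal sketch in
Python 3 syntax that simulates universal-newline translation on encoded
bytes. It does not call codecs.open() itself; translate_newlines_bytewise
is a hypothetical stand-in for what the underlying file object does in
'U' mode, and the sample character is my own choice.

```python
def translate_newlines_bytewise(data: bytes) -> bytes:
    """Hypothetical stand-in for 'U' mode applied to undecoded bytes."""
    # Map CRLF and lone CR to LF, with no knowledge of the text encoding.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


original = "\u010d"  # LATIN SMALL LETTER C WITH CARON
encoded = original.encode("utf-16-le")  # low byte is 0x0D, i.e. a CR byte

# The 0x0D byte is mistaken for a carriage return and rewritten to 0x0A.
mangled = translate_newlines_bytewise(encoded)
decoded = mangled.decode("utf-16-le")

print(repr(original), "->", repr(decoded))
```

The decoded character is no longer the one that was written, even though
the text contained no newline at all.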

Until such time as text/universal mode is supported at the overlying
decoded stream level, I suggest that 'U' should be .replace()d out of
the mode as well as 'b' being added, as the documentation would imply.
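
Roughly what I have in mind, as a sketch only: force_binary_mode is a
hypothetical helper name, not the actual code in codecs.py, and treating
a bare 'U' as a read mode is my assumption.

```python
def force_binary_mode(mode: str) -> str:
    """Hypothetical helper: drop 'U' and make sure the mode is binary."""
    # Remove the universal-newline flag so it cannot act on raw bytes.
    mode = mode.replace("U", "")
    # Assumption: a bare 'U' meant "read with universal newlines".
    if not mode:
        mode = "r"
    # Add 'b' only if the caller did not already ask for binary.
    if "b" not in mode:
        mode += "b"
    return mode


for requested in ("U", "rU", "r", "rb", "w"):
    print(requested, "->", force_binary_mode(requested))
```

With this, a request for 'rU' ends up as plain 'rb', which is what the
documentation already promises.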

----------
nosy: +aclover

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue691291>
_______________________________________