Mailing List Archive

[issue4874] decoding functions in _codecs module accept str arguments
New submission from Antoine Pitrou <pitrou@free.fr>:

The following function calls should raise a TypeError instead. Encoding
functions are fine (they only accept str).

>>> import codecs
>>> codecs.utf_8_decode('aa')
('aa', 2)
>>> codecs.utf_8_decode('éé')
('éé', 4)
>>> codecs.latin_1_decode('éé')
('éé', 4)
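
For reference, a minimal sketch of the behaviour requested here, assuming a Python 3 interpreter in which the decoding functions only accept bytes-like input. The helper name try_decode is hypothetical and exists only for this illustration; the escaped string stands for two "e with acute accent" characters.

```python
import codecs

def try_decode(decoder, data):
    """Return the decoder's result, or the name of the exception it raised."""
    try:
        return decoder(data)
    except TypeError as exc:
        return type(exc).__name__

# bytes input is decoded as usual
print(try_decode(codecs.utf_8_decode, b"aa"))
# str input is expected to be rejected with a TypeError
print(try_decode(codecs.utf_8_decode, "aa"))
# The UTF-8 encoding of two accented characters is four bytes long,
# so reading it as Latin-1 yields four characters.
print(try_decode(codecs.latin_1_decode, "\u00e9\u00e9".encode("utf-8")))
```

The last call also suggests where the length of 4 in the session above comes from: the str argument was silently encoded to UTF-8 before being decoded.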

----------
components: Extension Modules
messages: 79384
nosy: pitrou
priority: release blocker
severity: normal
status: open
title: decoding functions in _codecs module accept str arguments
type: behavior
versions: Python 3.0, Python 3.1

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue4874>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue4874] decoding functions in _codecs module accept str arguments [ In reply to ]
STINNER Victor <victor.stinner@haypocalc.com> added the comment:

Patch replacing the "s*" parsing format with "y*" for:
- utf_7_decode()
- utf_8_decode()
- utf_16_decode()
- utf_16_le_decode()
- utf_16_be_decode()
- utf_16_ex_decode()
- utf_32_decode()
- utf_32_le_decode()
- utf_32_be_decode()
- utf_32_ex_decode()
- unicode_escape_decode()
- raw_unicode_escape_decode()
- latin_1_decode()
- ascii_decode()
- charmap_decode()
- mbcs_decode()

Using run_tests.sh, all tests are ok (with 19 skipped tests). I guess
that there are no tests for all of these functions :-/

Note: codecs documentation was already correct:

.. method:: Codec.decode(input[, errors])
(...)
*input* must be a bytes object or one which provides the read-only
character buffer interface -- for example, buffer objects and memory
mapped files.

----------
keywords: +patch
nosy: +haypo
Added file: http://bugs.python.org/file12641/_codecs_bytes.patch

[issue4874] decoding functions in _codecs module accept str arguments [ In reply to ]
Marc-Andre Lemburg <mal@egenix.com> added the comment:

On 2009-01-08 01:59, STINNER Victor wrote:
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
> Patch replacing the "s*" parsing format with "y*" for:
> - utf_7_decode()
> - utf_8_decode()
> - utf_16_decode()
> - utf_16_le_decode()
> - utf_16_be_decode()
> - utf_16_ex_decode()
> - utf_32_decode()
> - utf_32_le_decode()
> - utf_32_be_decode()
> - utf_32_ex_decode()
> - latin_1_decode()
> - ascii_decode()
> - charmap_decode()
> - mbcs_decode()

These are fine.

> - unicode_escape_decode()
> - raw_unicode_escape_decode()

These changes are in line with their C API codec interfaces, but those
particular codecs could also be made to work on Unicode input, since
unescaping applies equally well to Unicode.

I'll probably open a new item for this.

> Using run_tests.sh, all tests are ok (with 19 skipped tests). I guess
> that there are no tests for all of these functions :-/

The mbcs codec is only available on Windows.

All others are tested by test_codecs.py.

Which ones are skipped in your case?

> Note: codecs documentation was already correct:
>
> .. method:: Codec.decode(input[, errors])
> (...)
> *input* must be a bytes object or one which provides the read-only
> character buffer interface -- for example, buffer objects and memory
> mapped files.

That's not entirely correct: codecs are allowed to accept any
object type and can also return any object type. It is up to them
to decide, e.g. a codec may accept both bytes and Unicode input
and always generate Unicode output when decoding.
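
To illustrate the point about object types, here is a small sketch using codecs that ship with recent Python 3 releases. It assumes the "rot_13" and "hex_codec" codecs are available through codecs.decode(), which is the case in current CPython but was not necessarily true of the 3.0 and 3.1 versions discussed in this thread.

```python
import codecs

# Text in, text out: rot_13 maps str to str.
print(codecs.decode("uryyb", "rot_13"))

# Bytes in, bytes out: hex_codec maps bytes to bytes.
print(codecs.decode(b"68656c6c6f", "hex_codec"))

# Bytes in, text out: the usual case for character encodings.
print(codecs.decode(b"hello", "ascii"))
```

Each codec decides for itself which input and output types it supports; the generic codecs.decode() entry point does not enforce bytes input.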

I guess I have to review those doc changes...

----------
nosy: +lemburg

[issue4874] decoding functions in _codecs module accept str arguments [ In reply to ]
Antoine Pitrou <pitrou@free.fr> added the comment:

The patch is probably fine, but it would be nice to add some unit tests
for the new behaviour.

[issue4874] decoding functions in _codecs module accept str arguments [ In reply to ]
Marc-Andre Lemburg <mal@egenix.com> added the comment:

On 2009-01-13 14:14, Antoine Pitrou wrote:
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> The patch is probably fine, but it would be nice to add some unit tests
> for the new behaviour.

+1 from my side.

Thanks for the patch, Victor.

[issue4874] decoding functions in _codecs module accept str arguments [ In reply to ]
STINNER Victor <victor.stinner@haypocalc.com> added the comment:

New patch:
- Leave unicode_escape_decode() and raw_unicode_escape_decode()
  unchanged (they still accept unicode strings)
- Test changes (reject unicode for most codec decode functions)
- Write tests for unicode_escape_decode() and
  raw_unicode_escape_decode() (there were no tests for these functions)
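
A rough sketch of the kind of check described in the list above. This is not the content of the attached patch; the names BYTES_ONLY, STR_ALLOWED and rejects_str are hypothetical, and mbcs_decode() is left out because it exists only on Windows. It assumes an interpreter that already includes the fix.

```python
import codecs

# Decoders that should reject str input (mbcs_decode omitted: Windows only).
BYTES_ONLY = [
    "utf_7_decode", "utf_8_decode", "utf_16_decode", "utf_16_le_decode",
    "utf_16_be_decode", "utf_16_ex_decode", "utf_32_decode",
    "utf_32_le_decode", "utf_32_be_decode", "utf_32_ex_decode",
    "latin_1_decode", "ascii_decode", "charmap_decode",
]

# Decoders left unchanged, which should keep accepting str input.
STR_ALLOWED = ["unicode_escape_decode", "raw_unicode_escape_decode"]

def rejects_str(name):
    """Return True if codecs.<name> raises TypeError when handed a str."""
    try:
        getattr(codecs, name)("abc")
    except TypeError:
        return True
    return False

for name in BYTES_ONLY + STR_ALLOWED:
    print(name, "rejects str" if rejects_str(name) else "accepts str")
```

In a real test suite the same loop would more naturally live in a unittest.TestCase method using assertRaises(TypeError, ...).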

Added file: http://bugs.python.org/file12770/_codecs_bytes-2.patch

[issue4874] decoding functions in _codecs module accept str arguments [ In reply to ]
Changes by STINNER Victor <victor.stinner@haypocalc.com>:


Removed file: http://bugs.python.org/file12641/_codecs_bytes.patch

[issue4874] decoding functions in _codecs module accept str arguments [ In reply to ]
Changes by Antoine Pitrou <pitrou@free.fr>:


----------
assignee: -> pitrou
resolution: -> accepted

[issue4874] decoding functions in _codecs module accept str arguments [ In reply to ]
Antoine Pitrou <pitrou@free.fr> added the comment:

Committed in r68855, r68856. Thanks!

----------
resolution: accepted -> fixed
status: open -> closed

[issue4874] decoding functions in _codecs module accept str arguments [ In reply to ]
Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:

IMO, Modules/cjkcodecs/multibytecodec.c should be changed as well.

----------
nosy: +amaury.forgeotdarc

[issue4874] decoding functions in _codecs module accept str arguments [ In reply to ]
Antoine Pitrou <pitrou@free.fr> added the comment:

Right, I hadn't thought of that.

----------
resolution: fixed ->
status: closed -> open

[issue4874] decoding functions in _codecs module accept str arguments [ In reply to ]
Antoine Pitrou <pitrou@free.fr> added the comment:

"Fixing" multibytecodec.c produces a TypeError in the following test:

def test_errorcallback_longindex(self):
    dec = codecs.getdecoder('euc-kr')
    myreplace = lambda exc: ('', sys.maxsize+1)
    codecs.register_error('test.cjktest', myreplace)
    self.assertRaises(IndexError, dec,
                      'apple\x92ham\x93spam', 'test.cjktest')

TypeError: decode() argument 1 must be bytes or buffer, not str

Since the test is meant to test recovery from a misbehaving callback, I
guess the type of the input string is not really important and can be
changed to a bytes string instead. What do you think?

(in any case, here is a patch)

Added file: http://bugs.python.org/file12831/mbdecode-unicode.patch
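
A standalone sketch of the proposed change to the test input, assuming an interpreter in which the multibyte decoders only accept bytes-like objects. It is not the attached patch; the handler name 'example.cjktest' and the helper names are made up for this illustration.

```python
import codecs
import sys

def oversized_position(exc):
    # Deliberately misbehaving handler: the resume position is out of range.
    return ("", sys.maxsize + 1)

# 'example.cjktest' is a hypothetical handler name used only in this sketch.
codecs.register_error("example.cjktest", oversized_position)

def decode_outcome(data):
    """Return the name of the exception raised when decoding data as euc-kr."""
    decoder = codecs.getdecoder("euc-kr")
    try:
        decoder(data, "example.cjktest")
    except (IndexError, TypeError) as exc:
        return type(exc).__name__
    return "no error"

# bytes input reaches the error callback, whose bad position is rejected
print(decode_outcome(b"apple\x92ham\x93spam"))
# str input is rejected before any decoding takes place
print(decode_outcome("apple\x92ham\x93spam"))
```

With bytes input the test exercises what it was meant to exercise, namely recovery from the misbehaving callback, rather than the argument type check.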

[issue4874] decoding functions in _codecs module accept str arguments [ In reply to ]
Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:

The patch looks good.
I think the missing b in test_errorcallback_longindex is an oversight
from when the tests were updated for py3k. You are right to change the test.

[issue4874] decoding functions in _codecs module accept str arguments [ In reply to ]
Antoine Pitrou <pitrou@free.fr> added the comment:

Committed in r68857, r68858.

----------
resolution: -> fixed
status: open -> closed
