Mailing List Archive

string.ato? and Unicode
Is this an oversight, or by design?

>>> string.atoi(u"1")
...
TypeError: argument 1: expected string, unicode found

It appears easy to support Unicode - there is already an explicit
StringType check in these functions, and it simply delegates to
int(), which already _does_ work for Unicode.
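
A rough sketch of what widening that check could look like. This is not the
actual patch: the name _string_types is made up for illustration, and the
try/except fallback is only there so the snippet also runs on interpreters
that have no separate unicode type.

```python
# Sketch only: _string_types is a hypothetical helper name, not from string.py.
try:
    _string_types = (str, unicode)   # byte strings and Unicode strings
except NameError:
    _string_types = (str,)           # no separate unicode type available


def atoi(*args):
    """atoi(s [,base]) -> int, accepting Unicode strings as well."""
    try:
        s = args[0]
    except IndexError:
        raise TypeError('function requires at least 1 argument: %d given' %
                        len(args))
    # Same guard as the existing code, widened to every string type.
    if isinstance(s, _string_types):
        return int(*args)
    raise TypeError('argument 1: expected string, %s found' %
                    type(s).__name__)
```

With this guard, atoi(u"1") is passed straight through to int(), while a
non-string argument such as atoi(1) is still rejected with a TypeError.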

A patch would leave the following behaviour:
>>> string.atoi(u"1")
1
>>> string.atoi(u"1", 16)
...
TypeError: can't convert non-string with explicit base

IMO, this is better than what we have now. I'll put together a
patch if one is wanted...

Mark.

Re: string.ato? and Unicode
Mark Hammond wrote:
>
> Is this an oversight, or by design?
>
> >>> string.atoi(u"1")
> ...
> TypeError: argument 1: expected string, unicode found

Probably an oversight... and it may well not be the only
one: there are many explicit string checks in the code
which might need to be fixed for Unicode support.

As for string.ato? I'm not sure: these functions are
obsoleted by int(), float() and long().

> It appears easy to support Unicode - there is already an explicit
> StringType check in these functions, and it simply delegates to
> int(), which already _does_ work for Unicode

Right. I fixed the above three APIs to support Unicode.

> A patch would leave the following behaviour:
> >>> string.atoi(u"1")
> 1
> >>> string.atoi(u"1", 16)
> ...
> TypeError: can't convert non-string with explicit base
>
> IMO, this is better than what we have now. I'll put together a
> patch if one is wanted...

BTW, the code in string.py for atoi() et al. looks really
complicated:

"""
def atoi(*args):
    """atoi(s [,base]) -> int

    Return the integer represented by the string s in the given
    base, which defaults to 10. The string s must consist of one
    or more digits, possibly preceded by a sign. If base is 0, it
    is chosen from the leading characters of s, 0 for octal, 0x or
    0X for hexadecimal. If base is 16, a preceding 0x or 0X is
    accepted.

    """
    try:
        s = args[0]
    except IndexError:
        raise TypeError('function requires at least 1 argument: %d given' %
                        len(args))
    # Don't catch type error resulting from too many arguments to int(). The
    # error message isn't compatible but the error type is, and this function
    # is complicated enough already.
    if type(s) == _StringType:
        return _apply(_int, args)
    else:
        raise TypeError('argument 1: expected string, %s found' %
                        type(s).__name__)
"""

Why not simply...

def atoi(s, base=10):
    return int(s, base)

Ditto for atol() and atof()... ?! This would not only give us better
performance, but also Unicode support for free. (I'll fix int()
and long() to accept Unicode when using an explicit base too.)
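
For illustration, the simplified wrappers might look roughly like the sketch
below. It is an assumption about the shape of the change, not the code that
was checked in. atol() is left out because it would be the same one-liner
around long(), and atof() takes no base argument since float() has none.

```python
def atoi(s, base=10):
    """Return the integer represented by the string s in the given base."""
    # Passing an explicit base makes int() insist on a string argument,
    # so non-string input is still rejected with a TypeError.
    return int(s, base)


def atof(s):
    """Return the floating point number represented by the string s."""
    # Caveat: float() also accepts numbers, so atof(1) returns 1.0 here
    # instead of raising as the type-checked version does.
    return float(s)
```

One behavioural difference worth noting: the explicit string check disappears
from atof(), so anything float() accepts is accepted. For atoi() the string
requirement survives, because int() refuses non-string input whenever a base
is passed.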

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/