Mailing List Archive

Round Bug in Python 1.6?
Hi,

as a side effect, I happened to observe the following rounding bug.
It happens in Stackless Python, which is built against the
pre-unicode CVS branch.

Is this changed for 1.6, or might it be my bug?

D:\python\spc>python
Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> round(3.1415926585, 4)
3.1415999999999999
>>> ^Z

D:\python>python
Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> round(3.1415926585, 4)
3.1416
>>> ^Z

ciao - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com
Re: Round Bug in Python 1.6?
Chris> I happened to observe the following rounding bug. It happens in
Chris> Stackless Python, which is built against the pre-unicode CVS
Chris> branch.

Chris> Is this changed for 1.6, or might it be my bug?

I doubt it's your problem. I see it too with 1.6a2 (no stackless):

% ./python
Python 1.6a2 (#2, Apr 6 2000, 15:27:22) [GCC pgcc-2.91.66 19990314 (egcs-1.1.2 release)] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> round(3.1415926585, 4)
3.1415999999999999

Same behavior whether compiled with -O2 or -g.

--
Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/
Re: Round Bug in Python 1.6?
> as a side effect, I happened to observe the following rounding bug.
> It happens in Stackless Python, which is built against the
> pre-unicode CVS branch.
>
> Is this changed for 1.6, or might it be my bug?
>
> D:\python\spc>python
> Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> round(3.1415926585, 4)
> 3.1415999999999999
> >>> ^Z
>
> D:\python>python
> Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> round(3.1415926585, 4)
> 3.1416
> >>> ^Z

This is because repr() now uses full precision for floating point
numbers. round() does what it can, but 3.1416 just can't be
represented exactly, and "%.17g" gives 3.1415999999999999.
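(The effect Guido describes is easy to reproduce; a small sketch, assuming any IEEE-754 double platform:)

```python
# round() returns the double nearest to 3.1416; "%.17g" shows all 17
# significant digits of that double, while a 4-digit format hides them.
x = round(3.1415926585, 4)
print("%.17g" % x)  # 3.1415999999999999
print("%.4f" % x)   # 3.1416
```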

This is definitely the right thing to do for repr() -- ask Tim.

However, it may be time to switch so that "immediate expression"
values are printed as str() instead of as repr()...

--Guido van Rossum (home page: http://www.python.org/~guido/)
RE: Round Bug in Python 1.6?
[posted & mailed]

[Christian Tismer]
> as a side effect, I happened to observe the following rounding bug.
> It happens in Stackless Python, which is built against the
> pre-unicode CVS branch.
>
> Is this changed for 1.6, or might it be my bug?

It's a 1.6 thing, and is not a bug.

> D:\python\spc>python
> Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> round(3.1415926585, 4)
> 3.1415999999999999
> >>> ^Z

The best possible IEEE-754 double approximation to 3.1416 is (exactly)

3.141599999999999948130380289512686431407928466796875

so the output you got is correctly rounded to 17 significant digits. IOW,
it's a feature.

1.6 boosted the number of decimal digits repr(float) produces so that

eval(repr(x)) == x

for every finite float on every platform with an IEEE-754-conforming libc.
It was actually rare for that equality to hold pre-1.6. repr() cannot
produce fewer digits than this without allowing the equality to fail in some
cases.
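(Tim's exact decimal expansion can be checked on a modern interpreter with the decimal module; a sketch, noting that decimal postdates 1.6:)

```python
from decimal import Decimal

# Constructing a Decimal from a float converts the binary double
# exactly, digit for digit -- no rounding is involved.
print(Decimal(3.1416))
# 3.141599999999999948130380289512686431407928466796875
```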

The 1.6 str() still produces the *illusion* that the result is 3.1416 (as
repr() also did pre-1.6).

IMO it would be better if Python stopped using repr() (at least by default)
for formatting expressions at the interactive prompt (for much more on this,
see DejaNews).

the-two-things-you-can-do-about-it-are-nothing-and-love-it<wink>-ly
y'rs - tim
Re: Round Bug in Python 1.6?
On 06 April 2000, Guido van Rossum said:
> This is because repr() now uses full precision for floating point
> numbers. round() does what it can, but 3.1416 just can't be
> represented exactly, and "%.17g" gives 3.1415999999999999.
>
> This is definitely the right thing to do for repr() -- ask Tim.
>
> However, it may be time to switch so that "immediate expression"
> values are printed as str() instead of as repr()...

+1 on this: it's easier to change "foo" to "`foo`" than to "str(foo)" or
"print foo". It just makes more sense to use str().

Oh, joy! oh happiness! someday soon, I may be able to type
"blah.__doc__" at the interactive prompt and get a readable result!

Greg
Re: Round Bug in Python 1.6?
On 07-Apr-00 Greg Ward wrote:
> Oh, joy! oh happiness! someday soon, I may be able to type
> "blah.__doc__" at the interactive prompt and get a readable result!

Just in case... I hope you haven't missed "print blah.__doc__".

/Mikael

-----------------------------------------------------------------------
E-Mail: Mikael Olofsson <mikael@isy.liu.se>
WWW: http://www.dtr.isy.liu.se/dtr/staff/mikael
Phone: +46 - (0)13 - 28 1343
Telefax: +46 - (0)13 - 28 1339
Date: 07-Apr-00
Time: 14:56:52

This message was sent by XF-Mail.
-----------------------------------------------------------------------
Re: Round Bug in Python 1.6?
On 07 April 2000, Mikael Olofsson said:
>
> On 07-Apr-00 Greg Ward wrote:
> > Oh, joy! oh happiness! someday soon, I may be able to type
> > "blah.__doc__" at the interactive prompt and get a readable result!
>
> Just in case... I hope you haven't missed "print blah.__doc__".

Yeah, I know: my usual mode of operation is this:

>>> blah.__doc__
...repr of docstring...
...sound of me cursing...
>>> print blah.__doc__

The real reason for using str() at the interactive prompt is not to save
me keystrokes, but because it just seems like the sensible thing to do.
People who understand the str/repr difference, and really want the repr
version, can slap backquotes around whatever they're printing.

Greg
Re: Round Bug in Python 1.6?
Greg wrote:
> Yeah, I know: my usual mode of operation is this:
>
> >>> blah.__doc__
> ...repr of docstring...
> ...sound of me cursing...
> >>> print blah.__doc__

on the other hand, I tend to do this now
and then:

>>> blah = foo() # returns chunk of binary data
>>> blah

which, if you use str instead of repr, can
reprogram your terminal window in many
interesting ways...

but I think I'm +1 on this anyway. or at
least +0.90000000000000002

</F>
Re: Round Bug in Python 1.6?
Tim Peters wrote:
>
> The best possible IEEE-754 double approximation to 3.1416 is (exactly)
>
> 3.141599999999999948130380289512686431407928466796875
>
> so the output you got is correctly rounded to 17 significant digits. IOW,
> it's a feature.

I'm very respectful when I see a number with so many digits in a row. :-)

I'm not sure that this will be of any interest to you, number crunchers,
but a research team in computer arithmetics here reported some major
results lately: they claim that they "solved" the Table Maker's Dilemma
for most common functions in IEEE-754 double precision arithmetic.
(and no, don't ask me what this means ;-) For more information, see:
http://www.ens-lyon.fr/~jmmuller/Intro-to-TMD.htm

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Re: Round Bug in Python 1.6?
On Thu, 6 Apr 2000, Guido van Rossum wrote:

> However, it may be time to switch so that "immediate expression"
> values are printed as str() instead of as repr()...

Just checking my newly bought "Guido Channeling" kit -- you mean str()
but special case the snot out of strings(TM), don't you?

Trademark probably belongs to Tim Peters.
--
Moshe Zadka <mzadka@geocities.com>.
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com
Re: Round Bug in Python 1.6?
Tim Peters wrote:
> The best possible IEEE-754 double approximation to 3.1416 is (exactly)
>
> 3.141599999999999948130380289512686431407928466796875

Let's call this number 'A' for the sake of discussion.

> so the output you got is correctly rounded to 17 significant digits. IOW,
> it's a feature.

Clearly there is something very wrong here:

Python 1.5.2+ (#2, Mar 28 2000, 18:27:50)
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> 3.1416
3.1415999999999999
>>>

Now you say that 17 significant digits are required to ensure
that eval(repr(x)) == x, but we surely know that 17 digits are
*not* required when x is A because i *just typed in* 3.1416 and
the best choice of double value was A.

I haven't gone and figured it out, but i'll take your word for
it that 17 digits may be required in *certain* cases to ensure
that eval(repr(x)) == x. They're just not required in all cases.

It's very jarring to type something in, and have the interpreter
give you back something that looks very different. It breaks a
fundamental rule of consistency, and that damages the user's
trust in the system or their understanding of the system. (What
do you do then, start explaining the IEEE double representation
to your CP4E beginner?)

What should really happen is that floats intelligently print in
the shortest and simplest manner possible, i.e. the fewest
number of digits such that the decimal representation will
convert back to the actual value. Now you may say this is a
pain to implement, but i'm talking about sanity for the user here.

I haven't investigated how to do this best yet. I'll go off
now and see if i can come up with an algorithm that's not
quite so stupid as

def smartrepr(x):
    p = 17
    while eval('%%.%df' % (p - 1) % x) == x: p = p - 1
    return '%%.%df' % p % x
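(A slightly less stupid variant of the same search, as a sketch: it counts significant digits with %g rather than digits after the point, and uses float() instead of eval(). It still trusts whatever the platform's conversion routines do.)

```python
# Search for the fewest significant digits that still round-trip back
# to the original double on this machine.
def shortest_repr(x):
    for p in range(1, 18):
        s = '%.*g' % (p, x)
        if float(s) == x:
            return s
    return '%.17g' % x

print(shortest_repr(3.1416))  # 3.1416
print(shortest_repr(0.1))     # 0.1
```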



-- ?!ng
Re: Round Bug in Python 1.6?
> Just checking my newly bought "Guido Channeling" kit -- you mean str()
> but special case the snot out of strings(TM), don't you

Except I'm not sure what kind of special-casing should be happening.

Putting quotes around it, without worrying whether that makes it a valid
string literal, is one thought that comes to mind.

Another approach might be what Tk's text widget does -- pass through
certain control characters (LF, TAB) and all (even non-ASCII) printing
characters, but display other control characters as \x.. escapes
rather than risk putting the terminal in a weird mode. No quotes
though. Hm, I kind of like this: when used as intended, it will just
display the text, with newlines and umlauts etc.; but when printing
binary gibberish, it will do something friendly.
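(That display rule is easy to prototype; a hypothetical sketch -- `display` is not a real API, and str.isprintable assumes a modern Python:)

```python
# Pass through newline/tab and printable characters; render any other
# control character as a \x.. escape instead of sending it raw to the
# terminal.
def display(s):
    out = []
    for ch in s:
        if ch in '\n\t' or ch.isprintable():
            out.append(ch)
        else:
            out.append('\\x%02x' % ord(ch))
    return ''.join(out)

print(display('hello\nworld\x07'))  # the BEL byte shows up as \x07
```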

There's also the issue of what to do with lists (or tuples, or dicts)
containing strings. If we agree on this:

>>> "hello\nworld\n\347" # octal 347 is a cedilla
hello
world
ç
>>>

Then what should ("hello\nworld", "\347") show? I've got enough serious
complaints that I don't want to propose that it use repr():

>>> ("hello\nworld", "\347")
('hello\nworld', '\347')
>>>

Other possibilities:

>>> ("hello\nworld", "\347")
('hello
world', 'ç')
>>>

or maybe

>>> ("hello\nworld", "\347")
('''hello
world''', 'ç')
>>>

Of course there's also the Unicode issue -- the above all assumes
Latin-1 for stdout.

Still no closure, I think...

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: Round Bug in Python 1.6?
> Tim Peters wrote:
> > The best possible IEEE-754 double approximation to 3.1416 is (exactly)
> >
> > 3.141599999999999948130380289512686431407928466796875
>
> Let's call this number 'A' for the sake of discussion.
>
> > so the output you got is correctly rounded to 17 significant digits. IOW,
> > it's a feature.
>
> Clearly there is something very wrong here:
>
> Python 1.5.2+ (#2, Mar 28 2000, 18:27:50)
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> 3.1416
> 3.1415999999999999
> >>>
>
> Now you say that 17 significant digits are required to ensure
> that eval(repr(x)) == x, but we surely know that 17 digits are
> *not* required when x is A because i *just typed in* 3.1416 and
> the best choice of double value was A.

Ping has a point!

> I haven't gone and figured it out, but i'll take your word for
> it that 17 digits may be required in *certain* cases to ensure
> that eval(repr(x)) == x. They're just not required in all cases.
>
> It's very jarring to type something in, and have the interpreter
> give you back something that looks very different. It breaks a
> fundamental rule of consistency, and that damages the user's
> trust in the system or their understanding of the system. (What
> do you do then, start explaining the IEEE double representation
> to your CP4E beginner?)
>
> What should really happen is that floats intelligently print in
> the shortest and simplest manner possible, i.e. the fewest
> number of digits such that the decimal representation will
> convert back to the actual value. Now you may say this is a
> pain to implement, but i'm talking about sanity for the user here.
>
> I haven't investigated how to do this best yet. I'll go off
> now and see if i can come up with an algorithm that's not
> quite so stupid as
>
> def smartrepr(x):
>     p = 17
>     while eval('%%.%df' % (p - 1) % x) == x: p = p - 1
>     return '%%.%df' % p % x

Have a look at what Java does; it seems to be doing this right:

& jpython
JPython 1.1 on java1.2 (JIT: sunwjit)
Copyright (C) 1997-1999 Corporation for National Research Initiatives
>>> import java.lang
>>> x = java.lang.Float(3.1416)
>>> x.toString()
'3.1416'
>>> ^D
&

Could it be as simple as converting x +/- one bit and seeing how many
differing digits there were? (Not that +/- one bit is easy to
calculate...)
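(For what it's worth, the "+/- one bit" neighbours are easy to look at on later Pythons; a sketch assuming Python 3.9+ for math.nextafter:)

```python
import math

# The doubles adjacent to 3.1416 differ from it only far past the
# 16th significant digit, which is why a 5-digit string can still
# single out exactly one of them.
x = 3.1416
print(math.nextafter(x, -math.inf))  # one ulp below
print(x)
print(math.nextafter(x, math.inf))   # one ulp above
```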

--Guido van Rossum (home page: http://www.python.org/~guido/)
RE: Round Bug in Python 1.6?
[Ka-Ping Yee]
> ...
> Now you say that 17 significant digits are required to ensure
> that eval(repr(x)) == x,

Yes. This was first proved in Jerome Coonen's doctoral dissertation, and is
one of the few things IEEE-754 guarantees about fp I/O: that
input(output(x)) == x for all finite double x provided that output()
produces at least 17 significant decimal digits (and 17 is minimal). In
particular, IEEE-754 does *not* guarantee that either I or O are properly
rounded, which latter is needed for what *you* want to see here. The std
doesn't require proper rounding in this case (despite that it requires it in
all other cases) because no efficient method for doing properly rounded I/O
was known at the time (and, alas, that's still true).

> but we surely know that 17 digits are *not* required when x is A
> because i *just typed in* 3.1416 and the best choice of double value
> was A.

Well, x = 1.0 provides a simpler case <wink>.

> I haven't gone and figured it out, but i'll take your word for
> it that 17 digits may be required in *certain* cases to ensure
> that eval(repr(x)) == x. They're just not required in all cases.
>
> It's very jarring to type something in, and have the interpreter
> give you back something that looks very different.

It's in the very nature of binary floating-point that the numbers they type
in are often not the numbers the system uses.

> It breaks a fundamental rule of consistency, and that damages the user's
> trust in the system or their understanding of the system.

If they're surprised by this, they indeed don't understand the arithmetic at
all! This is an argument for using a different form of arithmetic, not for
lying about reality.

> (What do you do then, start explaining the IEEE double representation
> to your CP4E beginner?)

As above. repr() shouldn't be used at the interactive prompt anyway (but
note that I did not say str() should be).

> What should really happen is that floats intelligently print in
> the shortest and simplest manner possible, i.e. the fewest
> number of digits such that the decimal representation will
> convert back to the actual value. Now you may say this is a
> pain to implement, but i'm talking about sanity for the user here.

This can be done, but only if Python does all fp I/O conversions entirely on
its own -- 754-conforming libc routines are inadequate for this purpose
(and, indeed, I don't believe any libc other than Sun's does do proper
rounding here).

For background and code, track down "How To Print Floating-Point Numbers
Accurately" by Steele & White, and its companion paper (s/Print/Read/) by
Clinger. Steele & White were specifically concerned with printing the
"shortest" fp representation possible such that proper input could later
reconstruct the value exactly. Steele, White & Clinger give relatively
simple code for this that relies on unbounded int arithmetic.
Excruciatingly difficult and platform-#ifdef'ed "optimized" code for this
was written & refined over several years by the numerical analyst David Gay,
and is available from Netlib.

> I haven't investigated how to do this best yet. I'll go off
> now and see if i can come up with an algorithm that's not
> quite so stupid as
>
> def smartrepr(x):
>     p = 17
>     while eval('%%.%df' % (p - 1) % x) == x: p = p - 1
>     return '%%.%df' % p % x

This merely exposes accidents in the libc on the specific platform you run
it. That is, after

print smartrepr(x)

on IEEE-754 platform A, reading that back in on IEEE-754 platform B may not
yield the same number platform A started with. Both platforms have to do
proper rounding to make this work; there's no way to do proper rounding by
using libc; so Python has to do it itself; there's no efficient way to do it
regardless; nevertheless, it's a noble goal, and at least a few languages in
the Lisp family require it (most notably Scheme, from whence Steele, White &
Clinger's interest in the subject).

you're-in-over-your-head-before-the-water-touches-your-toes<wink>-ly
y'rs - tim
RE: Round Bug in Python 1.6?
[Guido]
> Have a look at what Java does; it seems to be doing this right:
>
> & jpython
> JPython 1.1 on java1.2 (JIT: sunwjit)
> Copyright (C) 1997-1999 Corporation for National Research Initiatives
> >>> import java.lang
> >>> x = java.lang.Float(3.1416)
> >>> x.toString()
> '3.1416'
> >>>

That Java does this is not an accident: Guy Steele pushed for the same
rules he got into Scheme, although

a) The Java rules are much tighter than Scheme's.

and

b) He didn't prevail on this point in Java until version 1.1 (before then
Java's double/float->string never produced more precision than ANSI C's
default %g format, so was inadequate to preserve equality under I/O).

I suspect there was more than a bit of internal politics behind the delay,
as the 754 camp has never liked the "minimal width" gimmick(*), and Sun's C
and Fortran numerics (incl. their properly-rounding libc I/O routines) were
strongly influenced by 754 committee members.

> Could it be as simple as converting x +/- one bit and seeing how many
> differing digits there were? (Not that +/- one bit is easy to
> calculate...)

Sorry, it's much harder than that. See the papers (and/or David Gay's code)
I referenced before.


(*) Why the minimal-width gimmick is disliked: If you print a (32-bit) IEEE
float with minimal width, then read it back in as a (64-bit) IEEE double,
you may not get the same result as if you had converted the original float
to a double directly. This is because "minimal width" here is *relative to*
the universe of 32-bit floats, and you don't always get the same minimal
width if you compute it relative to the universe of 64-bit doubles instead.
In other words, "minimal width" can lose accuracy needlessly -- but this
can't happen if you print the float to full precision instead.
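(The 32-bit caveat can be seen with the struct module; a sketch, assuming IEEE single and double formats:)

```python
import struct

# Round-trip 0.1 through a 32-bit IEEE single.
f = struct.unpack('f', struct.pack('f', 0.1))[0]

# The shortest decimal for f *as a float* is just "0.1", but reading
# "0.1" back in as a double does not reproduce f widened to a double:
print(f)             # the float's exact double value, not 0.1
print(float('0.1'))  # 0.1
```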
RE: Round Bug in Python 1.6?
[Vladimir Marangozov]
> I'm not sure that this will be of any interest to you, number crunchers,
> but a research team in computer arithmetics here reported some major
> results lately: they claim that they "solved" the Table Maker's Dilemma
> for most common functions in IEEE-754 double precision arithmetic.
> (and no, don't ask me what this means ;-)

Back in the old days, some people spent decades making tables of various
function values. A common way was to laboriously compute high-precision
values over a sparse grid, using e.g. series expansions, then extend that to
a fine grid via relatively simple interpolation formulas between the
high-precision results.

You have to compute the sparse grid to *some* "extra" precision in order to
absorb roundoff errors in the interpolated values. The "dilemma" is
figuring out how *much* extra precision: too much and it greatly slows the
calculations, too little and the interpolated values are inaccurate.

The "problem cases" for a function f(x) are those x such that the exact
value of f(x) is very close to being exactly halfway between representable
numbers. In order to round correctly, you have to figure out which
representable number f(x) is closest to. How much extra precision do you
need to use to resolve this correctly in all cases?

Suppose you're computing f(x) to 2 significant decimal digits, using 4-digit
arithmetic, and for some specific x0 f(x0) turns out to be 41.49 +- 3.
That's not enough to know whether it *should* round to 41 or 42. So you
need to try again with more precision. But how much? You might try 5
digits next, and might get 41.501 +- 3, and you're still stuck. Try 6 next?
Might be a waste of effort. Try 20 next? Might *still* not be enough -- or
could just as well be that 7 would have been enough and you did 10x the work
you needed to do.

Etc. It turns out that for most functions there's no general way known to
answer the "how much?" question in advance: brute force is the best method
known.

For various IEEE double precision functions, so far it's turned out that you
need in the ballpark of 40-60 extra accurate bits (beyond the native 53) in
order to round back correctly to 53 in all cases, but there's no *theory*
supporting that. It *could* require millions of extra bits.

For those wondering "why bother?", the practical answer is this: if a std
could require correct rounding, functions would be wholly portable across
machines ("correctly rounded" is precisely defined by purely mathematical
means). That's where IEEE-754 made its huge break with tradition, by
requiring correct rounding for + - * / and sqrt. The places it left fuzzy
(like string<->float, and all transcendental functions) are the places your
program produces different results when you port it.

Irritating one: MS VC++ on Intel platforms generates different code for
exp() depending on the optimization level. They often differ in the last
bit they compute. This wholly accounts for why Dragon's speech recognition
software sometimes produces subtly (but very visibly!) different results
depending on how it was compiled. Before I got tossed into this pit, it was
assumed for a year to be either a -O bug or somebody fetching uninitialized
storage.

that's-what-you-get-when-you-refuse-to-define-results-ly y'rs - tim
RE: Round Bug in Python 1.6?
In a previous message, i wrote:
> > It's very jarring to type something in, and have the interpreter
> > give you back something that looks very different.
[...]
> > It breaks a fundamental rule of consistency, and that damages the user's
> > trust in the system or their understanding of the system.

Then on Fri, 7 Apr 2000, Tim Peters replied:
> If they're surprised by this, they indeed don't understand the arithmetic at
> all! This is an argument for using a different form of arithmetic, not for
> lying about reality.

This is not lying! If you type in "3.1416" and Python says "3.1416",
then indeed it is the case that "3.1416" is a correct way to type in
the floating-point number being expressed. So "3.1415999999999999"
is not any more truthful than "3.1416" -- it's just more annoying.

I just tried this in Python 1.5.2+:

>>> .1
0.10000000000000001
>>> .2
0.20000000000000001
>>> .3
0.29999999999999999
>>> .4
0.40000000000000002
>>> .5
0.5
>>> .6
0.59999999999999998
>>> .7
0.69999999999999996
>>> .8
0.80000000000000004
>>> .9
0.90000000000000002

Ouch.


I wrote:
> > (What do you do then, start explaining the IEEE double representation
> > to your CP4E beginner?)

Tim replied:
> As above. repr() shouldn't be used at the interactive prompt anyway (but
> note that I did not say str() should be).

What, then? Introduce a third conversion routine and further
complicate the issue? I don't see why it's necessary.

I wrote:
> > What should really happen is that floats intelligently print in
> > the shortest and simplest manner possible

Tim replied:
> This can be done, but only if Python does all fp I/O conversions entirely on
> its own -- 754-conforming libc routines are inadequate for this purpose

Not "all fp I/O conversions", right? Only repr(float) needs to
be implemented for this particular purpose. Other conversions
like "%f" and "%g" can be left to libc, as they are now.

I suppose for convenience's sake it may be nice to add another
format spec so that one can ask for this behaviour from the "%"
operator as well, but that's a separate issue (perhaps "%r" to
insert the repr() of an argument of any type?).

> For background and code, track down "How To Print Floating-Point Numbers
> Accurately" by Steele & White, and its companion paper (s/Print/Read/)

Thanks! I found 'em. Will read...

I suggested:
> > def smartrepr(x):
> >     p = 17
> >     while eval('%%.%df' % (p - 1) % x) == x: p = p - 1
> >     return '%%.%df' % p % x

Tim replied:
> This merely exposes accidents in the libc on the specific platform you run
> it. That is, after
>
> print smartrepr(x)
>
> on IEEE-754 platform A, reading that back in on IEEE-754 platform B may not
> yield the same number platform A started with.

That is not repr()'s job. Once again:

repr() is not for the machine.

It is not part of repr()'s contract to ensure the kind of
platform-independent conversion you're talking about. It
prints out the number in a way that upholds the eval(repr(x)) == x
contract for the system you are currently interacting with, and
that's good enough.

If you wanted platform-independent serialization, you would
use something else. As long as the language reference says

"These represent machine-level double precision floating
point numbers. You are at the mercy of the underlying
machine architecture and C implementation for the accepted
range and handling of overflow."

and until Python specifies the exact sizes and behaviours of
its floating-point numbers, you can't expect these kinds of
cross-platform guarantees anyway.


Here are the expectations i've come to have:

str()'s contract:
- if x is a string, str(x) == x
- otherwise, str(x) is a reasonable string coercion from x

repr()'s contract:
- if repr(x) is syntactically valid, eval(repr(x)) == x
- repr(x) displays x in a safe and readable way
- for objects composed of basic types, repr(x) reflects
what the user would have to say to produce x

pickle's contract:
- pickle.dumps(x) is a platform-independent serialization
of the value and state of object x
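(The first repr() clause is easy to spot-check on any single machine; a minimal sketch:)

```python
# eval(repr(x)) == x holds as long as the same machine does both the
# output and the input conversion.
for x in (0.1, 3.1416, 1.0 / 3.0, 1e300, -2.5e-10):
    assert eval(repr(x)) == x
print('round-trip holds')
```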


-- ?!ng
Re: Round Bug in Python 1.6?
Ok, just a word (carefully:)

Ka-Ping Yee wrote:
...
> I just tried this in Python 1.5.2+:
>
> >>> .1
> 0.10000000000000001
> >>> .2
> 0.20000000000000001
> >>> .3
> 0.29999999999999999

Agreed that this is not good.

...
> repr()'s contract:
> - if repr(x) is syntactically valid, eval(repr(x)) == x
> - repr(x) displays x in a safe and readable way
> - for objects composed of basic types, repr(x) reflects
> what the user would have to say to produce x

This sounds reasonable.
BTW my problem did not come up by typing something
in, but I just rounded a number down to 3 digits past the dot.
Then, as usual, I just let the result drop from the prompt,
without prefixing it with "print". repr() was used, and the
result was astonishing.

Here is the problem, as I see it:
You say if you type 3.1416, you want to get exactly this back.
But how should Python know that you typed it in?
Same in my case: I just rounded to 3 digits, but how
should Python know about this?

And what do you expect when you type in 3.14160, do you want
the trailing zero preserved or not?

Maybe we would need to carry exactness around for numbers.
Or even have a different float type for cases where we want
exact numbers? Keyboard entry and rounding produce exact numbers.
Simple operations between exact numbers would keep exactness,
higher level functions would probably not.

I think we delved into a very difficult domain here.

ciao - chris

Re: Round Bug in Python 1.6?
On Sun, 9 Apr 2000, Christian Tismer wrote:
> Here is the problem, as I see it:
> You say if you type 3.1416, you want to get exactly this back.
> But how should Python know that you typed it in?
> Same in my case: I just rounded to 3 digits, but how
> should Python know about this?
>
> And what do you expect when you type in 3.14160, do you want
> the trailing zero preserved or not?

It's okay for the zero to go away, because it doesn't affect
the value of the number. (Carrying around a significant-digit
count or error range with numbers is another issue entirely,
and a very thorny one at that.)

I think "fewest digits needed to distinguish the correct value"
will give good and least-surprising results here. This method
guarantees:

- If you just type a number in and the interpreter
prints it back, it will never respond with more
junk digits than you typed.

- If you type in what the interpreter displays for a
float, you can be assured of getting the same value.

> Maybe we would need to carry exactness around for numbers.
> Or even have a different float type for cases where we want
> exact numbers? Keyboard entry and rounding produce exact numbers.

If you mean a decimal representation, yes, perhaps we need to
explore that possibility a little more.



-- ?!ng

"All models are wrong; some models are useful."
-- George Box
Re: Round Bug in Python 1.6?
Ka-Ping Yee wrote:
>
> On Sun, 9 Apr 2000, Christian Tismer wrote:
> > Here is the problem, as I see it:
> > You say if you type 3.1416, you want to get exactly this back.
> > But how should Python know that you typed it in?
> > Same in my case: I just rounded to 3 digits, but how
> > should Python know about this?
> >
> > And what do you expect when you type in 3.14160, do you want
> > the trailing zero preserved or not?
>
> It's okay for the zero to go away, because it doesn't affect
> the value of the number. (Carrying around a significant-digit
> count or error range with numbers is another issue entirely,
> and a very thorny one at that.)
>
> I think "fewest digits needed to distinguish the correct value"
> will give good and least-surprising results here. This method
> guarantees:

Hmm, I hope I understood.
Oh, wait a minute! What is the method? What is the correct value?

If I type
>>> 0.1
0.10000000000000001
>>> 0.10000000000000001
0.10000000000000001
>>>

There is only one value: The one which is in the machine.
Would you think it is ok to get 0.1 back, when you
actually *typed* 0.10000000000000001 ?

RE: Round Bug in Python 1.6? [ In reply to ]
[Christian Tismer]
> ...
> Here is the problem, as I see it:
> You say if you type 3.1416, you want to get exactly this back.
>
> But how should Python know that you typed it in?
> Same in my case: I just rounded to 3 digits, but how
> should Python know about this?
>
> And what do you expect when you type in 3.14160, do you want
> the trailing zero preserved or not?
>
> Maybe we would need to carry exactness around for numbers.
> Or even have a different float type for cases where we want
> exact numbers? Keyboard entry and rounding produce exact numbers.
> Simple operations between exact numbers would keep exactness,
> higher level functions would probably not.
>
> I think we delved into a very difficult domain here.

"This kind of thing" is hopeless so long as Python uses binary floating
point. Ping latched on to "shortest" conversion because it appeared to
solve "the problem" in a specific case. But it doesn't really solve
anything -- it just shuffles the surprises around. For example,

>>> 3.1416 - 3.141
0.00059999999999993392
>>>

Do "shortest conversion" (relative to the universe of IEEE doubles) instead,
and it would print

0.0005999999999999339

Neither bears much syntactic resemblance to the

0.0006

the numerically naive "expect". Do anything less than the 16 significant
digits shortest conversion happens to produce in this case, and eval'ing the
string won't return the number you started with. So "0.0005999999999999339"
is the "best possible" string repr can produce (assuming you think "best" ==
"shortest faithful, relative to the platform's universe of possibilities",
which is itself highly debatable).
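[Tim's claim is easy to check on an interpreter whose repr() does shortest conversion (CPython 3.1+); a minimal sketch:]

```python
x = 3.1416 - 3.141

# 17 significant digits always suffice to round-trip an IEEE double:
s17 = '%.17g' % x
assert float(s17) == x

# Shortest conversion also round-trips, in no more characters:
assert float(repr(x)) == x
assert len(repr(x)) <= len(s17)

# But neither is the "0.0006" the numerically naive expect:
assert repr(x) != '0.0006'
```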

If you don't want to see that at the interactive prompt, one of two things
has to change:

A) Give up on eval(repr(x)) == x for float x, even on a single machine.

or

B) Stop using repr by default.

There is *no* advantage to #A over the long haul: lying always extracts a
price, and unlike most of you <wink>, I appeared to be the lucky email
recipient of the passionate gripes about repr(float)'s inadequacy in 1.5.2
and before. Giving a newbie an illusion of comfort at the cost of making it
useless for experts is simply nuts.

The desire for #B pops up from multiple sources: people trying to use
native non-ASCII chars in strings; people just trying to display docstrings
without embedded "\012" (newline) and "\011" (tab) escapes; and people using
"big" types (like NumPy arrays or rationals) where repr() can produce
unboundedly more info than the interactive user typically wants to see.

It *so happens* that str() already "does the right thing" in all three of
those cases, and also happens to produce "0.0006" for the example
above. This is why people leap to:

C) Use str by default instead of repr.

But str doesn't pass down to containees, and *partly* does a wrong thing
when applied to strings, so it's not suitable either. It's *more* suitable
than repr, though!

trade-off-ing-ly y'rs - tim
RE: Round Bug in Python 1.6? [ In reply to ]
[Tim]
>> If they're surprised by this, they indeed don't understand the
>> arithmetic at all! This is an argument for using a different form of
>> arithmetic, not for lying about reality.

> This is not lying!

Yes, I overstated that. It's not lying, but I defy anyone to explain the
full truth of it in a way even Guido could understand <0.9 wink>. "Shortest
conversion" is a subtle concept, requiring knowledge not only of the
mathematical value, but of details of the HW representation. Plain old
"correct rounding" is HW-independent, so is much easier to *fully*
understand. And in things floating-point, what you don't fully understand
will eventually burn you.

Note that in a machine with 2-bit floating point, the "shortest conversion"
for 0.75 is the string "0.8": this should suggest the sense in which
"shortest conversion" can be actively misleading too.
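[The 2-bit example can be checked mechanically. A hypothetical sketch -- the toy format here, a 2-bit significand over a small exponent range, is an assumption made up for illustration, not any real hardware:]

```python
from fractions import Fraction

# Positive values of a toy binary float: 1.b * 2**e, 2-bit significand.
REPS = sorted(Fraction(m, 2) * Fraction(2) ** e
              for m in (2, 3) for e in range(-2, 2))

def to_toy(x):
    # Round an exact Fraction to the nearest representable toy value.
    return min(REPS, key=lambda r: abs(r - x))

def shortest(x):
    # Fewest-digits decimal string that reads back as toy value x.
    for digits in range(1, 10):
        s = '%.*f' % (digits, float(x))
        if to_toy(Fraction(s)) == x:
            return s

# 0.75 is exactly representable, yet its shortest spelling is "0.8":
assert Fraction(3, 4) in REPS
assert shortest(Fraction(3, 4)) == '0.8'
```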

> If you type in "3.1416" and Python says "3.1416", then indeed it is the
> case that "3.1416" is a correct way to type in the floating-point number
> being expressed. So "3.1415999999999999" is not any more truthful than
> "3.1416" -- it's just more annoying.

Yes, shortest conversion is *defensible*. But Python has no code to
implement that now, so it's not an option today.

> I just tried this in Python 1.5.2+:
>
> >>> .1
> 0.10000000000000001
> >>> .2
> 0.20000000000000001
> >>> .3
> 0.29999999999999999
> >>> .4
> 0.40000000000000002
> >>> .5
> 0.5
> >>> .6
> 0.59999999999999998
> >>> .7
> 0.69999999999999996
> >>> .8
> 0.80000000000000004
> >>> .9
> 0.90000000000000002
>
> Ouch.

As shown in my reply to Christian, shortest conversion is not a cure for
this "gosh, it printed so much more than I expected it to"; it only appears
to "fix it" in the simplest examples. So long as you want
eval(what's_displayed) == what's_typed, this is unavoidable. The only ways
to avoid that are to use a different arithmetic, or stop using repr() at the
prompt.

>> As above. repr() shouldn't be used at the interactive prompt
>> anyway (but note that I did not say str() should be).

> What, then? Introduce a third conversion routine and further
> complicate the issue? I don't see why it's necessary.

Because I almost never want current repr() or str() at the prompt, and even
you <wink> don't want 3.1416-3.141 to display 0.0005999999999999339 (which
is the least you can print and have eval return the true answer).

>>> What should really happen is that floats intelligently print in
>>> the shortest and simplest manner possible

>> This can be done, but only if Python does all fp I/O conversions
>> entirely on its own -- 754-conforming libc routines are inadequate
>> for this purpose

> Not "all fp I/O conversions", right? Only repr(float) needs to
> be implemented for this particular purpose. Other conversions
> like "%f" and "%g" can be left to libc, as they are now.

No, all, else you risk %f and %g producing results that are inconsistent
with repr(), which creates yet another set of incomprehensible surprises.
This is not an area that rewards half-assed hacks! I'm intimately familiar
with just about every half-assed hack that's been tried here over the last
20 years -- they never work in the end. The only approach that ever bore
fruit was 754's "there is *a* mathematically correct answer, and *that's*
the one you return". Unfortunately, they dropped the ball here on
float<->string conversions (and very publicly regret that today).

> I suppose for convenience's sake it may be nice to add another
> format spec so that one can ask for this behaviour from the "%"
> operator as well, but that's a separate issue (perhaps "%r" to
> insert the repr() of an argument of any type?).

%r is cool! I like that.
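[The %r spec suggested here did in fact land in later Pythons; a quick illustration:]

```python
s = 'hello\nworld'
# %r inserts repr() of the argument, %s inserts str():
assert '%r' % s == "'hello\\nworld'"
assert '%s' % s == 'hello\nworld'
assert '%r' % 0.1 == repr(0.1)
```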

>>> def smartrepr(x):
>>>     p = 17
>>>     while eval('%%.%df' % (p - 1) % x) == x: p = p - 1
>>>     return '%%.%df' % p % x

>> This merely exposes accidents in the libc on the specific
>> platform you run it. That is, after
>>
>> print smartrepr(x)
>>
>> on IEEE-754 platform A, reading that back in on IEEE-754
>> platform B may not yield the same number platform A started with.

> That is not repr()'s job. Once again:
>
> repr() is not for the machine.

And once again, I didn't and don't agree with that, and, to save the next
seven msgs, never will <wink>.

> It is not part of repr()'s contract to ensure the kind of
> platform-independent conversion you're talking about. It
> prints out the number in a way that upholds the eval(repr(x)) == x
> contract for the system you are currently interacting with, and
> that's good enough.

It's not good enough for Java and Scheme, and *shouldn't* be good enough for
Python. The 1.6 repr(float) is already platform-independent across IEEE-754
machines (it's not correctly rounded on most platforms, but *does* print
enough that 754 guarantees bit-for-bit reproducibility) -- and virtually all
Python platforms are IEEE-754 (I don't know of an exception -- perhaps
Python is running on some ancient VAX?). The std has been around for 15+
years, virtually all platforms support it fully now, and it's about time
languages caught up.

BTW, the 1.5.2 text-mode pickle was *not* sufficient for reproducing floats
either, even on a single machine. It is now -- but thanks to the change in
repr.

> If you wanted platform-independent serialization, you would
> use something else.

There is nothing else. In 1.5.2 and before, people mucked around with
binary dumps hoping they didn't screw up endianness.

> As long as the language reference says
>
> "These represent machine-level double precision floating
> point numbers. You are at the mercy of the underlying
> machine architecture and C implementation for the accepted
> range and handling of overflow."
>
> and until Python specifies the exact sizes and behaviours of
> its floating-point numbers, you can't expect these kinds of
> cross-platform guarantees anyway.

There's nothing wrong with exceeding expectations <wink>. Despite what the
reference manual says, virtually all machines use identical fp
representations today (this wasn't true when the text above was written).

> str()'s contract:
> - if x is a string, str(x) == x
> - otherwise, str(x) is a reasonable string coercion from x

The last is so vague as to say nothing. My counterpart-- at least equally
vague --is

- otherwise, str(x) is a string that's easy to read and contains
a compact summary indicating x's nature and value in general
terms

> repr()'s contract:
> - if repr(x) is syntactically valid, eval(repr(x)) == x
> - repr(x) displays x in a safe and readable way

I would say instead:

- every character c in repr(x) has ord(c) in range(32, 128)
- repr(x) should strive to be easily readable by humans

> - for objects composed of basic types, repr(x) reflects
> what the user would have to say to produce x

Given your first point, does this say something other than "for basic types,
repr(x) is syntactically valid"? Also unclear what "basic types" means.


> pickle's contract:
> - pickle.dumps(x) is a platform-independent serialization
> of the value and state of object x

Since pickle can't handle all objects, this exaggerates the difference
between it and repr. Give a fuller description, like

- If pickle.dumps(x) is defined,
pickle.loads(pickle.dumps(x)) == x

and it's the same as the first line of your repr() contract, modulo

s/syntactically valid/is defined/
s/eval/pickle.loads/
s/repr/pickle.dumps/

The differences among all these guys remain fuzzy to me.
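[Tim's point about the pickle fix is easy to verify on any later Python; a quick check:]

```python
import pickle

x = 3.1415926585
# Binary and text-mode (protocol 0) pickles both round-trip floats
# exactly, now that enough digits are written:
assert pickle.loads(pickle.dumps(x)) == x
assert pickle.loads(pickle.dumps(x, protocol=0)) == x
```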

but-not-surprising-when-talking-about-what-people-like-to-look-at-ly
y'rs - tim
RE: Round Bug in Python 1.6? [ In reply to ]
[Ping]
> ...
> I think "fewest digits needed to distinguish the correct value"
> will give good and least-surprising results here. This method
> guarantees:
>
> - If you just type a number in and the interpreter
> prints it back, it will never respond with more
> junk digits than you typed.

Note the example from another reply of a machine with 2-bit floats. There
the user would see:

>>> 0.75 # happens to be exactly representable on this machine
0.8 # because that's the shortest string needed on this machine
# to get back 0.75 internally
>>>

This kind of surprise is inherent in the approach, not specific to 2-bit
machines <wink>.

BTW, I don't know that it will never print more digits than you type: did
you prove that? It's plausible, but many plausible claims about fp turn out
to be false.

> - If you type in what the interpreter displays for a
> float, you can be assured of getting the same value.

This isn't of value for most interactive use -- in general you want to see
the range of a number, not enough to get 53 bits exactly (that's beyond the
limits of human "number sense"). It also has one clearly bad aspect: when
printing containers full of floats, the number of digits printed for each
will vary wildly from float to float. Makes for an unfriendly display. If
the prompt's display function were settable, I'd probably plug in pprint!
RE: Round Bug in Python 1.6? [ In reply to ]
[Christian]
> Hmm, I hope I understood.
> Oh, wait a minute! What is the method? What is the correct value?
>
> If I type
> >>> 0.1
> 0.10000000000000001
> >>> 0.10000000000000001
> 0.10000000000000001
> >>>
>
> There is only one value: The one which is in the machine.
> Would you think it is ok to get 0.1 back, when you
> actually *typed* 0.10000000000000001 ?

Yes, this is the kind of surprise I sketched with the "2-bit machine"
example. It can get more surprising than the above (where, as you suspect,
"shortest conversion" yields "0.1" for both -- which, btw, is why reading it
back in to a float type with more precision loses accuracy needlessly, which
in turn is why 754 True Believers dislike it).

repetitively y'rs - tim
RE: Round Bug in Python 1.6? [ In reply to ]
[Moshe Zadka]
> Just checking my newly bought "Guido Channeling" kit -- you mean str()
> but special case the snot out of strings(TM), don't you

[Guido]
> Except I'm not sure what kind of special-casing should be happening.

Welcome to the club.

> Put quotes around it without worrying if that makes it a valid string
> literal is one thought that comes to mind.

If nothing else <wink>, Ping convinced me the temptation to type that back
in will prove overwhelming.

> Another approach might be what Tk's text widget does -- pass through
> certain control characters (LF, TAB) and all (even non-ASCII) printing
> characters, but display other control characters as \x.. escapes
> rather than risk putting the terminal in a weird mode.

This must be platform-dependent? Just tried this loop in Win95 IDLE, using
Courier:

>>> for i in range(256):
...     print i, chr(i),

Across the whole range, it just showed what Windows always shows in the
Courier font (which is usually a (empty or filled) rectangle for most
"control characters"). No \x escapes at all.

BTW, note that Tk unhelpfully translates a request for "Courier New" into a
request for "Courier", which aren't the same fonts under Windows! So if
anyone tries this with the IDLE Windows defaults, and doesn't see all the
special characters Windows assigns to the range 128-159 in Courier New,
that's why -- most of them aren't assigned under Courier.

> No quotes though. Hm, I kind of like this: when used as intended, it will
> just display the text, with newlines and umlauts etc.; but when printing
> binary gibberish, it will do something friendly.

Can't be worse than what happens now <wink>.

> There's also the issue of what to do with lists (or tuples, or dicts)
> containing strings. If we agree on this:
>
> >>> "hello\nworld\n\347" # octal 347 is a cedilla
> hello
> world
> ç
> >>>

I don't think there is agreement on this, because nothing in the output says
"btw, this thing was a string". Is that worth preserving? "It depends" is
the only answer I've got to that.

> Then what should ("hello\nworld", "\347") show? I've got enough serious
> complaints that I don't want to propose that it use repr():
>
> >>> ("hello\nworld", "\347")
> ('hello\nworld', '\347')
> >>>
>
> Other possibilities:
>
> >>> ("hello\nworld", "\347")
> ('hello
> world', 'ç')
> >>>
>
> or maybe
>
> >>> ("hello\nworld", "\347")
> ('''hello
> world''', 'ç')
> >>>

I like the last best.

> Of course there's also the Unicode issue -- the above all assumes
> Latin-1 for stdout.
>
> Still no closure, I think...

It's curious how you invoke "closure" when and only when you don't know what
*you* want to do <wink>.

a-guido-divided-against-himself-cannot-stand-ly y'rs - tim
