Mailing List Archive: [issue45876] Improve accuracy of stdev functions in statistics

[issue45876] Improve accuracy of stdev functions in statistics

Nov 26, 2021, 10:49 AM

Post #26 of 41 (104 views)

Tim Peters <tim@python.org> added the comment:

Mark, ya, MS's Visual Studio's ldexp() has, far as I know, always worked this way. The code I showed was run under the 2019 edition, which we use to build the Windows CPython.

Raymond,

x = float(i)

is screamingly obvious at first glance.

x = i/1

looks like a coding error at first. The "reason" for different spellings in different branches looked obvious in the original: one branch needs to divide, and the other doesn't. So the original code was materially clearer to me.

Both, not sure it helps, but this use of round-to-odd appears akin to the decimal module's ROUND_05UP, which rounds an operation result in such a way that, if it's rounded again - under any rounding mode - to a narrower precision, you get the same narrower result as if you had used that rounding mode on the original operation to that narrower precision to begin with.

Decimal only needs to adjust the value of the last retained digit to, effectively, "encode" all possibilities, but binary needs two trailing bits. "Round" and "sticky" are great names for them :-)
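A minimal sketch of the ROUND_05UP property described above, using only the standard decimal module. The operands (12451 / 100000) and the precisions 4 and 3 are illustrative choices, not taken from the thread: the exact quotient 0.12451 sits just above a 3-digit rounding tie, which is where naive double rounding goes wrong.

```python
from decimal import (
    Context, Decimal,
    ROUND_05UP, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
    ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_UP,
)

# Illustrative operands (an assumption, not from the thread).
a, b = Decimal(12451), Decimal(100000)   # exact quotient: 0.12451

ALL_MODES = [
    ROUND_05UP, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
    ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_UP,
]

def direct(mode, prec=3):
    """Round the exact quotient once, straight to the narrow precision."""
    return Context(prec=prec, rounding=mode).divide(a, b)

def via_05up(mode, wide=4, narrow=3):
    """Round to the wide precision with ROUND_05UP, then re-round with `mode`."""
    wide_result = Context(prec=wide, rounding=ROUND_05UP).divide(a, b)
    return Context(prec=narrow, rounding=mode).plus(wide_result)

# Naive double rounding: half-even at 4 digits, then half-even at 3 digits.
naive = Context(prec=3, rounding=ROUND_HALF_EVEN).plus(
    Context(prec=4, rounding=ROUND_HALF_EVEN).divide(a, b)
)

print(naive, direct(ROUND_HALF_EVEN), via_05up(ROUND_HALF_EVEN))
print(all(direct(m) == via_05up(m) for m in ALL_MODES))
```

In this one example the naive route lands on 0.124 while both the direct rounding and the ROUND_05UP route give 0.125, and the two agree for every rounding mode the decimal module offers.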

----------

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue45876>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com

[issue45876] Improve accuracy of stdev functions in statistics

Nov 26, 2021, 1:18 PM

Post #27 of 41 (104 views)

Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

[Tim]
> Note that, on Windows, ldexp() in the presence of
> denorms can truncate. Division rounds, so
>
> assert x / 2**i == ldexp(x, -i)
>
> can fail.

Objects/longobject.c::long_true_divide() uses ldexp() internally. Will it suffer the same issues with subnormals on Windows? Is CPython int/int true division guaranteed to be correctly rounded?
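A small sketch of the kind of case the quoted assertion is about, assuming IEEE 754 doubles with gradual underflow. The specific value (three times the smallest subnormal) is an illustrative choice: halving it produces an exact tie, so a rounding division and a truncating ldexp() would disagree. What ldexp() returns here depends on the platform's C library, so the sketch only reports it.

```python
import math

TINY = 5e-324          # smallest positive subnormal double, 2**-1074
x = 3 * TINY           # exactly representable: 3 * 2**-1074

# The exact quotient is 1.5 * 2**-1074, halfway between two subnormals.
by_division = x / 2**1          # IEEE division rounds the tie to even
by_ldexp = math.ldexp(x, -1)    # rounds or truncates, depending on the C library

print("division gives 2 * TINY:", by_division == 2 * TINY)
print("ldexp agrees with division:", by_ldexp == by_division)
```

A C library that truncates would return TINY from the ldexp() call, which makes the second line print False.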

----------


[issue45876] Improve accuracy of stdev functions in statistics

Nov 26, 2021, 1:33 PM

Post #28 of 41 (104 views)

Mark Dickinson <dickinsm@gmail.com> added the comment:

> Will it suffer the same issues with subnormals on Windows?

No, it should be fine. All the rounding has already happened at the point where ldexp is called, and the result of the ldexp call is exact.

> Is CPython int/int true division guaranteed to be correctly rounded?

Funny you should ask. :-) There's certainly no documented guarantee, and there _is_ a case (documented in comments) where the current code may not return correctly rounded results on machines that use x87: there's a fast path where both numerator and denominator fit into an IEEE 754 double without rounding, and we then do a floating-point division.

But we can't hit that case with the proposed code, since the numerator will always have at least 55 bits, so the fast path is never taken.

----------


[issue45876] Improve accuracy of stdev functions in statistics

Nov 26, 2021, 1:43 PM

Post #29 of 41 (104 views)

Tim Peters <tim@python.org> added the comment:

> Objects/longobject.c::long_true_divide() uses ldexp() internally.
> Will it suffer the same issues with subnormals on Windows?

Doesn't look like it will. In context, looks like it's ensuring that ldexp can only lose trailing 0 bits, so that _whatever_ ldexp does in the way of rounding is irrelevant. But it's not doing this because of Windows - it's to prevent "double-rounding" errors regardless of platform.

> Is CPython int/int true division guaranteed to be correctly rounded?

If there's some promise of that in the docs, I don't know where it is. But the code clearly intends to strive for correct rounding. Ironically, PEP 238 guarantees that if it is correctly rounded, that's purely by accident ;-) :

"""
True division for ints and longs will convert the arguments to float and then apply a float division. That is, even 2/1 will return a float
"""

But i/j is emphatically not implemented via float(i)/float(j).
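Two quick illustrations of that last point, as a sketch. The operands are illustrative choices, not from the thread: in the first pair the ints are too large to convert exactly, so rounding each one before dividing changes the answer; in the second the conversion to float fails outright while int/int still works.

```python
# Case 1: the conversions round, so float(i) / float(j) drifts from i / j.
j = 2**53 + 1            # not representable as a double (rounds to 2**53)
i = 3 * j                # exact quotient i / j is 3

true_div = i / j                      # int/int true division
via_floats = float(i) / float(j)      # convert first, then divide

print(true_div == 3.0)                # True
print(via_floats == 3.0 + 2.0**-51)   # True: one ulp above 3.0

# Case 2: float() overflows, int/int does not.
big, small = 10**400, 10**399
print(big / small)                    # 10.0
try:
    float(big) / float(small)
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)                     # True
```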

----------


[issue45876] Improve accuracy of stdev functions in statistics

Nov 26, 2021, 1:46 PM

Post #30 of 41 (104 views)

Mark Dickinson <dickinsm@gmail.com> added the comment:

> All the rounding has already happened at the point where ldexp is called, and the result of the ldexp call is exact.

Sketch of proof:

[Here](https://github.com/python/cpython/blob/4ebde73b8e416eeb1fd5d2ca3283f7ddb534c5b1/Objects/longobject.c#L3929) we have:

shift = Py_MAX(diff, DBL_MIN_EXP) - DBL_MANT_DIG - 2;

from which (assuming IEEE 754 as usual) shift >= -1076. (DBL_MIN_EXP = -1021, DBL_MANT_DIG = 53)

[Here](https://github.com/python/cpython/blob/4ebde73b8e416eeb1fd5d2ca3283f7ddb534c5b1/Objects/longobject.c#L4008) we round away the last two or three bits of x, after which x is guaranteed to be a multiple of 4:

x->ob_digit[0] = low & ~(2U*mask-1U);

Then after converting the PyLong x to a double dx with exactly the same value [here](https://github.com/python/cpython/blob/4ebde73b8e416eeb1fd5d2ca3283f7ddb534c5b1/Objects/longobject.c#L4020) we make the ldexp call:

result = ldexp(dx, (int)shift);

At this point dx is a multiple of 4 and shift >= -1076, so the result of the ldexp scaling is a multiple of 2**-1074, and in the case of a subnormal result, it's already exactly representable.
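The last step of the argument can be spot-checked from Python with exact rational arithmetic. This is only a sketch under the same IEEE 754 assumption: ldexp_is_exact is a hypothetical helper name, and the small multiples of 4 stand in for the much larger dx values the C code actually produces.

```python
import math
import sys
from fractions import Fraction

DBL_MIN_EXP = sys.float_info.min_exp      # -1021 for IEEE 754 doubles
DBL_MANT_DIG = sys.float_info.mant_dig    # 53
MIN_SHIFT = DBL_MIN_EXP - DBL_MANT_DIG - 2  # lower bound on shift: -1076

def ldexp_is_exact(dx: int, shift: int) -> bool:
    """True if ldexp(dx, shift) equals dx * 2**shift with no rounding at all."""
    exact = Fraction(dx) * Fraction(2) ** shift
    return Fraction(math.ldexp(float(dx), shift)) == exact

# Multiples of 4 scaled by the smallest possible shift land exactly on
# multiples of 2**-1074, so it does not matter how ldexp rounds.
print(MIN_SHIFT)
print(all(ldexp_is_exact(4 * k, MIN_SHIFT) for k in range(1, 1000)))

# Without the multiple-of-4 guarantee the result can fall between subnormals.
print(ldexp_is_exact(3, -1075))
```

The final check is False on any platform, since 1.5 * 2**-1074 is not a double; that is the situation the masking step rules out.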

For the int/int division possibly not being correctly rounded on x87, see [here](https://github.com/python/cpython/blob/4ebde73b8e416eeb1fd5d2ca3283f7ddb534c5b1/Objects/longobject.c#L3889-L3892).

It won't affect _this_ application, but possibly we should fix this anyway. Though the progression of time is already effectively fixing it for us, as x87 becomes less and less relevant.

----------


[issue45876] Improve accuracy of stdev functions in statistics

Nov 26, 2021, 1:59 PM

Post #31 of 41 (104 views)

Mark Dickinson <dickinsm@gmail.com> added the comment:

Concrete example of int/int not being correctly rounded on systems using x87 instructions: on those systems, I'd expect to see 1/2731 return a result of 0.00036616623947272064 (0x1.7ff4005ffd002p-12), as a result of first rounding to 64-bit precision and then to 53-bit. The correctly-rounded result is 0.0003661662394727206 (0x1.7ff4005ffd001p-12).
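The double rounding in this example can be reproduced without x87 hardware by simulating it in exact rational arithmetic. A sketch, with round_to_bits as a hypothetical helper that handles only positive values and ignores subnormals and overflow:

```python
from fractions import Fraction

def round_to_bits(x: Fraction, bits: int) -> Fraction:
    """Round a positive Fraction to `bits` significant bits, ties to even.

    Hypothetical helper for illustration: no subnormals, no overflow.
    """
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    # Scale so the integer part holds exactly `bits` bits, then round.
    scale = Fraction(2) ** (bits - 1 - e)
    return Fraction(round(x * scale)) / scale  # round() on a Fraction is half-even

exact = Fraction(1, 2731)
correct = round_to_bits(exact, 53)                             # one rounding
double_rounded = round_to_bits(round_to_bits(exact, 64), 53)   # 64-bit, then 53-bit

print(float(correct).hex())         # 0x1.7ff4005ffd001p-12
print(float(double_rounded).hex())  # 0x1.7ff4005ffd002p-12
```

The first rounding to 64 bits lands exactly halfway between two 53-bit values, so the second rounding resolves a tie that the exact quotient never had.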

----------


[issue45876] Improve accuracy of stdev functions in statistics

Nov 26, 2021, 6:22 PM

Post #32 of 41 (104 views)

Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

> It won't affect _this_ application, but possibly we should
> fix this anyway.

I would like to see this fixed. It affects our ability to
reason about int/int code. That comes up every time a
fraction is fed into a math library function that converts
its input to a float.

----------


[issue45876] Improve accuracy of stdev functions in statistics

Nov 26, 2021, 7:16 PM

Post #33 of 41 (104 views)

Tim Peters <tim@python.org> added the comment:

But I would like to leave it alone. Extended precision simply is not an issue on any current platform I'm aware of ("not even Windows"), and I would, e.g., hate trying to explain to users why

1 / 2731 != 1.0 / 2731.0

(assuming we're not also proposing to take float division away from the HW). It's A Feature that

I / J == float(I) / float(J)

whenever I and J are both representable as floats.

If extended precision is an issue on some platform, fine, let them speak up. On x87 we could document that CPython assumes the FPU's "precision control" is set to 53 bits.
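A rough way to spot-check the identity above on a given machine, as a sketch only. It assumes a platform whose float division is correctly rounded at 53 bits (no x87 extended precision); count_mismatches is a hypothetical helper, and the seed and value ranges are arbitrary.

```python
import random

def count_mismatches(trials: int = 10_000, seed: int = 45876) -> int:
    """Count random (I, J) pairs where I / J != float(I) / float(J)."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        # A 53-bit int shifted left is still exactly representable as a float.
        i = rng.randrange(1, 2**53) << rng.randrange(0, 200)
        j = rng.randrange(1, 2**53) << rng.randrange(0, 200)
        if i / j != float(i) / float(j):
            bad += 1
    return bad

print(count_mismatches())
print(1 / 2731 == 1.0 / 2731.0)
```

A nonzero count would point to a platform where the two spellings diverge, which is the situation the post suggests handling through documentation rather than code.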

----------


[issue45876] Improve accuracy of stdev functions in statistics

Nov 26, 2021, 9:11 PM

Post #34 of 41 (104 views)

Steven D'Aprano <steve+python@pearwood.info> added the comment:

Raymond, Mark, Tim,

I have been reading this whole thread. Thank you all. I am in awe and a little bit intimidated by how much I still have to learn about floating point arithmetic.

----------


[issue45876] Improve accuracy of stdev functions in statistics

Nov 26, 2021, 9:55 PM

Post #35 of 41 (104 views)

Change by Raymond Hettinger <raymond.hettinger@gmail.com>:

----------
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed


[issue45876] Improve accuracy of stdev functions in statistics

Nov 26, 2021, 9:55 PM

Post #36 of 41 (104 views)

Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

New changeset af9ee57b96cb872df6574e36027cc753417605f9 by Raymond Hettinger in branch 'main':
bpo-45876: Improve accuracy for stdev() and pstdev() in statistics (GH-29736)
https://github.com/python/cpython/commit/af9ee57b96cb872df6574e36027cc753417605f9

----------


[issue45876] Improve accuracy of stdev functions in statistics

Nov 26, 2021, 9:59 PM

Post #37 of 41 (104 views)

Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

Thank you all for looking at this. It's unlikely that anyone will ever notice the improvement, but I'm happy with it and that's all that matters ;-)

----------


[issue45876] Improve accuracy of stdev functions in statistics

Nov 28, 2021, 7:25 PM

Post #38 of 41 (103 views)

Change by Raymond Hettinger <raymond.hettinger@gmail.com>:

----------
pull_requests: +28059
pull_request: https://github.com/python/cpython/pull/29828


[issue45876] Improve accuracy of stdev functions in statistics

Nov 30, 2021, 4:20 PM

Post #39 of 41 (103 views)

Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

New changeset a39f46afdead515e7ac3722464b5ee8d7b0b2c9b by Raymond Hettinger in branch 'main':
bpo-45876: Correctly rounded stdev() and pstdev() for the Decimal case (GH-29828)
https://github.com/python/cpython/commit/a39f46afdead515e7ac3722464b5ee8d7b0b2c9b

----------


[issue45876] Improve accuracy of stdev functions in statistics

Nov 30, 2021, 4:27 PM

Post #40 of 41 (103 views)

Change by Raymond Hettinger <raymond.hettinger@gmail.com>:

----------
pull_requests: +28095
pull_request: https://github.com/python/cpython/pull/29869


[issue45876] Improve accuracy of stdev functions in statistics

Nov 30, 2021, 5:26 PM

Post #41 of 41 (103 views)

Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

New changeset 0aa0bd056349f73de9577ccc38560c1d01864d51 by Raymond Hettinger in branch 'main':
bpo-45876: Have stdev() also use decimal specific square root. (GH-29869)
https://github.com/python/cpython/commit/0aa0bd056349f73de9577ccc38560c1d01864d51

----------
