Mailing List Archive: python/dist/src/Objects longobject.c,1.124,1.125

Update of /cvsroot/python/python/dist/src/Objects
In directory usw-pr-cvs1:/tmp/cvs-serv16200/python/Objects

Modified Files:
longobject.c
Log Message:
x_mul(): This failed to normalize its result.
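
For illustration only, a minimal sketch of what normalizing a result means, assuming a little-endian digit array and a hypothetical digit_t type (this is not the actual long_normalize() code). The product of an m-digit and an n-digit number needs either m+n-1 or m+n digits, so a buffer sized for the worst case can end with a zero digit that has to be trimmed from the logical size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t digit_t;   /* hypothetical digit type, for illustration */

/* Return the number of significant digits in a little-endian digit
 * array, i.e. the size after dropping zero digits at the high end.
 * An all-zero array has size 0.
 */
size_t normalized_size(const digit_t *digits, size_t size)
{
    assert(digits != NULL || size == 0);
    while (size > 0 && digits[size - 1] == 0)
        --size;
    return size;
}
```

A caller that skips this step can report a size that counts leading zero digits, which is the kind of discrepancy that can hide an allocation problem elsewhere.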

k_mul(): This didn't allocate enough result space when one input had
more than twice as many bits as the other. This was partly hidden by
the fact that x_mul() didn't normalize its result.

The Karatsuba recurrence is pretty much hosed if the inputs aren't
roughly the same size. If one has at least twice as many bits as the
other, we get a degenerate case where the "high half" of the smaller
input is 0. Added a special case for that, for speed; it helps, but
this can still be much slower than the "grade school" method.
It seems to take a really wild imbalance to trigger that; e.g., a
2**22-bit input times a 1000-bit input on my box runs about twice as slow
under k_mul as under x_mul. This still needs to be addressed.
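
To make the degenerate case concrete, here is a toy sketch of one level of the recurrence on machine words rather than digit arrays. The name karatsuba_step and the 16-bit split are assumptions for illustration, not code from longobject.c. When a fits entirely in the low half, ah is 0, so ah*bh contributes nothing and the recurrence buys nothing over multiplying al by the two halves of b directly.

```c
#include <assert.h>
#include <stdint.h>

/* One level of the Karatsuba recurrence on 32-bit inputs split into
 * 16-bit halves (toy model; real code works on digit arrays).
 */
uint64_t karatsuba_step(uint32_t a, uint32_t b)
{
    uint64_t ah = a >> 16, al = a & 0xFFFFu;
    uint64_t bh = b >> 16, bl = b & 0xFFFFu;

    uint64_t ahbh = ah * bh;              /* 0 whenever a < 2**16 */
    uint64_t albl = al * bl;
    uint64_t k = (ah + al) * (bh + bl);   /* the one extra multiply */

    /* k = ahbh + ah*bl + al*bh + albl, so this cannot underflow. */
    assert(k >= ahbh + albl);
    uint64_t mid = k - ahbh - albl;       /* = ah*bl + al*bh */

    return (ahbh << 32) + (mid << 16) + albl;
}
```

For example, karatsuba_step(1000u, 0x12345678u) takes the degenerate path: ah is 0, and the subtraction of ahbh from k is wasted work.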

I'm also not sure that allocating a->ob_size + b->ob_size digits is
enough, given that this is computing k = (ah+al)*(bh+bl) instead of
k = (ah-al)*(bl-bh); i.e., it's certainly enough for the final result,
but it's vaguely possible that adding in the "artificially" large k may
overflow that temporarily. If so, an assert will trigger in the debug
build, but we'll probably compute the right result anyway(!).
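
The size concern can be seen in the same kind of toy setting, assuming 16-bit halves of 32-bit words (all function names below are hypothetical and not from longobject.c). The sum form of k can need two more bits than any product of two halves, while the difference form never exceeds that width in magnitude.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent v (0 for v == 0). */
unsigned bit_length(uint64_t v)
{
    unsigned n = 0;
    while (v != 0) {
        ++n;
        v >>= 1;
    }
    return n;
}

/* k in the sum form: (ah + al) * (bh + bl). */
uint64_t k_sum(uint32_t a, uint32_t b)
{
    uint64_t ah = a >> 16, al = a & 0xFFFFu;
    uint64_t bh = b >> 16, bl = b & 0xFFFFu;
    return (ah + al) * (bh + bl);
}

/* Magnitude of k in the difference form: |(ah - al) * (bl - bh)|. */
uint64_t k_diff_magnitude(uint32_t a, uint32_t b)
{
    uint64_t ah = a >> 16, al = a & 0xFFFFu;
    uint64_t bh = b >> 16, bl = b & 0xFFFFu;
    uint64_t da = ah > al ? ah - al : al - ah;
    uint64_t db = bl > bh ? bl - bh : bh - bl;

    /* Each difference of two halves still fits in one half. */
    assert(da <= 0xFFFFu && db <= 0xFFFFu);
    return da * db;
}
```

With every half equal to 0xFFFF the sum form needs 34 bits, whereas the difference form stays within 32 bits for any inputs. That extra headroom is what makes the temporary overflow "vaguely possible" when k is added into a buffer sized only for the final product.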

Index: longobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/longobject.c,v
retrieving revision 1.124
retrieving revision 1.125
diff -C2 -d -r1.124 -r1.125
*** longobject.c 12 Aug 2002 05:09:36 -0000 1.124
--- longobject.c 12 Aug 2002 06:17:58 -0000 1.125
***************
*** 1557,1561 ****
}
}
! return z;
}

--- 1557,1561 ----
}
}
! return long_normalize(z);
}

***************
*** 1631,1636 ****

/* Use gradeschool math when either number is too small. */
! if (ABS(a->ob_size) <= KARATSUBA_CUTOFF)
! return x_mul(a, b);

shift = ABS(b->ob_size) >> 1;
--- 1631,1643 ----

/* Use gradeschool math when either number is too small. */
! if (ABS(a->ob_size) <= KARATSUBA_CUTOFF) {
! /* 0 is inevitable if one kmul arg has more than twice
! * the digits of another, so it's worth special-casing.
! */
! if (a->ob_size == 0)
! return _PyLong_New(0);
! else
! return x_mul(a, b);
! }

shift = ABS(b->ob_size) >> 1;
***************
*** 1642,1646 ****

/* Allocate result space, and copy ahbh into the high digits. */
! ret = _PyLong_New(ahbh->ob_size + 2*shift + 1);
if (ret == NULL) goto fail;
#ifdef Py_DEBUG
--- 1649,1653 ----

/* Allocate result space, and copy ahbh into the high digits. */
! ret = _PyLong_New(ABS(a->ob_size) + ABS(b->ob_size));
if (ret == NULL) goto fail;
#ifdef Py_DEBUG
***************
*** 1648,1655 ****
memset(ret->ob_digit, 0xDF, ret->ob_size * sizeof(digit));
#endif
memcpy(ret->ob_digit + 2*shift, ahbh->ob_digit,
ahbh->ob_size * sizeof(digit));
! /* That didn't copy into the most-significant (overflow) digit. */
! ret->ob_digit[ret->ob_size - 1] = 0;

/* Compute al*bl, and copy into the low digits. */
--- 1655,1667 ----
memset(ret->ob_digit, 0xDF, ret->ob_size * sizeof(digit));
#endif
+ assert(2*shift + ahbh->ob_size <= ret->ob_size);
memcpy(ret->ob_digit + 2*shift, ahbh->ob_digit,
ahbh->ob_size * sizeof(digit));
!
! /* Zero-out the digits higher than the ahbh copy. */
! i = ret->ob_size - 2*shift - ahbh->ob_size;
! if (i)
! memset(ret->ob_digit + 2*shift + ahbh->ob_size, 0,
! i * sizeof(digit));

/* Compute al*bl, and copy into the low digits. */