Mailing List Archive

[issue1757062] Pickle fails on BeautifulSoup's navigableString instances
Christophe Michel added the comment:

I started by isolating the minimal code that triggers the error.
If you play a bit with NavigableString, you will end up with the
attached code.

As expected, this program fails with RuntimeError: maximum recursion
depth exceeded.
The evil recursion proceeds as follows:

>> File "C:\Python25\lib\pickle.py", line 1364, in dump
>> Pickler(file, protocol).dump(obj)

Initial call to dump(), as intended.

>> File "C:\Python25\lib\pickle.py", line 224, in dump
>> self.save(obj)

save() calls obj.__reduce_ex__(), obj being our EvilString instance.

This is implemented by _reduce_ex() in copy_reg.py, line 58 onward, and
in my example it returns a tuple containing three elements:
1) the _reconstructor function, as defined in copy_reg.py, line 46
2) a tuple: (<class '__main__.EvilString'>, <type 'unicode'>,
<'__main__.EvilString' instance at 0xXXXXXXXX>)
First element is the actual class of obj, second is the base class,
and third is the current instance (known as state).
3) an empty dict {}
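
For comparison, here is a small sketch of what the reduce value looks like for a well-behaved subclass. It is written for Python 3, where the module is named copyreg and str plays the role of unicode; PlainString is a hypothetical class used only for illustration, and the trailing state dict is not inspected because its presence may differ between versions.

```python
import copyreg


class PlainString(str):
    """Hypothetical, well-behaved str subclass (not from the report)."""


obj = PlainString("spam")

# Protocols 0 and 1 go through copyreg._reduce_ex().
rv = obj.__reduce_ex__(1)
func, args = rv[0], rv[1]
cls, base, state = args

print(func is copyreg._reconstructor)  # True
print(cls.__name__, base.__name__)     # PlainString str
print(type(state).__name__, state)     # str spam
```

Here the third element of the inner tuple is a plain str, so saving it does not lead back to the original instance.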

>> File "C:\Python25\lib\pickle.py", line 331, in save
>> self.save_reduce(obj=obj, *rv)

save_reduce() calls self.save() twice:
- first on the func argument, which is the _reconstructor function. This
call works as intended
- next on the tuple (<class '__main__.EvilString'>, <type 'unicode'>,
<'__main__.EvilString' instance at 0xXXXXXXXX>)

>> File "C:\Python25\lib\pickle.py", line 403, in save_reduce
>> save(args)
>> File "C:\Python25\lib\pickle.py", line 286, in save
>> f(self, obj) # Call unbound method with explicit self

save() finds out that its argument is a tuple and calls save_tuple()
accordingly.

>> File "C:\Python25\lib\pickle.py", line 564, in save_tuple
>> save(element)

... and save_tuple() calls save() on each element of the tuple.
See what's wrong?
This means calling save() again on the EvilString instance, which in
turn calls save_reduce() on it, and so on.
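
The cycle can be imitated with a toy model. This is only a sketch in Python 3, not pickle's real code: ToyPickler, its depth limit, and both string classes are made up for illustration, and __str__ stands in for __unicode__ because the attached EvilString code is not shown here.

```python
class EvilString(str):
    """Assumed shape of the attached example: __str__ returns self."""

    def __str__(self):
        return self


class PlainString(str):
    """Hypothetical well-behaved subclass, for comparison."""


class ToyPickler:
    """Toy model of the save -> save_reduce -> save_tuple cycle."""

    def __init__(self, limit=30):
        self.limit = limit   # artificial cap instead of the interpreter's
        self.depth = 0
        self.calls = []

    def save(self, obj):
        self.depth += 1
        try:
            if self.depth > self.limit:
                raise RecursionError("toy depth limit exceeded")
            if type(obj) is tuple:
                # Like save_tuple(): save every element in turn.
                self.calls.append("save_tuple")
                for element in obj:
                    self.save(element)
            elif type(obj) is str or callable(obj):
                # Plain strings, classes and functions end the descent.
                self.calls.append("leaf")
            else:
                # Like save_reduce(): save the function, then its args.
                self.calls.append("save_reduce")
                func, args = obj.__reduce_ex__(1)[:2]
                self.save(func)
                self.save(args)
        finally:
            self.depth -= 1


good = ToyPickler()
good.save(PlainString("spam"))
print(good.calls)

try:
    ToyPickler().save(EvilString("spam"))
except RecursionError as exc:
    print("EvilString:", exc)
```

With PlainString the descent ends at the plain str state; with EvilString the args tuple contains the instance itself, so every pass through save_tuple starts another save_reduce.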

The problem lies in _reduce_ex(), in the definition of the state of the
object:

copy_reg.py, lines 65 to 70:
if base is object:
    state = None
else:
    if base is self.__class__:
        raise TypeError, "can't pickle %s objects" % base.__name__
    state = base(self)

When this code gets executed on an EvilString instance, base is the type
'unicode'.
Since base is neither object nor the actual class EvilString, the
following line gets executed:
state = base(self)

This corresponds to unicode(self), which calls self.__unicode__(), which
returns the EvilString instance itself, not a plain unicode object.
And there the recursion starts.
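
The same effect can be reproduced in Python 3, with str in place of unicode and __str__ in place of __unicode__. This is a sketch under that assumption; the EvilString definition is a guess at the attached code, which is not reproduced in this message.

```python
class EvilString(str):
    """Assumed shape of the attached example."""

    def __str__(self):
        return self  # returns the instance itself, not a plain str


evil = EvilString("spam")

base = str
state = base(evil)  # the same call shape as state = base(self)

print(state is evil)         # True
print(type(state).__name__)  # EvilString
```

Because state is the very object being pickled, putting it into the args tuple hands the pickler the same instance again.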

I don't know if this is a flaw in the design of _reduce_ex, or a flaw
inherent in having __unicode__(self) return self.
My guess is that it is the latter.

----------
nosy: +altherac

_____________________________________
Tracker <report@bugs.python.org>
<http://bugs.python.org/issue1757062>
_____________________________________
[issue1757062] Pickle fails on BeautifulSoup's navigableString instances
Georg Brandl added the comment:

This is indeed tricky. The docs say __unicode__ "should return a Unicode
object", so I'm inclined to blame BeautifulSoup.
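
As a rough illustration of what honoring that contract could look like, here is a sketch in Python 3 terms, where __str__ takes the place of __unicode__. SafeString is a hypothetical class, not BeautifulSoup's code and not a proposed patch; the reduce value is applied by hand rather than through pickle so the example does not depend on where the class is defined.

```python
class SafeString(str):
    """Hypothetical subclass whose __str__ returns a plain str."""

    def __str__(self):
        # str.__str__ gives back an exact str, never the subclass instance.
        return str.__str__(self)


safe = SafeString("spam")

# Protocol 1 reduce value: (reconstructor, (cls, base, state), ...)
func, args = safe.__reduce_ex__(1)[:2]
cls, base, state = args
print(type(state).__name__)  # str

# Rebuild the object by hand from the reduce value.
clone = func(*args)
print(type(clone).__name__, clone)  # SafeString spam
```

With a plain string as state, the args tuple no longer refers back to the instance, so the recursion described earlier has nothing to feed on.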

Asking Neal for a second opinion.

----------
assignee: -> nnorwitz
nosy: +georg.brandl
