Mailing List Archive

[issue5084] unpickling does not intern attribute names
Changes by Jake McGuire <jake@youtube.com>:


Added file: http://bugs.python.org/file12880/pickle.py.diff

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue5084>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue5084] unpickling does not intern attribute names
New submission from Jake McGuire <jake@youtube.com>:

Instance attribute names are normally interned; this is done in
PyObject_SetAttr (among other places). Unpickling (in both pickle and
cPickle) updates __dict__ on the instance object directly. This
bypasses the interning, so you end up with many copies of the strings
representing your attribute names, which wastes a lot of space, both in
RAM and in pickles of sequences of objects that were themselves created
by unpickling. Note that the pure-Python memcached client uses pickle to
serialize objects.

>>> import pickle
>>> class C(object):
...     def __init__(self, x):
...         self.long_attribute_name = x
...
>>> len(pickle.dumps([pickle.loads(pickle.dumps(C(None),
...     pickle.HIGHEST_PROTOCOL)) for i in range(100)],
...     pickle.HIGHEST_PROTOCOL))
3658
>>> len(pickle.dumps([C(None) for i in range(100)],
...     pickle.HIGHEST_PROTOCOL))
1441
>>>
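A minimal Python 3 sketch of the mechanism described above (Python 3 spells Python 2's builtin `intern` as `sys.intern`). It builds equal but distinct name strings at runtime and compares what `setattr` stores with what a direct dict store keeps. Plain dicts stand in for an instance `__dict__` in the second half; that substitution is an assumption, made because CPython's key-sharing instance dicts can blur identity checks on real instance dicts. `fresh_name` is a hypothetical helper.

```python
import sys


def fresh_name():
    # Build the name at runtime so each call returns a new, non-interned
    # str object that is equal to, but not identical with, the others.
    return "".join(["long_attribute", "_name"])


class C(object):
    pass


# Path 1: setattr() goes through PyObject_SetAttr, which interns the name,
# so both instances end up keyed by the same string object.
a, b = C(), C()
setattr(a, fresh_name(), None)
setattr(b, fresh_name(), None)
key_a = next(iter(vars(a)))
key_b = next(iter(vars(b)))
print("setattr keys identical:", key_a is key_b)

# Path 2: storing into a dict directly (as the unpickler did with
# inst.__dict__) keeps whatever string object it is handed.
d1, d2 = {}, {}
d1[fresh_name()] = None
d2[fresh_name()] = None
print("dict keys identical:", next(iter(d1)) is next(iter(d2)))
```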

Interning the strings on unpickling makes those later pickles smaller
(the pickler memoizes objects by identity, so a shared name string is
written only once), and at least for cPickle it actually makes
unpickling sequences of many objects slightly faster. I have included
proposed patches to cPickle.c and pickle.py, and would appreciate any
feedback.
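The size effect can be sketched the same way (Python 3, `sys.intern`). Because the pickler memoizes by object identity, a list of state dicts that share one key object writes the key text once and then emits short memo references, while distinct-but-equal keys are written out in full every time. Plain dicts again stand in for instance state (an assumption), and `fresh_name` is a hypothetical helper.

```python
import pickle
import sys


def fresh_name():
    # A new str object on every call, equal to but not identical with the others.
    return "".join(["long_attribute", "_name"])


# 100 state dicts whose keys are distinct-but-equal string objects ...
distinct = [{fresh_name(): None} for _ in range(100)]
# ... versus 100 state dicts that all share one interned key object.
shared = [{sys.intern(fresh_name()): None} for _ in range(100)]

size_distinct = len(pickle.dumps(distinct, pickle.HIGHEST_PROTOCOL))
size_shared = len(pickle.dumps(shared, pickle.HIGHEST_PROTOCOL))

# The key text is written once for `shared`, but once per dict for `distinct`.
print(size_distinct, size_shared)
```

Running `pickletools.dis` on the two pickles shows the repeated string opcodes on one side and memo lookups on the other, if you want to compare them opcode by opcode.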

----------
components: Library (Lib)
files: cPickle.c.diff
keywords: patch
messages: 80670
nosy: jakemcguire
severity: normal
status: open
title: unpickling does not intern attribute names
type: resource usage
Added file: http://bugs.python.org/file12879/cPickle.c.diff

[issue5084] unpickling does not intern attribute names
Gabriel Genellina <gagsl-py2@yahoo.com.ar> added the comment:

Either my browser went crazy, or you uploaded the same patch (.py) twice.

----------
nosy: +gagenellina

[issue5084] unpickling does not intern attribute names
Changes by Jake McGuire <jake@youtube.com>:


Removed file: http://bugs.python.org/file12879/cPickle.c.diff

[issue5084] unpickling does not intern attribute names
Changes by Jake McGuire <jake@youtube.com>:


Added file: http://bugs.python.org/file12882/cPickle.c.diff

[issue5084] unpickling does not intern attribute names
Alexandre Vassalotti <alexandre@peadrop.com> added the comment:

The patch for cPickle doesn't do the same thing as the pickle one. In
the cPickle one, you are only interning slot attributes, which, I
believe, is not what you intended. :-)

----------
assignee: -> alexandre.vassalotti
nosy: +alexandre.vassalotti

[issue5084] unpickling does not intern attribute names
Jake McGuire <jake@youtube.com> added the comment:

Are you sure? My diff may not include enough context (I should have
made it against an anonymous svn checkout), but it looks like the slot
attributes get set several lines after my change: "while
(PyDict_Next(slotstate, ...))" as opposed to the "while
(PyDict_Next(state, ...))" loop that I modified...

-jake

On Jan 27, 2009, at 6:54 PM, Alexandre Vassalotti wrote:

>
> Alexandre Vassalotti <alexandre@peadrop.com> added the comment:
>
> The patch for cPickle doesn't do the same thing as the pickle one. In
> the cPickle one, you are only interning slot attributes, which, I
> believe, is not what you intended. :-)

[issue5084] unpickling does not intern attribute names
Alexandre Vassalotti <alexandre@peadrop.com> added the comment:

Oh, you are right. I was looking at py3k's version of pickle, which uses
PyDict_Update instead of a while loop; that is why I got confused.
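To make the `state` versus `slotstate` distinction concrete: when an object has no `__setstate__`, the BUILD step receives either a state dict or a `(state, slotstate)` pair. The dict part is stored straight into the instance `__dict__` (the loop the patch touches), while slot state is applied with `setattr`, which already interns names. The following is a rough Python 3 sketch of that logic with interning added on the `__dict__` path; `apply_state`, `Plain` and `Slotted` are hypothetical names, and this is not the actual pickle.py or cPickle.c code.

```python
import sys


def apply_state(inst, state):
    """Hypothetical helper: apply unpickled state the way a BUILD step might."""
    setstate = getattr(inst, "__setstate__", None)
    if setstate is not None:
        # Classes that define __setstate__ handle their own state.
        setstate(state)
        return
    slotstate = None
    if isinstance(state, tuple) and len(state) == 2:
        state, slotstate = state
    if state:
        inst_dict = inst.__dict__
        for k, v in state.items():
            # Writing into __dict__ bypasses PyObject_SetAttr, so intern
            # string keys here to get the sharing setattr would have given.
            if type(k) is str:
                k = sys.intern(k)
            inst_dict[k] = v
    if slotstate:
        # Slot attributes go through setattr, which interns names itself.
        for k, v in slotstate.items():
            setattr(inst, k, v)


class Plain(object):
    pass


class Slotted(object):
    __slots__ = ("x",)
```

Only the first loop needs the change; the slot loop already benefits from the interning done inside `setattr`.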

[issue5084] unpickling does not intern attribute names
Changes by Jesús Cea Avión <jcea@jcea.es>:


----------
nosy: +jcea

[issue5084] unpickling does not intern attribute names
Antoine Pitrou <pitrou@free.fr> added the comment:

Why do you call PyString_AsString() followed by PyString_FromString()?
Strings are immutable, so you shouldn't need to take a copy.

Besides, it would be nice to add a test.
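A test along these lines might look as follows, sketched with Python 3's `unittest`. It unpickles two instances from separate pickles (so the pickler's memo cannot make the names share an object) and checks that both attribute names come back as the interned string. It exercises both `pickle.loads` and the pure-Python unpickler; reaching the latter through the private `pickle._Unpickler` class is an assumption about CPython's module layout, not a public API. `C`, `InternOnUnpickleTest`, `check_loader` and `attr_names` are illustrative names.

```python
import io
import pickle
import sys
import unittest


class C(object):
    def __init__(self, x):
        self.long_attribute_name = x


def attr_names(obj):
    # The attribute-name string objects stored in the instance __dict__.
    return list(vars(obj))


class InternOnUnpickleTest(unittest.TestCase):
    def check_loader(self, loads):
        canonical = sys.intern("long_attribute_name")
        for proto in range(pickle.HIGHEST_PROTOCOL + 1):
            # Unpickle from two *separate* pickles, so the pickler's memo
            # cannot make the two copies of the name share one object.
            a = loads(pickle.dumps(C(1), proto))
            b = loads(pickle.dumps(C(2), proto))
            (name_a,) = attr_names(a)
            (name_b,) = attr_names(b)
            self.assertIs(name_a, canonical)
            self.assertIs(name_b, canonical)

    def test_c_unpickler(self):
        self.check_loader(pickle.loads)

    def test_python_unpickler(self):
        self.check_loader(lambda data: pickle._Unpickler(io.BytesIO(data)).load())
```

Run it with the `unittest` runner; an unpickler that copies keys into `__dict__` without interning them can fail the `assertIs` checks.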

----------
nosy: +pitrou

[issue5084] unpickling does not intern attribute names
Changes by Collin Winter <collinw@gmail.com>:


----------
nosy: +collinwinter, jyasskin
