Mailing List Archive

[issue40116] Regression in memory use of shared key dictionaries for "compact dicts"
Mark Shannon <mark@hotpy.org> added the comment:

This can be mitigated, if not entirely fixed, by storing an ordering bit vector in the values.

This way all instances of the class SometimesShared in the example above can share the keys.

The keys might be ("optional", "attr")

For any instances with only "attr" as an attibute, the values would be (NULL, value) and the order would be (1,)

The downsides of this approach are:
1. Each values, and thus dict needs an extra 64 bit value.
2. Shared keys have a maximum size of 16.

Overall, I expect the improved sharing to more than compensate for the disadvantages.

----------

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue40116>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue40116] Regression in memory use of shared key dictionaries for "compact dicts" [ In reply to ]
Change by Mark Shannon <mark@hotpy.org>:


----------
keywords: +patch
pull_requests: +26911
stage: test needed -> patch review
pull_request: https://github.com/python/cpython/pull/28520

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue40116>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue40116] Regression in memory use of shared key dictionaries for "compact dicts" [ In reply to ]
Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

> Overall, I expect the improved sharing to more than
> compensate for the disadvantages.

I expect the opposite. This makes all dicts pay a price (in space, initialization time, and complexity) for a micro-optimization of an uncommon case (the normal case is for __init__ to run and set all the keys in a consistent order). It is unlikely that the "benefits" to never be felt in real-word applications, but "disadvantages" would affect every Python program.

> The language specification says that the dicts maintain insertion
> order, but the wording implies that this only to explicit
> dictionaries, not instance attribute or other namespace dicts.

That is a quite liberal reading of the spec. I would object to making instance and namespace dicts behave differently. That would be a behavior regression and we would forever have to wrestle with the difference.

----------

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue40116>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue40116] Regression in memory use of shared key dictionaries for "compact dicts" [ In reply to ]
Marc-Andre Lemburg <mal@egenix.com> added the comment:

On 22.09.2021 21:02, Raymond Hettinger wrote:
>> The language specification says that the dicts maintain insertion
>> order, but the wording implies that this only to explicit
>> dictionaries, not instance attribute or other namespace dicts.
>
> That is a quite liberal reading of the spec. I would object to making instance and namespace dicts behave differently. That would be a behavior regression and we would forever have to wrestle with the difference.

I agree. Keeping the insertion order is essential for many common
use cases, including those where a class or instance dict is used,
e.g. namespaces used for data records, data caches, field
definitions in data records, etc. (and yes, those often can be
dynamically extended as well :-)).

I think for the case you mention, a documentation patch would be
better and more helpful for the programmers. Point them to slots
and the sharing problem should go away in most cases :-)

----------
nosy: +lemburg

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue40116>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue40116] Regression in memory use of shared key dictionaries for "compact dicts" [ In reply to ]
Mark Shannon <mark@hotpy.org> added the comment:

Raymond,

Only split dicts need the extra field.

Classes where many instances do not have exactly the same set of attributes may be more common than you think.
There are many reasons why some attributes may be added conditionally.

PR 28520 actually makes the dictionary code a bit simpler, as we don't need to maintain the invariant that value arrays cannot have holes. Maintaining an order is simple and cheap:

order = (order<<4) | insertion_index

There are pros and cons to both schemes: PR 28520 and the current implementation.

The problem with the current scheme is that it only works well for classes where all instances are initialized with exactly the same attributes, and in the same order.

The PR 28520 scheme can handle those cases where order and key set differ a bit, but has a maximum size of 16 before the dict must be combined.


We need as many instances as possible to have split dictionaries, to get https://github.com/faster-cpython/ideas/issues/72 working well as it will make the cost of not sharing even greater, relatively.

----------

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue40116>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue40116] Regression in memory use of shared key dictionaries for "compact dicts" [ In reply to ]
Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:

There are several common idioms which do not work well with shared dictionaries.

1. Class attributes as defaults. If most instances of the class have the default value for some attribute, it can be set as the class attribute. It saves memory because most instances will have smaller dict. But if the first instance has such attribute, it cancels shared dict for all following instances without this attribute.

2. cached_property (and analogs). The cached instance attributes can be added in arbitrary order, canceling shared dict for this instance and maybe for all instances created later.

3. Some classes delete attributes in close() or __exit__() methods to avoid reference loops and to release resources earlier. Since shared dicts do not support deletion, such closed objects have now larger size than non-closed objects.

----------

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue40116>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue40116] Regression in memory use of shared key dictionaries for "compact dicts" [ In reply to ]
Mark Shannon <mark@hotpy.org> added the comment:


New changeset a7252f88d3fa33036bdd6036b8c97bc785ed6f17 by Mark Shannon in branch 'main':
bpo-40116: Add insertion order bit-vector to dict values to allow dicts to share keys more freely. (GH-28520)
https://github.com/python/cpython/commit/a7252f88d3fa33036bdd6036b8c97bc785ed6f17


----------

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue40116>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com