Mailing List Archive: On storage of attributes and Pad structures

TL;DR: A bunch of questions:
1) Where is best to store definitions of attributes? Maybe the pad?
2) Do we want to split the pad into two parts - individual mutables per
CvDEPTH, plus one set of constants per CV overall?

This is a bit of a sprawling mess of thoughts, related to my current
hackery at adding an `attributes.c`, but also overlaps with a few other
thoughts I had about pads.

On attributes: I currently have an attributes.c that provides a
"register" function for code to say "here's a new attribute", and some
lookup-and-apply functions to actually apply the registered attributes
to a given target. It works nicely for the new stuff in class.c and I'm
working on adjusting some of the existing things (code, variables) to
use those too.

The current storage mechanism is a copy of the only way I found to
implement it in Object::Pad: a linked list of Newx()'d structs.
It's kind of terrible - the only improvement this core version makes on
that situation is that the root pointer is an interpreter var rather
than literally a static pointer in the .c file. I want to think of a
better way to do it.
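
For anyone who hasn't looked at it, the shape described above is roughly
the following. This is only a minimal sketch: the struct fields, the
function names and the example callback are all made up for
illustration, plain malloc() stands in for Newx(), and a file-scope
static stands in for the interpreter variable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: applies an attribute to some target. */
typedef int (*attr_apply_fn)(void *target, const char *value);

/* Hypothetical registration record; field names are illustrative only. */
struct attr_reg {
    struct attr_reg *next;
    const char      *name;   /* caller must keep this string alive */
    unsigned         flags;
    attr_apply_fn    apply;
};

/* Root of the list. Core would keep this in an interpreter variable;
 * a static is used here only to keep the sketch self-contained. */
static struct attr_reg *attr_root = NULL;

/* "Here's a new attribute". Returns 0 on success, -1 if out of memory. */
static int attr_register(const char *name, unsigned flags, attr_apply_fn apply)
{
    assert(name != NULL && apply != NULL);
    struct attr_reg *reg = malloc(sizeof *reg);  /* Newx() in core code */
    if (!reg)
        return -1;
    reg->name  = name;
    reg->flags = flags;
    reg->apply = apply;
    reg->next  = attr_root;   /* push onto the front of the list */
    attr_root  = reg;
    return 0;
}

/* Linear search; the most recently registered match wins. */
static const struct attr_reg *attr_lookup(const char *name)
{
    for (const struct attr_reg *r = attr_root; r; r = r->next)
        if (strcmp(r->name, name) == 0)
            return r;
    return NULL;
}

/* Lookup-and-apply. Returns -1 if no such attribute is registered. */
static int attr_apply(const char *name, void *target, const char *value)
{
    const struct attr_reg *reg = attr_lookup(name);
    return reg ? reg->apply(target, value) : -1;
}

/* Example callback: treats the target as an int and sets it to 1. */
static int example_apply(void *target, const char *value)
{
    (void)value;
    *(int *)target = 1;
    return 0;
}
```

Note that every lookup walks the whole list and nothing here knows
about scope, which is exactly the part I'd like to improve.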

At the moment, the core mechanism doesn't have any scoping/visibility
rules; it's all globally visible. The one in Object::Pad allows each
attribute to have a "lexical hint hash key", a name that has to exist
in %^H otherwise it won't apply. This is about the best thing that a
CPAN module could do for visibility, but for core perl we can do better.
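
To illustrate the hint-key idea, here is a rough sketch of the check
involved. It is not the Object::Pad code: the struct, the function
names, and the convention that a NULL key means "always visible" are
assumptions for the example, and a NULL-terminated array of strings
stands in for the keys of %^H.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute record carrying an optional hint key. */
struct hinted_attr {
    const char *name;
    const char *hintkey;   /* assumed: NULL means no visibility restriction */
};

/* Returns 1 if key is present in the NULL-terminated hints array. */
static int hints_has(const char *const *hints, const char *key)
{
    for (; *hints; hints++)
        if (strcmp(*hints, key) == 0)
            return 1;
    return 0;
}

/* An attribute applies only if its hint key is in scope. */
static int attr_visible(const struct hinted_attr *attr,
                        const char *const *hints)
{
    assert(attr != NULL && hints != NULL);
    return attr->hintkey == NULL || hints_has(hints, attr->hintkey);
}
```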

In particular, I'm thinking that the lexically-scoped pad already has
the right kind of visibility rules to it, and would be a good place for
core to provide storage of lexically-scoped attributes. We even have
convenient sigils on pad names, so it would seem quite easy to add
attributes using a ':' sigil, along the lines of

pad_add_my(":attrname", ...);

These would make them easy to lexically-import when `use`ing a module,
much like other things can be (lexically) imported - functions,
variables, etc... Scoping allows them to be seen within the right block
and disappear afterwards.
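
As a toy model of the scoping behaviour I mean, consider the sketch
below. It is deliberately not how real pads work (they track visibility
with sequence ranges rather than truncating anything), and every name
in it is hypothetical. It only shows that a ':'-prefixed name stored
alongside '$' names gets block scoping for free.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAD_MAX 64

/* Toy stand-in for the list of pad names at compile time. */
struct toy_pad {
    const char *names[TOY_PAD_MAX];  /* names include their sigil */
    int         fill;                /* number of currently visible names */
};

/* Add a name; returns its slot index, or -1 if the toy pad is full. */
static int toy_pad_add(struct toy_pad *pad, const char *name)
{
    if (pad->fill >= TOY_PAD_MAX)
        return -1;
    pad->names[pad->fill] = name;
    return pad->fill++;
}

/* Entering a block: remember how many names were visible. */
static int toy_block_start(const struct toy_pad *pad)
{
    return pad->fill;
}

/* Leaving a block: everything added since the mark goes out of scope. */
static void toy_block_end(struct toy_pad *pad, int mark)
{
    assert(mark >= 0 && mark <= pad->fill);
    pad->fill = mark;
}

/* Find an attribute by bare name; the innermost declaration wins.
 * Returns the slot index, or -1 if it is not visible here. */
static int toy_find_attr(const struct toy_pad *pad, const char *attrname)
{
    for (int i = pad->fill - 1; i >= 0; i--) {
        const char *n = pad->names[i];
        if (n[0] == ':' && strcmp(n + 1, attrname) == 0)
            return i;
    }
    return -1;
}
```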

This does bring up the question of what kind of SV would be used to
represent an attribute - as now it requires something of an SV shape.
Previously, it was a structure with various fields storing flags, vtable
function pointers, etc... Having added SVt_PVOBJ I had to expand the
size of the "SV type" bitfield anyway, so we have space for another 14
or so types. We could potentially use another one here.

The use of the pad for these things does start to concern me though. We
seem to be using the pad for a mix of two different kinds of things
lately, and the more of each we add, the more it gets in the way of the
other.

* Some pad slots store mutable values that need to be kept distinct
per call - actual lexicals, and other expression temporaries. Each
nested call of a function requires an entire new set of these.

* Other pad slots become read-only constants for the entire lifetime
of the program - actual constants in code, captured GVs, captured
outer lexicals, etc... These are constant per CV, but need to get
copied for each nested call so they exist in every layer.

The trouble with storing both kinds of things in the same list is that
every time the CvDEPTH of the CV increases, the entire pad structure
has to be copied. Fresh SVs need to be created for the first kind,
but the entire set of the second kind of SVs gets copied as well
- which gets worse the wider the pad is. Mixing both kinds of SV
into the same pad makes the pad both wide *and* deep - which has
performance impacts in terms of CPU and memory. Adding attributes into
pads to implement their scoping rules just adds to the problem. There
will also be more information stored in methods of roles that may end
up in the pad as well, and would be a constant for each CV.

I begin to wonder whether somehow we need to split the pad into two
separate ones. An actual "scratchpad" (hence the name) for temporaries,
whose depth does get increased per nested call, and a totally separate
"list of constants", which is stored just once and referred to by every
call of that function. This would come at the cost of extra code
complication, and no doubt some bugs and XS module incompatibilities
while we worked out the best way to do it, but would (hopefully) give
advantages in terms of smaller memory usage overall, and a faster
OP_ENTERSUB, which would have less work to do when extending the
padlist.
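
To make the split concrete, here is a very rough model of the idea. It
is not a proposal for the actual data structures: `slot_t` stands in
for an SV pointer, all the names are invented, and error handling is
minimal. The constants are stored once per CV, and only the mutable
slots get a fresh frame per call depth.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef long slot_t;   /* stand-in for SV* */

/* Hypothetical CV with a split pad. */
struct split_cv {
    slot_t  *consts;      /* stored once, shared by every call depth */
    size_t   n_consts;
    size_t   n_mutables;  /* mutable slots needed per call */
    slot_t **frames;      /* frames[d] holds the mutables for depth d+1 */
    size_t   depth;       /* number of frames currently allocated */
};

/* Copy the constants in once. Returns 0 on success, -1 on failure. */
static int split_cv_init(struct split_cv *cv, const slot_t *consts,
                         size_t n_consts, size_t n_mutables)
{
    assert(n_consts > 0 && n_mutables > 0);
    cv->consts = malloc(n_consts * sizeof *cv->consts);
    if (!cv->consts)
        return -1;
    memcpy(cv->consts, consts, n_consts * sizeof *cv->consts);
    cv->n_consts   = n_consts;
    cv->n_mutables = n_mutables;
    cv->frames     = NULL;
    cv->depth      = 0;
    return 0;
}

/* Nested call: allocate only the mutable slots; constants are untouched.
 * Returns the new zeroed frame, or NULL on allocation failure. */
static slot_t *split_cv_push(struct split_cv *cv)
{
    slot_t **grown = realloc(cv->frames, (cv->depth + 1) * sizeof *grown);
    if (!grown)
        return NULL;
    cv->frames = grown;
    slot_t *frame = calloc(cv->n_mutables, sizeof *frame);
    if (!frame)
        return NULL;
    cv->frames[cv->depth++] = frame;
    return frame;
}

/* Total slots held with the split layout. */
static size_t split_cv_slots(const struct split_cv *cv)
{
    return cv->n_consts + cv->depth * cv->n_mutables;
}

/* Total slots a single mixed pad would hold at the same depth. */
static size_t mixed_pad_slots(size_t n_consts, size_t n_mutables, size_t depth)
{
    return depth * (n_consts + n_mutables);
}

static void split_cv_free(struct split_cv *cv)
{
    for (size_t i = 0; i < cv->depth; i++)
        free(cv->frames[i]);
    free(cv->frames);
    free(cv->consts);
}
```

In this model the per-call cost is proportional to the number of
mutables only, whereas the mixed layout pays for the constants again at
every depth.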

Thoughts on either?

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/