A better model for attributes

Nov 14, 2023, 4:38 AM

Post #1 of 5 (110 views)

TL;DR:
I'm currently thinking about how to implement a few next steps in
feature 'class', such as the :reader and :writer attributes on fields.
I suspect this needs quite a rethink of the code shapes provided by
core perl, but I'm not 100% sure what it should become.

The way that attributes are currently handled by Perl generally is not
great. A few built-in ones like `:lvalue` on subs are very hard-coded
into the parser by direct blocks of code guarded by string comparisons;
e.g. around here:

https://github.com/Perl/perl5/blob/09bf96e5f12e735089467e76df29747c878d4097/toke.c#L12844

Anything that isn't directly recognised by the core perl parser then
gets handled via the equally-terrible MODIFY_CODE_ATTRIBUTES interface.
(This is bad because it's based on class inheritance rather than
lexical scoping, uses one single place for everything instead of a
per-attribute named lookup, and has very little power to actually *do*
anything.)

This is all pretty terrible from an extensibility perspective - both for
adding more code internally in core itself, and for allowing third
parties (i.e. CPAN modules) to provide more of them.

I'd like to change this around quite a lot to make it a lot easier to
extend - both from core and modules, and also to give these things more
power, overall. I have a partial design for a much better system, but
it still needs some more work. That design is in Object::Pad.

The way that these things work in Object::Pad is very different.
Object::Pad stores a registry of known attributes; each attribute has a
name, a key to find in the lexical hints hash that enables it, and a
set of flags and function pointers that implement various stages of its
behaviour. This table acts a little bit like the magic vtables, or
similar structures, that we already have. For example, the structure
for field hooks looks like this:

https://metacpan.org/release/PEVANS/Object-Pad-0.805/source/include/object_pad.h#L59
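
As a rough sketch of that shape (this is not Object::Pad's actual
definition; every struct, field and function name below is made up for
illustration, and the lexical hints hash is stood in for by a plain
NULL-terminated list of strings), a registry keyed by name and gated by
a hint key might look something like:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook table: each pointer implements one stage of behaviour.
 * Entries may be NULL when the attribute does not care about that stage. */
struct attr_hooks {
    int  (*apply)(void *target, const char *value);
    void (*post_construct)(void *target);
};

struct attr_registration {
    const char *name;                 /* attribute name, without the colon */
    const char *hint_key;             /* key that must be set in the lexical hints */
    unsigned    flags;
    const struct attr_hooks *hooks;
};

#define MAX_ATTRS 16
static struct attr_registration registry[MAX_ATTRS];
static size_t registry_count;

/* Adds one registration. Returns 0 on success, -1 if the table is full. */
int register_attr(const char *name, const char *hint_key,
                  unsigned flags, const struct attr_hooks *hooks)
{
    assert(name != NULL && hint_key != NULL);
    if (registry_count >= MAX_ATTRS)
        return -1;
    registry[registry_count++] =
        (struct attr_registration){ name, hint_key, flags, hooks };
    return 0;
}

/* Stand-in for the lexical hints hash: a NULL-terminated list of enabled keys. */
static int hint_enabled(const char *const *hints, const char *key)
{
    for (; *hints; hints++)
        if (strcmp(*hints, key) == 0)
            return 1;
    return 0;
}

/* An attribute is visible only if its name matches AND its hint key is
 * enabled in the scope being compiled. */
const struct attr_registration *find_attr(const char *name,
                                          const char *const *hints)
{
    for (size_t i = 0; i < registry_count; i++)
        if (strcmp(registry[i].name, name) == 0 &&
            hint_enabled(hints, registry[i].hint_key))
            return &registry[i];
    return NULL;
}
```

The point of the hint key is that the same registration is invisible in
any scope that has not imported the module providing it.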

As a structure, this is already a lot nicer than what core perl does
for attributes in a number of ways:

* Each individual attribute is handled in entirely independent code,
rather than having one overall function for all of them at once.

* O:P allows both its own code and third-party modules to register new
ones. There are quite a few more CPAN modules now that implement more
field attributes.

* O:P uses a key in the lexical hint hash to enable visibility of each
attribute; thus they are all neatly lexically scoped by their
importing modules.

* The set of vtable function pointers in the structure allows a very
fine-grained set of behaviours to be implemented on each hook, by
letting the attribute's implementation customise and change a wide
variety of base behaviours around the thing it is attached to.

The entire shape exists twice in Object::Pad - once for fields, once
for classes. I then copied a similar shape of thing into
XS::Parse::Sublike to allow modules to create attributes on signature
parameters - this is what allows the :Checked attribute for value
constraints:

`extended sub f($x :Checked(Num), $y :Checked(Str)) { ... }`

Overall, this feels like a sufficiently powerful and flexible mechanism
that I'd like to stop copying it in custom weird ways each time I want
it, and provide one standard nice thing in core perl. It isn't quite
perfect though; there's still a few things about it I don't like. I
don't think it's quite right yet to copy into core perl, without some
variations and changes.

Primarily, there's a tight coupling between a named attribute and a set
of behaviours that can be implemented. The only way to create custom
behaviours is to create a named attribute; the two are part of the same
thing. This might get complicated to use if you wanted to apply some
custom behaviour to something by a mechanism other than applying an
attribute to it. It also gets in the way of trying to reuse the same
attribute name when applied in multiple places at once - e.g. :Checked
on both object fields and subroutine parameters.

I wonder, therefore, whether a better way to handle attributes
generally in core perl would be more of a two-part setup.

1. Create a (lexically-scoped) registry of named attributes, each with
a set of flags to say to what it applies (regular lexicals, fields,
signature parameters, packages, classes, subroutines, etc...), and
a function pointer to an "apply" function, that gets invoked when
the attribute is applied to a thing.

2. Define a collection of "hook" structures - slight copies of the
ones I currently have in Object::Pad and XS::Parse::Sublike - that
would be somewhat similar to the Magic vtables that currently exist
on SVs.
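
To make part 1 a bit more concrete, here is a minimal sketch in plain C
of a registry whose entries carry a bitmask of subject kinds and an
"apply" function pointer. All of the names (SUBJ_*, attr_def,
apply_attribute and so on) are hypothetical and exist only for this
illustration; lexical scoping is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subject kinds: one bit per kind of thing an attribute
 * may be applied to. */
enum {
    SUBJ_LEXICAL    = 1 << 0,
    SUBJ_FIELD      = 1 << 1,
    SUBJ_PARAM      = 1 << 2,
    SUBJ_SUBROUTINE = 1 << 3
};

enum { ATTR_OK = 0, ATTR_UNKNOWN = -1, ATTR_WRONG_SUBJECT = -2 };

typedef int (*attr_apply_fn)(unsigned subject_kind, void *subject,
                             const char *value);

struct attr_def {
    const char   *name;
    unsigned      applies_to;   /* bitmask of SUBJ_* */
    attr_apply_fn apply;
};

/* Finds the definition matching both the name and the subject kind, then
 * invokes its apply function. The same name may be registered more than
 * once, for different kinds of subject. */
int apply_attribute(const struct attr_def *defs, size_t ndefs,
                    const char *name, unsigned subject_kind,
                    void *subject, const char *value)
{
    int name_seen = 0;
    assert(name != NULL);
    for (size_t i = 0; i < ndefs; i++) {
        if (strcmp(defs[i].name, name) != 0)
            continue;
        name_seen = 1;
        if (defs[i].applies_to & subject_kind)
            return defs[i].apply(subject_kind, subject, value);
    }
    return name_seen ? ATTR_WRONG_SUBJECT : ATTR_UNKNOWN;
}

/* Example apply function: records that it ran by setting an int to 1. */
int apply_set_flag(unsigned subject_kind, void *subject, const char *value)
{
    (void)subject_kind;
    (void)value;
    *(int *)subject = 1;
    return ATTR_OK;
}
```

Matching on name and subject kind together is what would let one name,
such as :Checked, be registered separately for fields and for signature
parameters.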

Then in practice, the common approach taken by a CPAN module that
wanted to provide some custom behaviour in an attribute would be to
first create that hook structure, similar to how they might use magic
now, and create an attribute whose "apply" function attached that hook
to the hook thing - again, similar to sv_magicext() now.
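
A toy model of that pattern, in standalone C rather than real perl
internals, might look like the following. The "thing", the hook vtable
and the `max` attribute are all invented for the example; the only
point is that the attribute's "apply" function does nothing except
attach a hook, and the hook does the real work later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct thing;

/* Hypothetical hook vtable; an entry may be NULL if the hook does not
 * care about that event. on_set returns nonzero to reject the value. */
struct hook_funcs {
    int (*on_set)(struct thing *t, int new_value, void *hookdata);
};

struct hook {
    const struct hook_funcs *funcs;
    void *hookdata;
    struct hook *next;
};

struct thing {
    int value;
    struct hook *hooks;   /* chain of attached hooks, newest first */
};

/* Attaches a hook to a thing. Returns 0 on success, -1 if out of memory. */
int thing_attach_hook(struct thing *t, const struct hook_funcs *funcs,
                      void *hookdata)
{
    assert(t != NULL && funcs != NULL);
    struct hook *h = malloc(sizeof *h);
    if (!h)
        return -1;
    h->funcs = funcs;
    h->hookdata = hookdata;
    h->next = t->hooks;
    t->hooks = h;
    return 0;
}

/* Sets the value unless some attached hook rejects it. */
int thing_set(struct thing *t, int new_value)
{
    for (struct hook *h = t->hooks; h; h = h->next)
        if (h->funcs->on_set && h->funcs->on_set(t, new_value, h->hookdata))
            return -1;
    t->value = new_value;
    return 0;
}

void thing_free_hooks(struct thing *t)
{
    while (t->hooks) {
        struct hook *next = t->hooks->next;
        free(t->hooks);
        t->hooks = next;
    }
}

/* Example hook: reject values above a limit passed in via hookdata. */
static int max_on_set(struct thing *t, int new_value, void *hookdata)
{
    (void)t;
    return new_value > *(const int *)hookdata;
}

static const struct hook_funcs max_hooks = { max_on_set };

/* The attribute's "apply" function just attaches the hook. */
int apply_attr_max(struct thing *t, int *limit)
{
    return thing_attach_hook(t, &max_hooks, limit);
}
```

Because hooks are attached by an ordinary function call, code could
also attach the same behaviour without going through a named attribute
at all.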

Whereas, a core Perl attribute such as `:lvalue` on a subroutine
wouldn't need that hook structure; its "apply" function can just
directly invoke CvLVALUE_on() like it does currently.

I suspect part 1 above would be a relatively simple fixed task, whereas
part 2 involves weaving ever-more interesting hook functions into
ever-more exciting parts of the perl interpreter. It'd be quite
open-ended and subject to a lot of experimentation and careful
version-numbering to ensure things work nicely. I've found in practice
from Object::Pad that it takes a long time of trying to use the hook
mechanism for real to implement real things, before you get a good feel
for what extension functions are required. And there's always new
situations that end up needing more hook points. This really is the
part that I'm least sure about, overall.

I don't have a firm exact design yet, just some vague thoughts-out-loud
that I have written above, and want to start experimenting with
sometime soon. If anyone wants to throw any thoughts or opinions my way
on any of this, consider this your opportunity to do so.

Thanks all,

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/

Re: A better model for attributes

leonerd at leonerd

Nov 14, 2023, 5:48 AM

Post #2 of 5 (110 views)

On Tue, 14 Nov 2023 12:38:23 +0000
"Paul \"LeoNerd\" Evans" <leonerd@leonerd.org.uk> wrote:

> 2. Define a collection of "hook" structures - slight copies of the
> ones I currently have in Object::Pad and XS::Parse::Sublike - that
> would be somewhat similar to the Magic vtables that currently
> exist on SVs.

Also while I think of it:

Leon - this sounds like it might have quite some overlap with our
previous discussions on making the existing magic work better for
AV/HVs, in terms of key lookup etc...

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/

Re: A better model for attributes

happy.barney at gmail

Nov 16, 2023, 12:58 PM

Post #3 of 5 (107 views)

I tried to address probably the same problem in my work on independent
symbol spaces:
- for importing symbols from different symbol spaces
- for binding an independent symbol value to anything

I had a few formal requirements:
- attribute declaration is package scoped
- fully qualified attribute names should be allowed
- it should be possible to overload builtin attributes (still available via
builtin:: syntax)
- the attribute value is parsed by a custom grammar (defaults to
"perl-expression")
- it should be possible to define a new context where an attribute can be used

1. Attribute context

Intended for use in parsers, e.g.:

    class_attributes:
        { push_attribute_context (ATTRIBUTE_CONTEXT_NAMESPACE); }
        attributes
        { pop_attribute_context (); $$ = $attributes }

Importing of symbols from independent symbol spaces:

    use_statement:
        KW_USE
        { push_attribute_context (ATTRIBUTE_CONTEXT_USE); }
        optional_attributes
        { pop_attribute_context (); }
        ...

2. Attribute registry

An attribute is registered using its fully qualified name (in the case of
builtin attributes I'd suggest e.g. builtin::label).
Attributes are registered:
- to the scope in which they are valid
- with a parser for the attribute value (NULL may mean the attribute
doesn't take any value)
- with a compile-time hook (if NULL, fall back to current behaviour)

The hook will receive:
- target (e.g. builtin::lvalue will get a CV *)
- attribute context (e.g. builtin::lvalue will get ATTRIBUTE_CONTEXT_CODE)
- attribute value
- hook data (goes with registration, managed by hook)

3. Compile time "apply attributes"

Should be invoked manually from the parser, e.g.

    apply_attributes (target, attribute_context, attributes)

When the attribute name is a registered fully qualified attribute, run its hook.
Otherwise look into these namespaces, in order:
- current package
- builtin
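
A small sketch of that lookup order, as standalone C. The table layout
and the function names are invented for illustration; a real
implementation would hang off the package and the interpreter's own
tables rather than a flat array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical registry entry, keyed by fully qualified attribute name. */
struct attr_entry {
    const char *fqname;
    int id;
};

static const struct attr_entry *find_exact(const struct attr_entry *tab,
                                           size_t n, const char *fqname)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].fqname, fqname) == 0)
            return &tab[i];
    return NULL;
}

/* A name containing "::" is taken as fully qualified and looked up as-is.
 * Otherwise try "<current package>::name", then "builtin::name". */
const struct attr_entry *resolve_attribute(const struct attr_entry *tab,
                                           size_t n,
                                           const char *current_package,
                                           const char *name)
{
    char buf[256];
    assert(current_package != NULL && name != NULL);

    if (strstr(name, "::"))
        return find_exact(tab, n, name);

    const char *search[] = { current_package, "builtin" };
    for (size_t i = 0; i < 2; i++) {
        int len = snprintf(buf, sizeof buf, "%s::%s", search[i], name);
        if (len < 0 || (size_t)len >= sizeof buf)
            continue;   /* too long to fit; skip this namespace */
        const struct attr_entry *e = find_exact(tab, n, buf);
        if (e)
            return e;
    }
    return NULL;
}
```

With this order a package can overload `lvalue` for itself, while
`builtin::lvalue` written out in full still reaches the builtin one.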

4. Fully qualified attributes and custom grammars
To reduce the complexity of the tokenizer and parser, let's introduce a new
grammar for attributes:

    := attribute
    := attribute => { perl expression returning value }
    := attribute => custom expression parsed by registered grammar

The prefix token := is mandatory; a := expression should act like a comma.
A custom grammar must recognize an unbalanced := and comma as non-grammar symbols.

For an MVP it will IMHO be enough to support:
- only a C API
  - register_attribute_context
  - register_attribute
  - attribute_value_lookup (target, attribute_context, attribute)
- a restriction that an attribute can be declared only at package level (GV *)
- a new context USE (to allow import of attributes)
  - (in fact an implementation of the import/export part of independent
    symbol spaces)
  - it will be enough to support only exact symbol enumeration (whitespace /
    comment / pod separated)

Brano

Re: A better model for attributes

leonerd at leonerd

Nov 17, 2023, 8:57 AM

Post #4 of 5 (107 views)

On Tue, 14 Nov 2023 12:38:23 +0000
"Paul \"LeoNerd\" Evans" <leonerd@leonerd.org.uk> wrote:

> I wonder, therefore, whether a better way to handle attributes
> generally in core perl would be more of a two-part setup.
>
> 1. Create a (lexically-scoped) registry of named attributes, each
> with a set of flags to say to what it applies (regular lexicals,
> fields, signature parameters, packages, classes, subroutines,
> etc...), and a function pointer to an "apply" function, that gets
> invoked when the attribute is applied to a thing.

I have started a hack at this first part of the process. Extracting the
code out of class.c into a more generically-shared mechanism was easy
enough.

Next up I'm looking at migrating things like the builtin code
attributes (which live inexplicably directly in toke.c) out into the
much nicer system. For example, the :lvalue attribute is handled with
code like

    STATIC void
    apply_code_attribute_lvalue(pTHX_ enum AttributeSubject stype, void *subject, SV *value)
    {
        CV *cv = (CV *)subject;
        CvLVALUE_on(cv);
    }

registered on startup with

    register_attribute("lvalue", ATTRSUBJECT_SUBROUTINE, ATTRf_NO_VALUE, &apply_code_attribute_lvalue);

There's a *tiny* change in behaviour that comes of this, however.

In existing core perl, the attribute must be given as exactly
`:lvalue`; no value in parentheses is permitted, not even an empty one:

    $ perl -E 'sub x :lvalue() {}'
    Invalid CODE attribute: lvalue() at -e line 1.

However, due to my newly adjusted code structure, the attribute value
is parsed out and handled separately from parsing just the attribute
name, and only causes an error if it's nonempty. Empty parens are
permitted and just mean the same thing as their total absence.

    $ ./perl -E 'sub x :lvalue { }'

    $ ./perl -E 'sub x :lvalue() { }'

    $ ./perl -E 'sub x :lvalue(xx) { }'
    Subroutine attribute lvalue does not take a value at -e line 1.

As this is only a change of wording of error messages in
compiletime-error cases, plus a *slight* expansion of what is
considered valid, now not failing on what used to fail, I don't feel
that this is a huge departure of behaviour.
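
To illustrate why empty parens fall out as valid, here is a simplified
standalone sketch of splitting the name from the value first and only
then checking the value. It is not the code from my branch: the
function names and the ATTR_NO_VALUE flag are stand-ins invented for
the example, and it does not attempt nested or quoted parentheses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct parsed_attr {
    const char *name;
    size_t      name_len;
    const char *value;      /* NULL when no parentheses were given */
    size_t      value_len;  /* 0 for both "name" and "name()" */
};

/* Splits "name" or "name(value)". Returns 0 on success, -1 if malformed. */
int split_attribute(const char *text, struct parsed_attr *out)
{
    assert(text != NULL && out != NULL);
    const char *open = strchr(text, '(');
    size_t len = strlen(text);

    out->name = text;
    if (!open) {
        out->name_len = len;
        out->value = NULL;
        out->value_len = 0;
        return len ? 0 : -1;
    }
    /* An empty name or a missing closing paren is malformed. */
    if (open == text || text[len - 1] != ')')
        return -1;
    out->name_len = (size_t)(open - text);
    out->value = open + 1;
    out->value_len = (size_t)((text + len - 1) - (open + 1));
    return 0;
}

#define ATTR_NO_VALUE 1u   /* hypothetical flag: attribute takes no value */

/* Returns 0 if acceptable, -1 if a nonempty value was given to an
 * attribute that does not take one. */
int check_value(unsigned flags, const struct parsed_attr *a)
{
    if ((flags & ATTR_NO_VALUE) && a->value_len > 0)
        return -1;
    return 0;
}
```

Since the check only looks at the length of the value, "lvalue" and
"lvalue()" are indistinguishable to it, and only "lvalue(xx)" is
rejected.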

I'm not totally fixed on this though; I could perhaps be convinced to
treat them differently, but I do like the similarity with calling
regular functions - a call without parens at all is the same as one
with empty parens, and the invoked function can't tell.

Any opinions from any folks who're likely to comment at PR-time? :)

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/

Re: A better model for attributes

happy.barney at gmail

Nov 18, 2023, 1:52 AM

Post #5 of 5 (107 views)

It would be nice:
- to store those attributes in struct gp
- to look up attributes in the "builtin" namespace

That would allow nice follow-ups, at least for me, as I intend to:
- continue with independent symbol spaces
- continue with custom grammar support for attribute values
