Mailing List Archive

PATCH: modifying elements in a foreach loop
I just got bitten by the problem foreach (and grep, map, and sub
calls) have changing elements in an array. (Namely that sv_undef
placeholders produce "Modification of a read-only value" error.) I've
fixed this by forcing the elements to become real in such situations.

Of course, this may cost if someone loops over a sparse array
*without* modifying elements, but I doubt this happens too often.
Getting it working right seems more important anyway.

(Example:

$unk = "unknown0000";
$row[2] = "xsize";
foreach $elt (@row) {
$elt = ++$unk unless defined $elt;
}
)


Mark


*** pp_hot.c.bak Wed Aug 30 13:05:09 1995
--- pp_hot.c Wed Aug 30 13:22:05 1995
***************
*** 378,387 ****
}

if (GIMME == G_ARRAY) {
! I32 maxarg = AvFILL(av) + 1;
! EXTEND(SP, maxarg);
! Copy(AvARRAY(av), SP+1, maxarg, SV*);
! SP += maxarg;
}
else {
dTARGET;
--- 378,407 ----
}

if (GIMME == G_ARRAY) {
! if (op->op_flags & OPf_MOD) {
! I32 maxarg = AvFILL(av) + 1;
! I32 key;
! SV** ary;
!
! EXTEND(SP, maxarg);
! ary = AvARRAY(av);
! for (key = 0; key < maxarg; ++key) {
! SV* it = ary[key];
!
! if (it == &sv_undef) {
! it = NEWSV(68,0);
! av_store(av,key,it);
! ary = AvARRAY(av);
! }
! *++SP = it;
! }
! }
! else {
! I32 maxarg = AvFILL(av) + 1;
! EXTEND(SP, maxarg);
! Copy(AvARRAY(av), SP+1, maxarg, SV*);
! SP += maxarg;
! }
}
else {
dTARGET;
*** pp.c.bak Wed Aug 30 13:40:09 1995
--- pp.c Wed Aug 30 13:42:51 1995
***************
*** 46,55 ****
RETURN;
}
if (GIMME == G_ARRAY) {
! I32 maxarg = AvFILL((AV*)TARG) + 1;
! EXTEND(SP, maxarg);
! Copy(AvARRAY((AV*)TARG), SP+1, maxarg, SV*);
! SP += maxarg;
}
else {
SV* sv = sv_newmortal();
--- 46,75 ----
RETURN;
}
if (GIMME == G_ARRAY) {
! if (op->op_flags & OPf_MOD) {
! I32 maxarg = AvFILL((AV*)TARG) + 1;
! I32 key;
! SV** ary;
!
! EXTEND(SP, maxarg);
! ary = AvARRAY((AV*)TARG);
! for (key = 0; key < maxarg; ++key) {
! SV* it = ary[key];
!
! if (it == &sv_undef) {
! it = NEWSV(68,0);
! av_store((AV*)TARG,key,it);
! ary = AvARRAY((AV*)TARG);
! }
! *++SP = it;
! }
! }
! else {
! I32 maxarg = AvFILL((AV*)TARG) + 1;
! EXTEND(SP, maxarg);
! Copy(AvARRAY((AV*)TARG), SP+1, maxarg, SV*);
! SP += maxarg;
! }
}
else {
SV* sv = sv_newmortal();
Re: PATCH: modifying elements in a foreach loop [ In reply to ]
: I just got bitten by the problem foreach (and grep, map, and sub
: calls) have changing elements in an array. (Namely that sv_undef
: placeholders produce "Modification of a read-only value" error.) I've
: fixed this by forcing the elements to become real in such situations.

We've talked this over before on the list, and I'm not sure this is a
service to the user. Either the sparse array is unintentional, and
they may want an error, or it's intentional, and they should probably
be testing elements for definedness anyway.

This also de-sparsifies arrays merely on spec, and in contexts where
the user might not expect it. This could chew up some memory the user
was expecting never to allocate, merely because the array was used
in a context that *might* modify the elements.

: Of course, this may cost if someone loops over a sparse array
: *without* modifying elements, but I doubt this happens too often.

It always costs extra if OPf_MOD is set, which apparently it always is
on foreach (though I'm not entirely sure when that snuck in). We might
be able to optimize this with a bit in the array that says it might be
sparse. Most arrays aren't.

: Getting it working right seems more important anyway.

You do keep saying that... :-)

But this doesn't really get it working "right" either, since it doesn't fix
parameters passed to a subroutine, which doesn't set OPf_MOD currently.

I think the best solution might be for each sparse array to have its
own clone-on-write version of undef, instead of using &sv_undef. Since
av.c has to deal with undefined elements anyway, and non-sparse arrays
wouldn't need such a magical scalar, there should be almost no overhead
on non-sparse arrays. The only difficulty might be that, by the time
we've decided we're writing an element of an array, we may have lost
the subscript.

Another possibility is to define a desparsify function that the user can
call themselves.

At the moment I'm still vaguely in favor of the status quo. In second place
is probably an AVf_SPARSE bit on the array.

Larry
Re: PATCH: modifying elements in a foreach loop [ In reply to ]
> From: Larry Wall <lwall@scalpel.netlabs.com>
>
> I think the best solution might be for each sparse array to have its
> own clone-on-write version of undef, instead of using &sv_undef. Since
> av.c has to deal with undefined elements anyway, and non-sparse arrays
> wouldn't need such a magical scalar, there should be almost no overhead
> on non-sparse arrays. The only difficulty might be that, by the time
> we've decided we're writing an element of an array, we may have lost
> the subscript.
>
I think that's probable.

> Another possibility is to define a desparsify function that the user can
> call themselves.
>
I think that would be useful (I've suggested overloading 'values' for this
in the past).

> At the moment I'm still vaguely in favor of the status quo.

I can live with that.

> Larry
>
Tim.
Re: PATCH: modifying elements in a foreach loop [ In reply to ]
On Wed, 30 Aug 1995, Larry Wall wrote:

% We've talked this over before on the list, and I'm not sure this is a
% service to the user. Either the sparse array is unintentional, and
% they may want an error, or it's intentional, and they should probably
% be testing elements for definedness anyway.

In the first case, the problem was unwanted undefs, not unwanted
sparse undefs, and would probably be caught by -w (use of
uninitialized value).

In the second case, testing for definedness assumes they don't want to
change the undefined elements.

There is a third case, when the array has undefs intentionally, but
just happens to be sparse. Here, trying to change undefined elements will
fail for no good reason, causing confusion and irritation.


% This also de-sparsifies arrays merely on spec, and in contexts where
% the user might not expect it. This could chew up some memory the user
% was expecting never to allocate, merely because the array was used
% in a context that *might* modify the elements.

This goes both ways. Currently Perl is creating weird errors merely
because the user *might* want the array to remain sparse.

Most users aren't going to understand why a few of the elements in a
list can't be modified, when they look like all the others. Come to
think of it, even I don't understand it, although I know why it
happens.

From the user's point of view, these "sparse" elements are simply
ordinary undefined elements. The fact that perl saves some memory
using them is simply a helpful optimization, and optimizations
shouldn't change code behavior.


% It always costs extra if OPf_MOD is set, which apparently it always is
% on foreach (though I'm not entirely sure when that snuck in). We might
% be able to optimize this with a bit in the array that says it might be
% sparse. Most arrays aren't.

I put that in when we were still in alpha, mostly to fix the behavior
of references, as I recall.

It's true that OPf_MOD always costs a little, but it's not a large
amount. It might be nice if an array knew it contained a reference to
&sv_undef, so the search for them could be skipped when possible. I
suspect that it wouldn't be worth the effort.


% : Getting it working right seems more important anyway.
%
% You do keep saying that... :-)

Yep. Correctness before efficiency. Then optimize the hell out of
it. :-) (E.g. it would be nice if Perl tried to identify when the
iterator could not be changed, and not call mod() in such cases.)


% But this doesn't really get it working "right" either, since it doesn't fix
% parameters passed to a subroutine, which doesn't set OPf_MOD currently.

That's weird---it worked when I put it in, and ck_subr() still calls
mod(). I wonder why it broke. <Sigh> Another bug.


% I think the best solution might be for each sparse array to have its
% own clone-on-write version of undef, instead of using &sv_undef. Since
% av.c has to deal with undefined elements anyway, and non-sparse arrays
% wouldn't need such a magical scalar, there should be almost no overhead
% on non-sparse arrays. The only difficulty might be that, by the time
% we've decided we're writing an element of an array, we may have lost
% the subscript.

That is the difficulty. A fatal one, I'm afraid.

The only reason to provide a separate copy-on-write undef for each
array is if you can't identify the array the element originates from
otherwise. But if you can't identify the array, how can you identify
the element?

% Another possibility is to define a desparsify function that the user can
% call themselves.

This is putting the onus on the wrong users. A protection function
that prevents an array from being desparsified would be better, I
think.

It's fine if someone wants to take advantage of Perl's array storage
to make a real sparse array, but I think it is on their heads to keep
it sparse, not everyone else's. Especially considering how rare
sparse arrays are.


Mark