Mailing List Archive

Can anyone help with one failing ASAN test?
Hi all,

My feature-class branch PR is nearing readiness. All the CI machines
pass on it, except for the "ASAN (-DDEBUGGING)" one. Example

https://github.com/Perl/perl5/actions/runs/3918368246/jobs/6698783745

It has a single failure, in t/lib/croak.t:

FILE: lib/croak/class ; line 91
PROG:
use strict;
no warnings 'experimental::class';
use feature 'class';
class XXX {
field $x = $self + 1;
}
EXPECTED:
Global symbol "$self" requires explicit package name (did you forget to declare "my $self"?) at - line 5.
Execution of - aborted due to compilation errors.
EXIT STATUS: != 0
GOT:
Global symbol "$self" requires explicit package name (did you forget to declare "my $self"?) at - line 5.
Execution of - aborted due to compilation errors.
perl: op.c:599: void Perl_opslab_force_free(OPSLAB *): Assertion `savestack_count == slab->opslab_refcnt-1' failed.
Aborted (core dumped)
EXIT STATUS: 134
# Failed test 13 - test from lib/croak/class at line 91 at lib/croak/class line 91

In more detail, this is a croak test where it's *expecting* that the
perl under test throws an error. The error that comes back is what it
was expecting but has one additional detail:

perl: op.c:599: void Perl_opslab_force_free(OPSLAB *): Assertion `savestack_count == slab->opslab_refcnt-1' failed.
Aborted (core dumped)

At first glance it looks like it's complaining at op_free() time, when
it's failing to properly tidy up the optree after this failure. But
I've had a good stare around the code generally and I think I'm doing
all the right things. There doesn't seem to be anything missing.

For some context: There are two new optypes involved in the object
system - OP_METHSTART and OP_INITFIELD; both of which are of UNOP_AUX
type. Each one has an allocated aux vector, which gets freed. In each
case, the aux vector itself contains a bunch of integers and other
fields, but no other pointers that need freeing.

Can anyone see anything here I might have missed?

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Can anyone help with one failing ASAN test? [ In reply to ]
On 1/15/23 10:09, Paul "LeoNerd" Evans wrote:
> Hi all,
>
> My feature-class branch PR is nearing readiness. All the CI machines
> pass on it, except for the "ASAN (-DDEBUGGING)" one. Example
>
> https://github.com/Perl/perl5/actions/runs/3918368246/jobs/6698783745
>
> It has a single failure, in t/lib/croak.t:
>
> FILE: lib/croak/class ; line 91
> PROG:
> use strict;
> no warnings 'experimental::class';
> use feature 'class';
> class XXX {
> field $x = $self + 1;
> }
> EXPECTED:
> Global symbol "$self" requires explicit package name (did you forget to declare "my $self"?) at - line 5.
> Execution of - aborted due to compilation errors.
> EXIT STATUS: != 0
> GOT:
> Global symbol "$self" requires explicit package name (did you forget to declare "my $self"?) at - line 5.
> Execution of - aborted due to compilation errors.
> perl: op.c:599: void Perl_opslab_force_free(OPSLAB *): Assertion `savestack_count == slab->opslab_refcnt-1' failed.
> Aborted (core dumped)
> EXIT STATUS: 134
> # Failed test 13 - test from lib/croak/class at line 91 at lib/croak/class line 91
>
> In more detail, this is a croak test where it's *expecting* that the
> perl under test throws an error. The error that comes back is what it
> was expecting but has one additional detail:
>
> perl: op.c:599: void Perl_opslab_force_free(OPSLAB *): Assertion `savestack_count == slab->opslab_refcnt-1' failed.
> Aborted (core dumped)
>
> At first glance it looks like it's complaining at op_free() time, when
> it's failing to properly tidy up the optree after this failure. But
> I've had a good stare around the code generally and I think I'm doing
> all the right things. There doesn't seem to be anything missing.
>
> For some context: There are two new optypes involved in the object
> system - OP_METHSTART and OP_INITFIELD; both of which are of UNOP_AUX
> type. Each one has an allocated aux vector, which gets freed. In each
> case, the aux vector itself contains a bunch of integers and other
> fields, but no other pointers that need freeing.
>
> Can anyone see anything here I might have missed?
>

I get different results depending on what envvars I set before running
the test file.

#####
$ uname -mrs
Linux 5.10.0-18-amd64 x86_64

$ ./perl -Ilib -V:config_args
config_args='-des -Dusedevel -Dcc=clang -fsanitize=address -Dld=clang
-fsanitize=address -DDEBUGGING';
#####

No special envvars:

#####
$ cd t; ./perl harness t/lib/croak.t; cd -
lib/croak.t .. ok
All tests successful.
Files=1, Tests=215, 5 wallclock secs ( 0.12 usr 0.01 sys + 1.97 cusr
2.37 csys = 4.47 CPU)
Result: PASS
#####

2 envvars as per our CI setup

#####
$ cd t; ASAN_OPTIONS=detect_leaks=0 PERL_DESTRUCT_LEVEL=2 ./perl harness
t/lib/croak.t; cd -
lib/croak.t .. 1/215 FILE: lib/croak/class ; line 91
PROG:
use strict;
no warnings 'experimental::class';
use feature 'class';
class XXX {
field $x = $self + 1;
}
EXPECTED:
Global symbol "$self" requires explicit package name (did you forget to
declare "my $self"?) at - line 5.
Execution of - aborted due to compilation errors.
EXIT STATUS: != 0
GOT:
Global symbol "$self" requires explicit package name (did you forget to
declare "my $self"?) at - line 5.
Execution of - aborted due to compilation errors.
perl: op.c:599: void Perl_opslab_force_free(OPSLAB *): Assertion
`savestack_count == slab->opslab_refcnt-1' failed.
Aborted
EXIT STATUS: 134
# Failed test 13 - test from lib/croak/class at line 91 at
lib/croak/class line 91
lib/croak.t .. 184/215 #
# Note: 'run_multiple_progs' run has one or more failures
# you can consider setting the environment variable
# PERL_TEST_ABORT_FIRST_FAILURE=1 before running the test
# to stop on the first error.
#
lib/croak.t .. Failed 1/215 subtests
(less 3 skipped subtests: 211 okay)

Test Summary Report
-------------------
lib/croak.t (Wstat: 0 Tests: 215 Failed: 1)
Failed test: 13
Files=1, Tests=215, 3 wallclock secs ( 0.11 usr 0.02 sys + 1.32 cusr
1.73 csys = 3.18 CPU)
Result: FAIL
#####

Set only ASAN_OPTIONS (same results as no envvars):

#####
$ cd t; ASAN_OPTIONS=detect_leaks=0 ./perl harness t/lib/croak.t; cd -
lib/croak.t .. ok
All tests successful.
Files=1, Tests=215, 3 wallclock secs ( 0.11 usr 0.01 sys + 1.28 cusr
1.81 csys = 3.21 CPU)
Result: PASS
#####

Set only PERL_DESTRUCT_LEVEL; 15 failures, here showing only the first.

#####
$ cd t; PERL_DESTRUCT_LEVEL=2 PERL_TEST_ABORT_FIRST_FAILURE=1 ./perl
harness t/lib/croak.t; cd -
FILE: lib/croak/class ; line 2
PROG:
# Method calls on no args
no warnings 'experimental::class';
use feature 'class';
class XXX { method m { } }
XXX::m()
EXPECTED:
Cannot invoke method on a non-instance at - line 5.
EXIT STATUS: != 0
GOT:
Cannot invoke method on a non-instance at - line 5.

=================================================================
==433845==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 56 byte(s) in 1 object(s) allocated from:
#0 0x4b1bdd in malloc (/home/jkeenan/gitwork/perl/perl+0x4b1bdd)
#1 0x7b2fa6 in Perl_safesysmalloc
/home/jkeenan/gitwork/perl/util.c:161:21

SUMMARY: AddressSanitizer: 56 byte(s) leaked in 1 allocation(s).
EXIT STATUS: 1
PERL_TEST_ABORT_FIRST_FAILURE set Test Failure at ./test.pl line 1508.
Compilation failed in require at lib/croak.t line 7.
# Looks like you planned 215 tests but ran 0.
lib/croak.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 215/215 subtests

Test Summary Report
-------------------
lib/croak.t (Wstat: 256 (exited 1) Tests: 0 Failed: 0)
Non-zero exit status: 1
Parse errors: Bad plan. You planned 215 tests but ran 0.
Files=1, Tests=0, 0 wallclock secs ( 0.03 usr 0.01 sys + 0.07 cusr
0.04 csys = 0.15 CPU)
Result: FAIL
#####

I don't know enough about either ASAN or PERL_DESTRUCT_LEVEL to comment
further.
Re: Can anyone help with one failing ASAN test? [ In reply to ]
On Sun, 15 Jan 2023 at 16:55, James E Keenan <jkeenan@pobox.com> wrote:
> I get different results depending on what envvars I set before running
> the test file.

James: Nice observation, I can reproduce with a normal debugging perl
and PERL_DESTRUCT_LEVEL=2. ASAN isn't required.

LeoNerd: I added some debug and the "savestack_count" is 0, but the
refcount is 2, one is owned by the CV that is being freed. So where
did the extra refcount come from? It seems to suggest that something
in the slab is being freed even though it is still in the save stack.
Is it possible your code saves something in the save stack and then
frees it anyway instead of letting it be freed by the save stack
mechanism? Then this logic fires on teardown, and sees the
discrepancy? I am guessing this is to detect case like that.

cheers,
Yves
Re: Can anyone help with one failing ASAN test? [ In reply to ]
On Sun, 15 Jan 2023 17:38:58 +0100
demerphq <demerphq@gmail.com> wrote:

> LeoNerd: I added some debug and the "savestack_count" is 0, but the
> refcount is 2, one is owned by the CV that is being freed.

Yeah, I see the same.

(Also it makes me want a nicer macro so we can
assert_eq(LEFT, RIGHT);
and have it print the two values if it fails. But eh...)

> So where
> did the extra refcount come from? It seems to suggest that something
> in the slab is being freed even though it is still in the save stack.
> Is it possible your code saves something in the save stack and then
> frees it anyway instead of letting it be freed by the save stack
> mechanism? Then this logic fires on teardown, and sees the
> discrepancy? I am guessing this is to detect case like that.

I don't have any uses of SAVEFREEOP anywhere around.

But, reading the code in op.c closer, it doesn't appear to be
inspecting the save stack at all. It accumulates the `savestack_count`
by directly looking over the ops themselves, for ops whose type is not
OP_FREED and which don't have the op_savefree flag set on them. It
counts none of those.

But then it finds the slab's refcount is still too high.

This suggests that I'm just plain losing an op; something somehow
getting entirely lost and not freed at all. I wonder how best I can
debug this part... valgrind won't be much help due to the opslabs.

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Can anyone help with one failing ASAN test? [ In reply to ]
On Sat, 28 Jan 2023 23:34:07 +0000
"Paul \"LeoNerd\" Evans" <leonerd@leonerd.org.uk> wrote:

> This suggests that I'm just plain losing an op; something somehow
> getting entirely lost and not freed at all. I wonder how best I can
> debug this part... valgrind won't be much help due to the opslabs.

Turns out at least, t/harness isn't required. This makes it much faster
to test:

$ PERL_DESTRUCT_LEVEL=2 ./perl -Ilib -Mv5.36 -Mfeature=class -e 'class XXX { field $x = $self + 1; }'
class is experimental at -e line 1.
field is experimental at -e line 1.
Global symbol "$self" requires explicit package name (did you forget to declare "my $self"?) at -e line 1.
Execution of -e aborted due to compilation errors.
panic: savestack_count==0 but (slab->opslab_refcnt-1)==1
perl: op.c:602: Perl_opslab_force_free: Assertion `savestack_count == slab->opslab_refcnt-1' failed.
Aborted

And as expected, valgrind doesn't have anything to go on:

...
==1059006== LEAK SUMMARY:
==1059006== definitely lost: 0 bytes in 0 blocks
==1059006== indirectly lost: 0 bytes in 0 blocks


I suspect any further debug is going to have to involve lots of trace
printing of when ops are allocated and freed, trying to pair them up...
Hoboy...

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Can anyone help with one failing ASAN test? [ In reply to ]
On Fri, 3 Feb 2023 19:32:03 +0000
"Paul \"LeoNerd\" Evans" <leonerd@leonerd.org.uk> wrote:

> I suspect any further debug is going to have to involve lots of trace
> printing of when ops are allocated and freed, trying to pair them
> up... Hoboy...

Aaaand with a bunch of not-exactly-easy-to-add warns added to key
places in op.c to print lots of pointer values for lining up later in a
log file, I find the following:

The eventual failure I get is:

panic: savestack_count==0 but (slab->opslab_refcnt-1)==1 for slab=55b674482740
perl: op.c:613: Perl_opslab_force_free: Assertion `savestack_count == slab->opslab_refcnt-1' failed.

This slab was created much earlier:

op.c:350 Perl_Slab_Alloc created new slab=55b674482740 for cv=55b6744479a8

I couldn't work out how to get perl to tell me anything more about that
CV but by some careful pokery involving Devel::MAT I learn:

pmat> show 0x55b6744479a8
CODE() at 0x55b6744479a8=main_cv with refcount 8

So,... yeah. This is a leaked op of the main_cv itself. It's really
hard to see how the code in my object class has caused this to leak,
yet here we are.

I then got it to print the op types of the remaining still-live ops in
the slab that have not yet been freed:

op still on the slab o=55b674482898 [o->op_type is 197/lineseq]

It's an OP_LINESEQ. Hrm. I wonder when that got allocated.

Over to class.c now and I do some more debug printing whenever I
allocate an OP_LINESEQ:

class.c:1014 allocated o=55c29572d778 [op_type=197/lineseq] while PL_compcv=55c2956f29a8=PL_main_cv
class.c:664 allocated o=55c2956f7948 [op_type=419/methstart] while PL_compcv=55c2959b4310
class.c:714 allocated o=55c2956f7908 [op_type=197/lineseq] while PL_compcv=55c2959b4310
class.c:781 allocated o=55c2956f7908 [op_type=197/lineseq] while PL_compcv=55c2959b4310

Well, ooops. There we have it. Line 1014 has screwed up.

Turns out I had to call class_set_field_defop *before* I LEAVE, so that
PL_compcv is still set right.

But even doing that didn't help. The previously-failing opslab
deallocation now succeded, but a different one failed, because the
lineseq op was still there instead. There was a second far more subtle
bug.

Much printing later found me the eventual problem. When you op_free(o);
it checks if (o->op_flags & OPf_KIDS) and if so tries iterating over
the kids of the op as well, so it can recurse. It does this in a while
loop on the kid pointer, but the problem here is that if I'd already
done

cLOOPOPo->op_first = cLOOPOPo->op_last = NULL;

then that loop will immediately exit and the op doesn't get freed.

The fix in my case was to remember to

o->op_flags &= ~OPf_KIDS;

but I can't help thinking this is quite a subtle cornercase behaviour
of op_free itself. Perhaps that ought to only bother if the op has the
KIDS flag *and* its ->op_first is non-NULL. Specifically, this line:

https://github.com/Perl/perl5/blob/blead/op.c#L901

maybe wants to test

if(!went_up && (o->op_flags & OPf_KIDS) && (cUNOPo->op_first))


Buuut anyhow, the originally-failing test should be fixed now and as I
write this the CI is busily retesting it again... am hoping for green
tick which may mean we can merge this feature-class branch. Woo \o/

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Can anyone help with one failing ASAN test? [ In reply to ]
On Fri, 3 Feb 2023 at 23:44, Paul "LeoNerd" Evans
<leonerd@leonerd.org.uk> wrote:
> Buuut anyhow, the originally-failing test should be fixed now and as I
> write this the CI is busily retesting it again... am hoping for green
> tick which may mean we can merge this feature-class branch. Woo \o/

Nice debugging, sounds like a PITA to get through. Congrats you got to
the bottom of it. Do you think there is anything you can write up from
your experience here, maybe in the docs for op_free() so the next
person doesn't have to relearn this? (If you already did then my
apologies.)

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
Re: Can anyone help with one failing ASAN test? [ In reply to ]
On Sat, 4 Feb 2023 13:00:07 +0100
demerphq <demerphq@gmail.com> wrote:

> Do you think there is anything you can write up from
> your experience here, maybe in the docs for op_free() so the next
> person doesn't have to relearn this? (If you already did then my
> apologies.)

Yeah I've been thinking about it today. I think at least it probably
wants an assert() being added. That would alert the caller to bad
things happening (at least in the -DDEBUGGING case), without changing
the current behaviour.

Alternatively, the code should only descend if both OPf_KIDS and
op_first != NULL.

I can't quite decide which of these would be best. I've written it up
on an issue, so we can discuss and vote on it.

https://github.com/Perl/perl5/issues/20764

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/