Mailing List Archive

Finding nonoptional dependencies.
Hi folks,

I was wondering if anyone else has any experience with mass scanning for
non-optional dependencies. That is, using some technique other than
compiling with perl to extract all the requires/uses, and the functions
that wrap them (notably with() from Moose), from a file.

So far I am using PPI, as it seems to be the fastest and most robust
strategy. I tried PPR, but it goes insane when it encounters certain
constructs which we unfortunately have in our code, like modules which
initialize a hash with 50k static entries, or modules that use newer
syntax such as ->%* or function parameters. PPI is much, much faster for
these modules in my testing.

My definition of a non-optional dependency is any use/require/with which
would execute as a top-level statement in the package, plus anything
loaded in a BEGIN block or by a use statement anywhere in the file (i.e.,
pretty much what would be loaded by running a simple perl -MModule -e1).

An additional complication is extracting the dependency part without
evaluating things like data structures or anonymous subroutines in with()
declarations. I also need to deal with things like use parent, and to
ignore use if clauses, etc.
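
A minimal PPI sketch of the definition above. It is an illustration under
stated assumptions, not the tool described in this message: hard_deps() and
is_unconditional() are made-up names, pragmas are reported like any other
module, and a require counts as unconditional only when it is a top-level
statement (or sits in a package block) or inside a BEGIN block.

```perl
use strict;
use warnings;
use PPI;

# True if this use/require statement would run when the file is merely loaded.
sub is_unconditional {
    my ($inc) = @_;
    return 1 if $inc->type eq 'use';    # compile time, wherever it appears
    # "require Foo if $cond;" style modifiers make the load conditional.
    return 0 if grep {
        $_->isa('PPI::Token::Word') && $_->content =~ /^(?:if|unless)$/
    } $inc->schildren;
    # Walk outwards through the enclosing statements.
    for (my $node = $inc->parent; $node; $node = $node->parent) {
        next unless $node->isa('PPI::Statement');
        return 1 if $node->isa('PPI::Statement::Scheduled')
                 && $node->type eq 'BEGIN';
        # Any other enclosing statement (sub, if, eval, ...) means optional,
        # except a "package Foo { ... }" block.
        return 0 unless $node->isa('PPI::Statement::Package');
    }
    return 1;    # reached the document: a top-level statement
}

# Return the sorted module names a source string loads unconditionally.
sub hard_deps {
    my ($source) = @_;
    my $doc = PPI::Document->new(\$source)
        or die "PPI could not parse the source\n";
    my %deps;
    for my $inc (@{ $doc->find('PPI::Statement::Include') || [] }) {
        next if $inc->type eq 'no';
        my $module = $inc->module or next;    # "use 5.010;" names no module
        next if $module eq 'if';              # "use if ..." is conditional
        next unless is_unconditional($inc);
        $deps{$module} = 1;
        next unless $module eq 'parent' || $module eq 'base';
        # For parent/base the real dependencies are in the import list,
        # unless -norequire says the classes are not to be loaded.
        my @args = $inc->content =~ /-norequire\b/ ? () : $inc->arguments;
        $deps{$_} = 1 for map {
              $_->isa('PPI::Token::Quote')            ? $_->string
            : $_->isa('PPI::Token::QuoteLike::Words') ? $_->literal
            :                                           ()
        } @args;
    }
    return sort keys %deps;
}

print "$_\n" for hard_deps(<<'END');
package My::Class;
use parent 'My::Base';
require Top::Level;
sub lazy { require Lazy::Thing; }
1;
END
```

Lazy::Thing is left out of the result because its require sits inside a
named sub. The nested structures in an import list are never evaluated,
only string literals and qw() lists are read.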

Essentially I want to statically analyze a piece of code for the main ways
people declare dependencies. I don't need to deal with things like an
arbitrary sub that sets up dependencies; I don't need a solution to the
halting problem. It just needs to be good enough, e.g. I don't need to
handle this type of pattern:

arbitrary_sub_that_requires_stuff();
1
__END__

In the longer run, IMO Moose's with() statement introduces an interesting
issue. We all know that:

require Foo;

will require something, but there is no formal way to attribute a function
so that a programmer can declare "treat this like a require". So if a new
OO framework offers its own syntactic sugar to declare dependencies, then
it has to be "discovered" and added to some parser's list or whatnot. It
might be nice to have a way for folks to declare which subs are require
wrappers. Maybe it's too niche a problem, but it would sure be helpful.
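
In the absence of such a declaration, the "parser's list" can at least be
kept in one place. The sketch below is only an illustration: the
%REQUIRE_WRAPPER table and wrapper_deps() are hypothetical names, it looks
only at top-level statements of a file that uses the "package Foo;" form,
and it reads only string literals, so hash refs and anonymous subs passed
to with() are never evaluated.

```perl
use strict;
use warnings;
use PPI;

# Hypothetical registry: subs whose string arguments are treated like "require".
my %REQUIRE_WRAPPER = map { $_ => 1 } qw(with extends);

# Return the sorted names passed as string literals to registered wrappers.
sub wrapper_deps {
    my ($source) = @_;
    my $doc = PPI::Document->new(\$source)
        or die "PPI could not parse the source\n";
    my %deps;
    # Direct children of the document only, i.e. top-level statements.
    for my $stmt (grep { $_->isa('PPI::Statement') } $doc->schildren) {
        my ($first, @rest) = $stmt->schildren;
        next unless $first
            && $first->isa('PPI::Token::Word')
            && $REQUIRE_WRAPPER{ $first->content };
        # Unwrap with( ... ) into the tokens between the parentheses.
        if (@rest && $rest[0]->isa('PPI::Structure::List')) {
            @rest = map { $_->isa('PPI::Statement') ? $_->schildren : $_ }
                $rest[0]->schildren;
        }
        # Nested structures such as { -excludes => ... } are skipped whole.
        $deps{ $_->string } = 1
            for grep { $_->isa('PPI::Token::Quote') } @rest;
    }
    return sort keys %deps;
}

print "$_\n" for wrapper_deps(<<'END');
package My::Class;
use Moose;
extends 'My::Base';
with 'Role::A' => { -excludes => 'helper' }, 'Role::B';
1;
END
```

Supporting another framework's sugar would then mean adding one name to the
table rather than changing the scanner.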

Anyway, I realize this is an unusual problem. I need to scan a very large
codebase, in the range of 40k-200k files, for dependency declarations. If
anybody has experience with doing this at high speed and high reliability,
I'd be very interested.

What I would love is a way to use perl itself to do this. E.g., maybe doing
perl --dump-deps $file.pl

would output a list of module names or something. But I have a feeling that
ain't likely to happen.

Any thoughts or advice welcome.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
Re: Finding nonoptional dependencies.
On Sat, Oct 30, 2021 at 10:44 AM demerphq <demerphq@gmail.com> wrote:

> I was wondering if anyone else has any experience with mass scanning for
> non-optional dependencies. That is using some non-perl compilation
> technique to extract all the requires/uses and functions that wrap them
> (notably with() from Moose) from a file.

I think you just described Perl::PrereqScanner
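
For reference, a minimal sketch of calling Perl::PrereqScanner on a string
of source. scanner_deps() is a made-up wrapper name; scan_string() and the
CPAN::Meta::Requirements object it returns come from the module's
documented interface, and the default set of scanners is assumed.

```perl
use strict;
use warnings;
use Perl::PrereqScanner;

# Return the sorted module names Perl::PrereqScanner finds in a source string.
sub scanner_deps {
    my ($source) = @_;
    # scan_string returns a CPAN::Meta::Requirements object.
    my $prereqs = Perl::PrereqScanner->new->scan_string($source);
    my @modules = $prereqs->required_modules;
    return sort @modules;
}

print "$_\n" for scanner_deps(<<'END');
package My::Class;
use Moose;
with 'My::Role';
1;
END
```

The same object also offers scan_file($path) for files on disk.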

Leon
Re: Finding nonoptional dependencies.
On Sat, 30 Oct 2021 at 23:59, Leon Timmermans <fawaka@gmail.com> wrote:

> I think you just described Perl::PrereqScanner
>

I'll compare it with how I am doing things.

Yves

Re: Finding nonoptional dependencies.
On Sun, 31 Oct 2021 at 08:43, demerphq <demerphq@gmail.com> wrote:

> On Sat, 30 Oct 2021 at 23:59, Leon Timmermans <fawaka@gmail.com> wrote:
>
>>
>> I think you just described Perl::PrereqScanner
>>
>
> I'll compare it with how I am doing things.
>

The main difference (besides that I bet the CPAN module is more robust than
my scribblings) seems to be the approach to handling this:

sub whatever {
    require Something;
    Something::doSomething();
}

For me this is an optional dependency that I don't want to know about
directly, or if I do want to know about it, I want it separated from the
harder deps, since if nothing calls whatever() then Something wouldn't be
loaded. If the require was changed to a use, though, then it should be
picked up, as it should if it was in a BEGIN block in a sub.

Thanks a lot for the recommendation. It is very useful to compare the two
tools, and so far it has helped me find some issues in my tool. Even if
PrereqScanner picks up things I don't want, it's a great cross-check. I
expect my list to be a subset of its list every time.
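
The subset expectation is easy to check mechanically. A minimal sketch,
assuming both tools have already produced plain lists of module names for
the same file; not_in_superset() and the sample lists are made up for
illustration.

```perl
use strict;
use warnings;

# Return the names in @$mine that are absent from @$theirs.
sub not_in_superset {
    my ($mine, $theirs) = @_;
    my %seen = map { $_ => 1 } @$theirs;
    return grep { !$seen{$_} } @$mine;
}

# Hypothetical results for one file from the two scanners.
my @mine   = qw(Moose My::Role strict);
my @theirs = qw(Moose My::Role Something strict);

my @unexpected = not_in_superset(\@mine, \@theirs);
print @unexpected
    ? "Only in my list: @unexpected\n"
    : "OK: my list is a subset\n";
```

Any name reported here points at either a bug in the stricter scanner or a
construct PrereqScanner does not recognize.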

Once I get my head more wrapped around PPI I'll take a stab at teaching
PrereqScanner how to do it my way. IMO both have value.

Thanks again!

cheers,
Yves

