Perl Workflows and Hacking Perl Tips
I was asked to publish some of my workflows and recipes for building
perl. So here goes. This is a brain dump, so it is not perfectly
ordered or structured. If someone wants to use this as a basis for a
perlhack style document they are very welcome to do so.

* BUILDING MADE EASY *

First, I have several aliases set up in bash:

alias ccfg='./Configure -Dusethreads -Doptimize=-g -d -Dusedevel
-Dcc=ccache\ gcc -Dld=gcc -DDEBUGGING'
alias ccfg_nd='./Configure -Dusethreads -Doptimize=-O3 -d
-Dusedevel -Dcc=ccache\ gcc -Dld=gcc'
alias ccfg_nd_nt='./Configure -Doptimize=-O3 -d -Dusedevel
-Dcc=ccache\ gcc -Dld=gcc'
alias ccfg_nt='./Configure -Doptimize=-g -d -Dusedevel
-Dcc=ccache\ gcc -Dld=gcc -DDEBUGGING'
alias make_test='TEST_JOBS=16 make -j16 test_harness'

The above are the main tools I use to build blead perl in a git
checkout. The reason I have 4 aliases is that I normally build with
threads and with debugging, but from time to time I want to check how
things build without threads or without debugging or both. Thus I have
the _nd suffix for no-debug, the _nt suffix for no-threads and a
_nd_nt suffix for no-threads and no-debugging. The other handy thing
about these aliases is that I can always type 'alias' into bash and
get a "Configure recipe" that I can then modify for other purposes.

I generally recommend developers compile WITH threads when they are
hacking on core, even if they are totally uninterested in threads and
would typically run a non-threaded perl in production. The reason is
that when you work on a non-threaded build you can very easily write
code that will fail to compile when built with threads: omitting a
pTHX_ from a function signature, or an aTHX_ from a function call, is
not a problem without threads, but will throw compile-time errors when
threads are enabled. So always do your perl core hacking with threads
enabled; it saves a lot of headaches when you think you are done.

The other important thing is to ensure you build with -g when you are
debugging, as otherwise gdb lacks the required line number data to
give you proper insight into what is going wrong.

The make_test alias ensures that the build and tests are run in
parallel with 16 processes, which seems to be the optimum for my
laptop. Adding more processes seems to slow things down. The other
thing to note is that I test with *harness* as this is the mode that
supports parallel test runs; 'make test' does not.
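
If you are unsure what job count suits your machine, a reasonable
starting point (just a default to tune from) is the CPU count reported
by nproc:

TEST_JOBS=$(nproc) make -j$(nproc) test_harness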

In general my workflow is something like:

cd ~/git_tree/perl # I keep all my git repos in the directory ~/git_tree
git pull --rebase  # rebase is a noop if I have no local changes,
                   # and is helpful if I do
ccfg               # note the interactive prompt in the Configure process -
                   # I never bothered to find out how to disable that
make_test          # build and test with 16 processes

Generally I can skip the ccfg as it has been done previously unless
the perl version has changed or certain significant changes have been
made which confuse the build process. Running

git clean -dfx # warning dangerous, do not run with unadded new files!

will reset the build back to an empty state, after which I run `ccfg`
again. I occasionally do this anyway just to make sure that there is
nothing subtle going wrong. Our build process is pretty good about
building after a git pull, but occasionally it does the wrong thing
and a clean start is always a good way to be sure you aren't missing
something.
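
If you are nervous about what `git clean -dfx` will delete, the
dry-run flag lists the victims without removing anything:

git clean -ndfx # -n / --dry-run: show what would be removed, delete nothing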

* TESTING *

Because I hack on the regex engine fairly regularly I also often run this:

TEST_JOBS=16 make -j16 test_reonly

so that I can test JUST the regex engine. I haven't added this to my
aliases as I have it in my command history and I haven't gotten around
to it. :-) The regex engine is a critical part of perl; if it is
significantly broken but still compiles then the build will often fail
when it runs 'miniperl autodoc.pl', so I also have this in my command
history:

gdb --args ./miniperl -Ilib autodoc.pl

which typically allows me to find whatever I screwed up in the regex
engine. For subtle regex bugs the build process will complete and I
will have a proper perl to test with test_reonly.

Another command I run regularly, especially after modifying anything
with pod in it (and when I remember), is:

TEST_JOBS=16 make -j16 test_porting

which runs the porting tests only. This is the command you want to run
if you have done a doc fix and don't want to wait for every test to
complete.

Sometimes after a full test one or two files will have failed and I
want to run them again individually and in verbose mode. To do so I
use the TEST_ARGS functionality in our build stack. (Note that
historically Win32 used different env vars for this, TEST_SWITCHES
and TEST_FILES; these days it also supports TEST_ARGS just like *nix
does, however *nix does not support TEST_SWITCHES and TEST_FILES.)

make -j16 test_harness TEST_ARGS='-v -re PATTERN'

where PATTERN is a *regex* (NOT glob) pattern representing a unique
part of the file name. The -re option runs a regex over the list of
test files that harness knows about. You can use any regex (remember
that bash quoting rules affect the pattern so be careful with \w+
style metapatterns) and you can list -re multiple times. You can also
use -nre to filter tests out, and you may combine them as well. The
regexp matches on the *full* file name relative to the t/ directory so
you can filter by directory name or whatnot.

The reason I use the -re option (and the reason I added it in the
first place) is that it means I don't have to remember or type the full
path to the test file, just some unique part of its name, and I don't
have to remember that test files in harness are named relative to the
t/ directory. We have tests in the t/ directory and in the lib/, cpan/,
dist/ and ext/ directories, which means for some tests you need to add
"../" to the front and for others you need to remove the "t/" from the
front; I always forget one or the other, so using -re saves me some
keyboard rage.
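
A hypothetical example combining filters (the patterns here are made
up; use whatever uniquely matches the tests you care about):

make -j16 test_harness TEST_ARGS='-v -re regexp -re pat -nre fold'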

Less often I want to run the tests just like `make test_harness` would
but without building first, in which case I do this:

TEST_JOBS=16 TEST_ARGS='-re 099_binary' TESTFILE=harness ./runtests choose;

I do this MUCH less often as whenever I do I end up modifying the code
and then wondering why I haven't seen the effect of my change. But for
some tasks it speeds things up quite a bit.

In some cases I run core tests directly with something like:

./perl -Ilib t/re/regexp.t

but usually this only works properly with tests in the t/ directory,
and even then sometimes I have to patch the test to run properly from
the root dir, as some have been hard coded to expect to be run from
the t/ directory, which I consider to be a bug. I often push patches
making the test files run from the root dir. I never do this with
ext/cpan/dist as it just doesn't work. :-) Actually I find specific
testing of files in the ext/dist/cpan directories to be the biggest
pain, and I always run them via one of the TEST_ARGS based recipes.

* REGEN *

Another thing to be aware of is 'make regen'. This uses a
pre-installed perl to regenerate many of our generated files. Generally
you should NOT run regen.pl with the perl you are working on; if that
perl is broken you will make the situation much worse. This goes for
the scripts it runs as well, all of which live in the regen/ directory.
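
After running it, a quick check of what actually changed is a good
habit:

make regen
git status --short # which generated files were touched
git diff --stat    # and by how much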

* BUILDING FOR BLEAD BREAKS CPAN TICKETS *

Another case that comes up fairly regularly is BBC (Blead Breaks CPAN)
tickets. With these the process is a bit more complex. Sometimes for
EUMM-based builds with minimal dependencies I can get by without
installing blead at all, and can do something like:

cd ~/git_tree/$dist_name/
/home/yorton/perl/perl -I/home/yorton/perl/lib Makefile.PL
# note that absolute paths are required here!
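
From there, for a simple distribution, the usual EUMM steps typically
finish the job (assuming no missing dependencies):

make
make test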

But if there are CPAN deps or if it uses Module::Build or Dist::Zilla
I must install the perl first. For this I rely on the perlbrew
toolset. I have used it for years and in all that time I have had only
two complaints about it, both of which are extremely minor and which I
will explain below. An excellent piece of software, I must say.

To do this I follow a recipe like this:

cd ~/git_tree/perl
git pull
perlbrew uninstall latest_blead
perlbrew install --notest --noman --as latest_blead --debug --thread
--multi -Doptimize=-g -Dusedevel -j 16 ./

The need for the "uninstall" step is one of my two complaints about
perlbrew. If you accidentally run the install step with a name that
already exists, perlbrew does something, but the end result is not the
perl you wanted, and the -f flag doesn't fix anything, so always
explicitly run the uninstall first. No, I haven't filed a ticket about
this or anything; it is just something that I work around since I have
been bitten by it. Configuring bash for large command history logs is
very helpful for stuff like this. You can set that up with something
like this in your .bashrc:

# don't put duplicate lines in the history. See bash(1) for more options
# ... or force ignoredups and ignorespace
export HISTCONTROL=erasedups:ignorespace

# for setting history length see HISTSIZE and HISTFILESIZE in bash(1)
export HISTSIZE=100000
export HISTFILESIZE=100000

Sometimes I need to test against a specific public release instead of
blead. To do that I use:

cd ~/git_tree/perl
git reset --hard v5.37.8 # warning potentially destructive!
perlbrew install --notest --noman --as perl-5.37.8 --debug
--thread --multi -Doptimize=-g -Dusedevel -j 16 ./

Adjust the arguments to perlbrew install to taste.
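
One way (not the only one) to get the checkout back onto current blead
afterwards, assuming the remote is called origin:

git reset --hard origin/blead # warning: also discards local changes!
git pull --rebase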

Sometimes you want to build and install with a specific define
enabled, to support options that Configure doesn't know about. For
instance, recently I was debugging Copy-On-Write (COW) and how it
interacts with the regex engine, and I wanted to build with
PERL_SAWAMPERSAND defined and not defined. So I did this:

perlbrew install --notest --noman --as blead_NSA -Doptimize=-O2
-Dusedevel -j 16 ./
perlbrew install --notest --noman --as blead_SA -Doptimize=-O2
-Accflags=-DPERL_SAWAMPERSAND -Dusedevel -j 16 ./

Note the spelling on that last line. The arguments you have to provide
to Configure to enable a define are relatively cryptic, and at least in
the past the Configure and INSTALL docs were not that clear on this
point (ISTR I went through and fixed a bunch of the cases in INSTALL).
If you want to add a define that the compiler will see (perhaps based
on a comment or whatnot in the code saying "define PERL_FOO to disable
this" or something like that), you DON'T pass Configure a -D option
directly. You MUST pass it as

-Accflags=-DPERL_FOO

instead, otherwise Configure will silently (at least in the past) eat
the argument, do nothing, and leave you frustrated and wondering what
you did wrong. So you must remember the -Accflags= option prefix is
*required*.
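
To double check that the define actually made it into the build you
can ask the resulting perl for its ccflags and look for your -D option:

./perl -Ilib -V:ccflags # in a git checkout
perl -V:ccflags         # under 'perlbrew use blead_SA' etc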

I usually do these kinds of installs with --notest and --noman. I do
this because I tend to not need the docs to be installed as I have
them already in the repo, and I do not bother testing because I have
usually tested the perl already in my git checkout with 'make_test'
documented above. I only install when I have a tested perl anyway, and
having perlbrew do the tests just gets in the way.

* INSTALLING A CORPUS OF HISTORICAL PERLS *

Another task that may be convenient is to pre-populate perlbrew with
many historical versions. For instance I have these commands in my
history to do bulk installs with perlbrew:

perlbrew install-multiple -j 8 -n $(perlbrew available| grep 5.32)
--both thread --both debug
perlbrew install-multiple -j 8 -n $(perlbrew available) --both
thread --both debug

This does not require a git checkout. This is the other place I have a
minor complaint about perlbrew: there is no way to tail the full build
process with install-multiple. Each build uses its own log file, and
you need to explicitly name the log file to tail it. I rather wish
that it would create a symlink to the latest logfile which could be
tailed with -F. [I have a hazy recollection that I filed a bug report
and patch to do this, but either I misremember or it was rejected.]
This process tends to take many hours; it's a good one for right before
bed.

* USING CPANM FOR BBC TICKETS *

From there I tend to use `cpanm` to test and build the modules. One
thing you have to remember is that if you do:

perlbrew use latest_blead
cpanm --look Some::Module

then you MUST execute the

perlbrew use latest_blead

again once cpanm gives you a prompt. It does not respect the "perlbrew
use" from the previous shell. [I think this is a bug, but I haven't
filed a ticket as it's easier to work around than it is to file the
ticket.]
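
Putting that together, a rough sketch of a session (Some::Module
stands in for whatever the BBC ticket is about, and the last line
assumes an EUMM based dist):

perlbrew use latest_blead
cpanm --look Some::Module # unpacks the dist and drops you into a subshell in it
perlbrew use latest_blead # repeat inside that subshell, as noted above
perl Makefile.PL && make && make test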

Using cpanm like this tends to result in the dependencies for the
module being preinstalled. But sometimes you have to manually add
specific dependencies.
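
For that, cpanm's --installdeps option run from inside the unpacked
dist is usually the quickest route:

cpanm --installdeps . # install whatever the dist declares as dependencies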

* RESOLVING BBC TICKETS *

To resolve a BBC ticket there are two cases (and occasionally both
apply): a) you have to change perl, and b) you have to change the
module.

For case a) where I need to change perl I go back to my perl git repo
and hack there, and then repeat the full uninstall/install process to
get back to a proper installed build. (I do not use shared modules,
and I do not mix installed modules from one perl to another; that has
been a recipe for wasted time and frustration in my experience.)

For case b) I visit metacpan, check for a repository link, fork the
repo to my namespace, and then clone that to my laptop. I then write my
patch in place, test it, and then create a PR for the owner of the repo
to apply. If there is no repo then I hack inside of a cpanm --look
session and file an RT ticket with my patch in it. Or sometimes for
this case I just describe what the owner needs to do. Some people
recommend ONLY doing the cpanm --look session (or equivalent). But I
dislike this approach; more than once I have spent hours patching a
module only to discover that the author has made conflicting changes
and much of my time fixing the bug has been wasted. Some authors just
reject conflicting changes, others will try to merge. My experience is
that if a CPAN module does not have a github (or equivalent) repo, the
owner is either a) a super-hacker who dislikes github or dislikes
sharing their repo for some reason, in which case a description of what
they need to fix will suffice, or b) it is abandonware and writing a
patch is a waste of time as the author won't respond anyway. Like any
generalization however there are always exceptions to the rule. Some
people do not use github and are responsive to tickets and patches;
YMMV.

Generally for modules using EUMM the overhead for fixing the module is
low, a few seconds to at most a few minutes. When the module uses
Module::Build it is longer, 5-15 minutes maybe, and for Dist::Zilla it
can be over an hour to install all the dependencies needed to build.
In another thread people have recommended approaches to minimizing
this overhead but I have not tried them yet so I am not documenting
them here. One very helpful trick that speeds up the process for all
cases is from Leon T., who suggests running cpanm like this example
from my command history:

HARNESS_OPTIONS=j16 cpanm Dist::Zilla::Plugin::PrereqsClean

This causes cpanm/Test::Harness to run the tests in parallel and can
massively speed up cpanm installations. Unfortunately test_harness and
Test::Harness do not understand the same options, so you need to
remember that HARNESS_OPTIONS=j16 for Test::Harness is equivalent to
TEST_JOBS=16 for `make test_harness`. IMO it would be nice if
Test::Harness respected TEST_JOBS, or maybe vice versa and
test_harness respected HARNESS_OPTIONS - a nice opportunity for
someone to hack up a patch if they want to help contribute.
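
For reference, the two equivalent invocations side by side (Foo::Bar
is just a placeholder module name):

HARNESS_OPTIONS=j16 cpanm Foo::Bar  # Test::Harness / cpanm
TEST_JOBS=16 make -j16 test_harness # perl core build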

NOTE: To repeat, others have recommended what they consider to be
better processes for the above, which automate or ameliorate some of
the steps. I haven't had any direct experience with them yet so I have
not documented them here. That does not mean they are not good ideas,
and I might follow up on this mail as I gain experience with them.

* ADDING A C FUNCTION TO PERL *

First you need to decide where to put the function. Usually this is a
no-brainer. Functions related to arrays go into av.c, those related to
hashes go into hv.c, those related to the regex engine go into one of
the re* files (depending on whether it is compile time or run time or
what), functions related to "magic" go into mg.c, etc. The naming is
relatively intuitive. The next thing to do is record your function in
embed.fnc. There are a lot of docs in that file which you should read,
but you can also consider it all TMI and just crib from a function
that is similar to the one you want to add. So for instance
sv_setsv_cow() looks like this:

#if defined(PERL_ANY_COW)
: Used in regexec.c
pxXE |SV *  |sv_setsv_cow |NULLOK SV *dsv \
                          |NN SV *ssv
#endif /* defined(PERL_ANY_COW) */

Note that the function definition does NOT have a Perl_ or S_ prefix
on it, and it does not specify the pTHX_ argument. This is managed by
the infra itself based on the flags (the "pxXE" stuff). Also note that
the arguments have non-standard prefixes on them: NULLOK for "pointer
allowed to be null" and NN for "pointer not allowed to be null". We
also support NZ for "not allowed to be zero". For pointers the embed
infra will throw an error if you do not specify one or the other.

After you do this you run `make regen` and various files will be
updated. This will generate various macros. One key part is that an
assert macro will be generated for your function. For instance
PERL_ARGS_ASSERT_SV_SETSV_COW is defined to check the arguments for
sv_setsv_cow(). This uses the NN/NZ/NULLOK data to generate the
appropriate asserts. Our test infra will check that the assert is used
in your function as well, so don't forget to add it. Hand-rolling these
asserts is not ok, and your patch will almost certainly be rejected or
required to be changed if you do not use this infra properly. There is
a *very* small set of functions, universally S_ prefixed, which are not
listed in embed.fnc, but really all of our functions should be listed
in embed.fnc.
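
A quick way to confirm the assert macro was generated is to grep
proto.h (one of the files `make regen` rewrites) for it:

make regen
git grep -n PERL_ARGS_ASSERT_SV_SETSV_COW proto.h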

Another point is that you *should* put your embed definitions inside
of the appropriate #if defined(...) flags for the files which have
access. This makes it much easier to understand what subsystems of
perl use which functions, and to restrict the exposure of those subs
to only the files which should be using them. Each .c file in our
codebase *should* define a symbol like PERL_IN_TOKE_C, which the guards
in embed.fnc use to decide which functions are visible in that file. Eg,

$ git grep "define PERL_IN" *.c | head -n 3
av.c:#define PERL_IN_AV_C
deb.c:#define PERL_IN_DEB_C
doio.c:#define PERL_IN_DOIO_C

An example would be this:

#if defined(PERL_IN_DOOP_C) || defined(PERL_IN_OP_C) ||      \
    defined(PERL_IN_PP_C) || defined(PERL_IN_REGCOMP_ANY) || \
    defined(PERL_IN_REGEXEC_C) ||                            \
    defined(PERL_IN_TOKE_C) || defined(PERL_IN_UTF8_C)
EiRT |UV *    |invlist_array  |NN SV * const invlist
EiRT |bool    |is_invlist     |NULLOK const SV * const invlist
EiRT |bool *  |get_invlist_offset_addr \
                              |NN SV *invlist
EiRT |UV      |_invlist_len   |NN SV * const invlist
EiRT |bool    |_invlist_contains_cp \
                              |NN SV * const invlist \
                              |const UV cp
EXpRT|SSize_t |_invlist_search|NN SV * const invlist \
                              |const UV cp
#endif /* defined(PERL_IN_DOOP_C) || defined(PERL_IN_OP_C) ||      \
          defined(PERL_IN_PP_C) || defined(PERL_IN_REGCOMP_ANY) || \
          defined(PERL_IN_REGEXEC_C) || defined(PERL_IN_TOKE_C) || \
          defined(PERL_IN_UTF8_C) */

You shouldn't worry about the format of the embed.fnc file, nor
sorting it. Running 'make regen' will automatically format embed.fnc
and sort your code into the right place. You don't even have to worry
about finding the right existing set of guard clauses. Just put the
guard clauses you think are appropriate around your new function, at
the "toplevel" of guard clause nesting, and formatted as you like, and
embed.pl will automatically determine if there is an existing section
with the same rules, put your function into the right place, and
format the defines accordingly. We have a complete implementation of
an #if/#ifdef parser which understands how to construct if/else style
blocks and whatnot, so just relax and let the machinery do your
thinking for you.

The only reason a function should NOT be guarded by such guard clauses
is if it is part of the public API and is specifically intended for
the wider world to use. Note that this is a contract, and if you go
this route and need to change the function then you have to deal with
any BBC breakage from doing so. A more restricted form of this is to
put it inside of a PERL_EXT or PERL_CORE define, which makes it
"semi-public", either to the core code, or to extensions intended to be
built with Perl. In theory if people use these defines to access your
function and you change the function they get to keep the pieces. In
practice we tend to treat any BBC breakage the same, even if people
are abusing our facilities. The best case is to restrict it to
specific core files only, which massively reduces the chance that
random developers abuse it. We also have some "functionality" or
"subsystem" based defines. For instance all of the files related to
compiling a regex define PERL_IN_REGCOMP_ANY.

* ADDING A GLOBAL OR INTERPRETER VARIABLE *

These get added to either perlvars.h or intrpvar.h depending on their
nature. Again, after modifying these files you need to run `make
regen` to regenerate the relevant files. Note the files are loaded
multiple times in one compile with different definitions for the
macros they use, and we may ALSO parse these files ourselves. Be
careful with how you change them.

For a global variable that has a static initializer you can get away
with defining it explicitly in the appropriate file, and then listing
it in globvar.sym.
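
As with embed.fnc, the easiest way to get a new entry right is usually
to crib from an existing one, for example:

git grep -n PERLVAR intrpvar.h | head -n 5 # how existing interpreter variables are declared
head -n 20 globvar.sym                     # what existing global symbol entries look like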

* FINDING AN OPCODE *

Perl functions, operators, etc, are implemented as "pp" functions
which live in one of several pp_ prefixed C files. If you want to find
the implementation of such a function you grep for 'PP(pp_XXXX)'; for
instance the following finds where 'require' is implemented. This is
not always intuitive, for instance pp_subst is in one place and
pp_substcont is in another.

git grep 'PP(pp_require)'
pp_ctl.c:PP(pp_require)

* ADDING A NEW C FILE *

This is one of the more complex processes involved, and most devs
avoid it and stuff new functionality into one of the existing files.
If you do wish to add a new file be aware you will need to modify
multiple files manually. For instance there are two makefiles in the
win32 directory that will need to be modified, and the *nix Makefile.SH
will need to be modified (this file is used to generate the *nix
Makefile); you will also need to add an entry to MANIFEST. Our CI
processes are pretty good at raising issues related to this. Again,
remember that all .c files in the root directory of the perl repo
*should* define a PERL_IN_xxxx define, and *should* use it in
embed.fnc to maximally guard their functions. Do not expose functions
you do not need to. People WILL use them and they WILL become a
maintenance burden. Also start from the perspective of minimizing the
exposure of your new functions as much as possible.
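
One way to find the build files you will need to touch (not the only
way) is to grep for how an existing core C file, say doio.c, is
already wired in, and mirror that:

git grep -l 'doio\.c' -- MANIFEST Makefile.SH win32/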

* BE AWARE OF MANIFEST FILES AND HEADER FILES THAT GET PARSED *

A bunch of our functionality is generated from input/manifest files or
from parsing header files. For instance the opcodes used in the regex
engine are defined in regcomp.sym. We have logic to parse header files
for metadata. Be careful when you mess with these files. Don't just
rename or modify these defines without great thought. You may break
the auto-generation code by making certain changes. In some cases we
eval, as perl code, snippets generated from the C header files, so be
careful. If you write code that parses a header file, use
regen/HeaderParser.pm; do NOT hand-roll your own header file parser. It
is harder than it might seem to do it right, and HeaderParser.pm does
it right already, and if it doesn't you should patch it so all our
header parsing code benefits.
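
If you are not sure which generator consumes a given input file,
grepping the regen/ directory for its name usually answers the
question:

git grep -l 'regcomp\.sym' regen/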

* SEE ALSO *

- pod/perlhack.pod

- pod/perlhacktips.pod

- pod/perlguts.pod

- INSTALL

- Karen 'ether' Etheridge has published

https://github.com/karenetheridge/misc/blob/master/install/generic/bin/newperl

with her workflow for building a new perl.

- Paul 'LeoNerd' Evans has published

https://metacpan.org/dist/App-sourcepan/view/bin/sourcepan

to automate getting a working environment for a CPAN module based on
its distributed form.

- Tatsuhiko Miyagawa has published tooling that Dist::Zilla authors
can use to make contributing easier.

https://metacpan.org/pod/Dist::Milla#Dist::Zilla-makes-contributing-difficult1

- Dan "Grinzz' Book

Has published docs and tooling on installing deps for Dist::Zilla.

https://metacpan.org/pod/Dist::Zilla::Starter#COMMANDS
https://metacpan.org/pod/Dist::Zilla::App::Command::installdeps

He also maintains a package that Dist::Zilla can use to make
contributing easier.

https://metacpan.org/pod/Dist::Zilla::PluginBundle::Starter::Git

============

And that is about as much as I can think of right now about how I hack
perl, the workflows I use, and things I think are worth being aware
of. I haven't reviewed perlhacktips for how duplicative this is.
Apologies in advance if I missed any advice from the previous thread
about Dist::Zilla or mangled a name or anything like that. I tried to
faithfully replicate as much feedback as I could. Please feel free to
reply with your own advice or workflows, and again, if anyone wants to
turn this into a pod document or merge bits of it into one of the
existing documents go right ahead. I likely will not do it myself. I
have a stack of PRs, BBC tickets, and new functionality I am working
on right now so I really don't have the time to polish this more. I
think someone a bit further away from the trees will do a better job
of recognizing the forest than I will right now anyway.

Happy hacking.

Cheers,
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"