
Regenerating Unicode tables vs. committing them to source
It's often necessary when rebuilding a bleadperl to run
`git clean -dxf` to tidy up all the generated files (especially after
changing configuration options or a version number bump). This has the
effect of deleting every file not managed by git.
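
(Untested aside: `git clean`'s `-e` exclude option is still honoured even
with `-x`, so running something like

  git clean -dxf -e lib/unicore

from the top of the source tree ought to clean everything except the
generated Unicode output, assuming those tables all land under
lib/unicore/. But that's a workaround rather than a fix.)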

One annoyance is then that the next time you rebuild perl it spends a
very long time regenerating the Unicode tables, with output like

Processing Blocks.txt
Processing PropList.txt
Processing SpecialCasing.txt
...

I don't think that data changes very often, does it? Is it possible we
can instead build it in a regen script and keep the generated output
committed to git? This would make the repo/downloads slightly larger,
but make builds slightly faster. I don't know about other people, but
I'd prefer that trade-off.
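
(There is some precedent here already: a couple of the Unicode-derived C
headers, charclass_invlists.h and uni_keywords.h if I'm remembering the
names right, are generated, committed, and refreshed against a built
perl with something like

  ./perl -Ilib regen/mk_invlists.pl

so extending that approach to the rest of the mktables output doesn't
seem like a big conceptual leap.)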

Thoughts?

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/

Re: Regenerating Unicode tables vs. committing them to source
On Thu, 16 Nov 2023 at 13:48, Paul "LeoNerd" Evans <leonerd@leonerd.org.uk>
wrote:

> It's often necessary when rebuilding a bleadperl to run
> `git clean -dxf` to tidy up all the generated files (especially after
> changing configuration options or a version number bump). This has the
> effect of deleting every file not managed by git.
>
> One annoyance is then that the next time you rebuild perl it spends a
> very long time regenerating the Unicode tables, with output like
>
> Processing Blocks.txt
> Processing PropList.txt
> Processing SpecialCasing.txt
> ...
>
> I don't think that data changes very often, does it? Is it possible we
> can instead build it in a regen script and keep the generated output
> committed to git? This would make the repo/downloads slightly larger,
> but make builds slightly faster. I don't know about other people, but
> I'd prefer that trade-off.
>

I have wanted this for a long time, but Karl was opposed. I'll let Karl
explain why himself, as I don't remember all the reasons. We do
pregenerate and save parts of the Unicode data set, but not all of it.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

Re: Regenerating Unicode tables vs. committing them to source
On 11/16/23 05:57, demerphq wrote:
> On Thu, 16 Nov 2023 at 13:48, Paul "LeoNerd" Evans
> <leonerd@leonerd.org.uk> wrote:
>
> It's often necessary when rebuilding a bleadperl to run
> `git clean -dxf` to tidy up all the generated files (especially after
> changing configuration options or a version number bump). This has the
> effect of deleting every file not managed by git.
>
> One annoyance is then that the next time you rebuild perl it spends a
> very long time regenerating the Unicode tables, with output like
>
>   Processing Blocks.txt
>   Processing PropList.txt
>   Processing SpecialCasing.txt
>   ...
>
> I don't think that data changes very often, does it? Is it possible we
> can instead build it in a regen script and keep the generated output
> committed to git? This would make the repo/downloads slightly larger,
> but make builds slightly faster. I don't know about other people, but
> I'd prefer that trade-off.
>
>
> I have wanted this for a long time, but Karl was opposed. I'll let Karl
> explain why himself, as I don't remember all the reasons. We do
> pregenerate and save parts of the Unicode data set, but not all of it.
>
> cheers,
> Yves
>
>

Actually, originally it was Nicholas who opposed it.

The problem now is that it can't be done, because of EBCDIC. Those
generated files differ depending on the target machine's character set,
so they can't be put under source control.
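
A quick way to see the kind of difference those tables have to encode:

  perl -le 'print ord "A"'

prints 65 on an ASCII/Latin-1 platform but 193 (0xC1) on an EBCDIC one,
and the generated files express their ranges in terms of those native
code points.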

I got quite a ways along, several years ago, in having mktables
generate #ifdef'd tables that can be put under source control, to
address the very issue brought up in this email thread. But then I ran
into a roadblock that required a significant separate effort to resolve.
I am nearly done with that detour and can then continue with the
original work.

mktables then should only need to be run when fixing a bug in it, or
when changing to a new Unicode version.

(I have noticed mktables runs much, much faster if perl is compiled
with C optimization enabled.)
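
For anyone building with an unoptimized or debugging configuration, that
means configuring with something like

  ./Configure -des -Dusedevel -Doptimize='-O2'

rather than with -Doptimize='-g'.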