Mailing List Archive

Definitions WAS Re: Pre-RFC: Rename SVf_UTF8 et al.
> On Sep 3, 2021, at 9:09 AM, Leon Timmermans <fawaka@gmail.com> wrote:
>
> It rather sounds to me like your disagreement is mostly on definitions. This happens a lot in discussing perl unicode support

That’s a major part of the problem, yes, but that disagreement reflects a more fundamental divergence in mental models of Perl’s character-storage abstraction.

Defining “UTF-8 string” in terms of the UTF8 flag would imply that Perl users *should* think about the flag. In that light, it was natural for Yves to propose recently that the documentation stop discouraging users from checking it. That definition would still confuse those same users, though, when they use feature bundles (under which the flag no longer affects most string operations) or compare upgraded and downgraded strings (which are equal whenever they hold the same code points, regardless of the flag).
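To make the two confusing cases concrete, here is a minimal sketch using only core built-ins (`utf8::upgrade`, `utf8::is_utf8`, `uc`, and the `unicode_strings` feature). It assumes a stock perl with no `use locale` in effect. The same code point, U+00E9, is stored once downgraded and once upgraded; the flag differs, `eq` agrees, and `uc` disagrees only when the feature is off.

```perl
use strict;
use warnings;

my $down = "\xe9";          # U+00E9, stored as a single byte (flag off)
my $up   = "\xe9";
utf8::upgrade($up);         # same code point, stored internally as UTF-8 (flag on)

# Without the feature, case mapping depends on the internal flag
# (the "Unicode bug" described in perlunicode).
my @legacy = ( (uc($down) eq "\xc9" ? 1 : 0), (uc($up) eq "\xc9" ? 1 : 0) );

my @modern;
{
    # Lexically scoped; also enabled by "use v5.12" and later bundles.
    use feature 'unicode_strings';
    @modern = ( (uc($down) eq "\xc9" ? 1 : 0), (uc($up) eq "\xc9" ? 1 : 0) );
}

print "flags:  ", (utf8::is_utf8($down) ? 1 : 0), " ", (utf8::is_utf8($up) ? 1 : 0), "\n";
print "eq:     ", ($down eq $up ? 1 : 0), "\n";
print "legacy: @legacy\n";
print "modern: @modern\n";
```

A user who equates “UTF-8 string” with the flag sees two “different” strings here, yet Perl treats them as identical under `eq` and, with the feature enabled, under `uc` as well.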

If we retain the notion that Perl users shouldn’t care about the flag, though, then “UTF-8 string” only makes sense as a description of the code points: a string whose code points, taken as bytes, form valid UTF-8.

Given that most parts of Perl *do* abstract away the upgraded/downgraded distinction, and that the feature bundles fix many (not all) of the outliers, IMO it’s much more sensible to define “UTF-8 string” in terms of the logical code points. JSON encoders output JSON. UTF-8 encoders output UTF-8. Perl can store either however it pleases.
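A minimal sketch of the code-point definition, again assuming only core `utf8::` built-ins: `utf8::encode` yields the same UTF-8 code points whether its input was stored upgraded or downgraded, and the encoded result is still the same UTF-8 even if Perl later stores it upgraded.

```perl
use strict;
use warnings;

my $down = "caf\xe9";          # "café", flag off
my $up   = "caf\xe9";
utf8::upgrade($up);            # "café", flag on

# utf8::encode rewrites a string in place as its UTF-8 encoding;
# every resulting code point is in 0..255 (one "byte").
my ($enc_down, $enc_up) = ($down, $up);
utf8::encode($enc_down);
utf8::encode($enc_up);

# Hex dump of the encoded code points.
my $hex = join ' ', map { sprintf('%02x', ord($_)) } split //, $enc_down;

# Storing the encoded string upgraded changes nothing about its code points.
my $enc_stored_up = $enc_down;
utf8::upgrade($enc_stored_up);

print "$hex\n";
print "same encoding: ",      ($enc_down eq $enc_up ? 1 : 0), "\n";
print "storage irrelevant: ", ($enc_stored_up eq $enc_down ? 1 : 0), "\n";
```

Under this definition `$enc_stored_up` is a UTF-8 string even though its UTF8 flag is on, and `$up` is not one even though its flag is on too: only the code points matter.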

-F