On 2021-07-31 12:17 p.m., Darren Duncan wrote:
> Now conversely, I don't have a problem with actually waiting until v5.38 to
> fully implement the change IF 5.36 contained some kind of precursor to prepare
> the way, such as that 5.36 would issue warnings for code with a "use 5.36" that
> wasn't valid UTF-8, saying that this code might parse differently under "use
> 5.38". That would let users know in a transitional version what might be a
> problem before it is.
So to clarify, I have a very specific proposal:
1. That a "use 5.36;" will behave the same with respect to the utf8 stuff as
"use 5.34;", but that if the source file / input stream is not entirely valid
UTF-8 under a strict interpretation, the Perl interpreter will issue a warning
saying so and why it matters.
2. That under a "use 5.38;", if the source file / input stream is not entirely
valid UTF-8 under a strict interpretation, the Perl interpreter will issue a
fatal error / die saying so and why it matters, so that parsing fails.
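To make the two tiers concrete, here is a minimal Python model of the proposed
check. It is only a sketch under assumptions: `check_source` and the version
tuples are hypothetical, Python's default strict UTF-8 codec stands in for
whatever strict decoder perl would actually use, and nothing here is existing
perl behaviour.

```python
import warnings


def check_source(source: bytes, version: tuple) -> bool:
    """Hypothetical model of the proposal, not real perl behaviour.

    Returns True if the source is strictly valid UTF-8. Otherwise it does
    nothing for 5.34 and earlier, warns under 5.36, and raises under 5.38+.
    """
    try:
        source.decode("utf-8")  # errors="strict" is Python's default
        return True
    except UnicodeDecodeError as err:
        detail = f"source is not valid UTF-8: {err.reason} at byte {err.start}"
        if version >= (5, 38):
            # Point 2: fatal; parsing has failed.
            raise SyntaxError(f"{detail}; parsing failed under use 5.38") from err
        if version >= (5, 36):
            # Point 1: parse exactly as under 5.34, but explain why it matters.
            warnings.warn(f"{detail}; this file may fail to parse under use 5.38")
        return False
```

Under this model a 5.36 file keeps working exactly as before, so the warning
costs nothing but tells the author what will break at 5.38.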
So a key thing is that the UTF-8 mode triggered by 5.36/5.38 is strict: it
doesn't use substitution characters or delete characters; it either passes the
input through unchanged as valid UTF-8 or it complains. If "use utf8;" already
does this, then it's the same; otherwise it is stricter.
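The difference between strict decoding and the two lossy modes can be seen
with Python's standard codec error handlers, used here purely as an analogy
(the helper name `decode_three_ways` is hypothetical, and perl's own decoder
may differ in detail). "replace" substitutes U+FFFD, "ignore" deletes the bad
byte, and only "strict" refuses to hand back altered text.

```python
def decode_three_ways(data: bytes) -> dict:
    """Contrast strict decoding with substitution and deletion."""
    results = {
        # Substitution: each invalid sequence becomes U+FFFD REPLACEMENT CHARACTER.
        "replace": data.decode("utf-8", errors="replace"),
        # Deletion: invalid bytes are silently dropped.
        "ignore": data.decode("utf-8", errors="ignore"),
    }
    try:
        # Strict: the input comes back unchanged as text, or we complain.
        results["strict"] = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        results["strict"] = f"error: {err.reason} at byte {err.start}"
    return results


# A Latin-1 pound sign (0xA3) is not valid UTF-8 on its own.
print(decode_three_ways(b"price: 10\xa3"))
```

The first two results silently differ from what the author wrote; the strict
result is the only one that reports the problem.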
Since this isn't spelled the same as "use utf8;", the new feature doesn't need
to be identical in every way; we don't have to limit ourselves to its behaviour,
including any silent corruption from substitution or deletion being the
implicit operation, if that is what it used to do.
On a further point: unlike a lot of the other "use" statements, I assume there
is no good reason for a single file to mix literal encodings. So having
multiple "use encoding" statements in a file, whether explicit or implied by a
"use 5.38" etc., should be considered an error, and any occurrence of one would
be expected to describe the entire file, not just the lexical scope it appears
in. Unlike strict/warnings/etc., it's not flipped on or off mid-file.
-- Darren Duncan