Porters,
I recently posted the suggestion <http://markmail.org/message/wywgcbwhu2nhykxc> that "use v5.36.0" should imply "use utf8", which led to a pretty large thread in which Felipe Gasper repeatedly said "This is going to make things worse, not better." I spent a lot of time grumbling about this to myself, figuring out exactly how to rebut this, and then deciding that I tentatively, partly, agreed with him.
We want each improvement to be a ratcheting up in language usability, when possible, rather than "we made things worse so we could make them better." At present, because we don't (and can't) know whether a string is text or bytes, we don't (and can't) automatically encode it when it hits a bytestream. We also don't know reliably whether a given output handle is already expecting to do that encoding for us.
I am 100% certain that adding "use utf8" to the feature bundle would be better *for me*, but I already have a pretty strong grasp of the I/O model of Perl. I'm not sure it's better enough for everybody.
At the PSC, we had a long talk about this, and another proposal was made:
We introduce a new stricture, which I'll call "source_encoding". Under "use strict 'source_encoding'", the compiler will raise an exception when the source contains non-ASCII content unless the utf8 pragma is in effect. The error raised can drive the programmer to documentation explaining the various trade-offs. That is: you can turn on utf8 and deal with how this affects your I/O, or you can disable the stricture, or you can restate your non-ASCII content as ASCII by using escaping constructs.
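To make the three ways out concrete, here is a rough sketch of the check. It rests on assumptions: "source_encoding" does not exist in any released perl, check_source_encoding is a hypothetical helper of my own naming, and the real check would live in the compiler and follow lexical scope rather than take a flag.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed stricture. The second argument
# plays the role of "is the utf8 pragma in effect here?".
sub check_source_encoding {
    my ($source, $utf8_in_effect) = @_;

    # Option 1: "use utf8" is on, so non-ASCII source is allowed.
    return 1 if $utf8_in_effect;

    # Otherwise any byte outside the ASCII range is an error.
    if ($source =~ /([^\x00-\x7F])/) {
        die sprintf "Non-ASCII content (\\x%02X) in source without 'use utf8'\n",
            ord $1;
    }
    return 1;
}

# Source text as raw bytes, the way a UTF-8 encoded file first arrives.
my $non_ascii = qq{my \$name = "caf\xC3\xA9";};

# Option 3: the same literal restated in pure ASCII with an escape.
my $escaped = q{my $name = "caf\x{E9}";};

my $rejected  = !eval { check_source_encoding($non_ascii, 0) };
my $with_utf8 = check_source_encoding($non_ascii, 1);
my $ascii_ok  = check_source_encoding($escaped, 0);
```

Option 2, disabling the stricture, would simply mean never running the check. The interesting part is not the regex but the error message, which is where the pointer to the trade-off documentation would go.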
I'm not *sure* this is an improvement, but I think it is. This prevents the "I forgot to add utf8 and so only discovered at runtime that I have doubly-encoded my output" bug.
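For anyone who hasn't been bitten by it, the double-encoding bug looks roughly like this. The sketch uses the core Encode module and spells the bytes out with escapes so it doesn't depend on how this message itself is encoded. Assume a source file saved as UTF-8 containing the literal "é", and an output handle that encodes to UTF-8. The variable names are mine.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The two bytes that "é" (U+00E9) occupies in a UTF-8 encoded source file.
my $source_bytes = "\xC3\xA9";

# Without "use utf8", perl treats each source byte as one character,
# so the literal is the two characters U+00C3 and U+00A9.
my $without_utf8 = $source_bytes;

# With "use utf8", the literal is decoded to the single character U+00E9.
my $with_utf8 = decode('UTF-8', $source_bytes);

# An output handle with an :encoding(UTF-8) layer encodes whatever it gets.
my $out_without = encode('UTF-8', $without_utf8);  # C3 83 C2 A9: doubly encoded
my $out_with    = encode('UTF-8', $with_utf8);     # C3 A9: what was intended

printf "without utf8: %d bytes, with utf8: %d bytes\n",
    length $out_without, length $out_with;
```

Nothing warns in the first case, because two Latin-1 characters are perfectly valid text. That is why the mistake only shows up when somebody looks at the output.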
--
rjbs