Pre-RFC: Width-aware (s)printf flag

Consider

printf "%-40s : %s\n", $_->@* for @rows;

The intention is to print a nice neat table on the terminal.

This works fine in ASCII but gets all confused if any ->[0] element
contains Unicode text. While Perl will count in Unicode codepoints,
this won't help if there are combining chars (because combining chars
count as codepoints but do not consume terminal columns), or if there
are any emoji or other double-width characters (because these single
graphemes count as two columns).
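
To make the mismatch concrete, here is a minimal sketch (the variable names are made up for illustration) showing that both length and the plain %-5s padding count codepoints, not columns:

```perl
use strict;
use warnings;

# "e" followed by COMBINING ACUTE ACCENT: two codepoints, one column.
my $combining = "e\x{301}";

# A CJK ideograph: one codepoint, typically two terminal columns.
my $wide = "\x{65E5}";

printf "codepoints: %d and %d\n", length($combining), length($wide);

# sprintf pads by codepoint count, so both results are 5 codepoints long,
# yet on a typical terminal they would occupy 4 and 6 columns respectively.
my $padded_combining = sprintf "%-5s", $combining;
my $padded_wide      = sprintf "%-5s", $wide;

printf "padded lengths: %d and %d\n",
    length($padded_combining), length($padded_wide);
```

The column counts in the comment assume a terminal that renders combining marks as zero-width and CJK ideographs as double-width.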

I propose a new printf flag, perhaps `|`, to tell (s)printf to count
these strings by terminal width instead. Thus

printf "%-|40s : %s\n", $_->@* for @rows;

would now print a neat table even in the presence of Weird Unicode.


If no objections I'll write up an RFC.

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Pre-RFC: Width-aware (s)printf flag
On Thu, Jul 1, 2021 at 12:46 PM Paul "LeoNerd" Evans <leonerd@leonerd.org.uk>
wrote:

> Consider
>
> printf "%-40s : %s\n", $_->@* for @rows;
>
> The intention is to print a nice neat table on the terminal.
>
> This works fine in ASCII but gets all confused if any ->[0] element
> contains Unicode text. While Perl will count in Unicode codepoints,
> this won't help if there are combining chars (because combining chars
> count as codepoints but do not consume terminal columns), or if there
> are any emoji or other double-width characters (because these single
> graphemes count as two columns).
>
> I propose a new printf flag, perhaps `|`, to tell (s)printf to count
> these strings by terminal width instead. Thus
>
> printf "%-|40s : %s\n", $_->@* for @rows;
>
> would now print a neat table even in the presence of Weird Unicode.
>

As mentioned on IRC, I think it would be nice to have more grapheme-aware
capability in core; right now the only grapheme-aware functionality I know
of is the \X regular expression matcher which matches a single grapheme
(and more manual stuff using Unicode::UCD).
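
For reference, a small sketch of what \X gives you today. It counts grapheme clusters, which fixes the combining-character case but not the double-width one (the sample strings are only illustrative):

```perl
use strict;
use warnings;

# "cafe" plus COMBINING ACUTE ACCENT: 5 codepoints, 4 graphemes.
my $text = "cafe\x{301}";

my $codepoints = length $text;
my $graphemes  = () = $text =~ /\X/g;   # count of \X matches

printf "codepoints: %d, graphemes: %d\n", $codepoints, $graphemes;

# A double-width character is still a single grapheme, so counting
# graphemes alone would not give the number of terminal columns.
my $wide           = "\x{65E5}\x{672C}";
my $wide_graphemes = () = $wide =~ /\X/g;

printf "wide graphemes: %d\n", $wide_graphemes;
```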

There is one potential problem here: you normally need to encode characters
to bytes in order to print them. The grapheme determination would need to
happen before encoding. This would work out if you're printing to a handle
with an encoding layer, but probably cause confusion in the usual case.
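
A rough sketch of the ordering issue, using only length and the existing %-8s padding (nothing here depends on the proposed flag):

```perl
use strict;
use warnings;
use Encode qw(encode);

my $chars = "\x{65E5}\x{672C}";        # two characters of decoded text
my $bytes = encode('UTF-8', $chars);   # the same text as six UTF-8 bytes

printf "characters: %d, bytes: %d\n", length($chars), length($bytes);

# Padding the decoded string counts characters: 2 plus 6 spaces.
my $padded_chars = sprintf "%-8s", $chars;

# Padding the already-encoded string counts bytes: 6 plus 2 spaces.
# Any grapheme or width logic applied here would see bytes, not text.
my $padded_bytes = sprintf "%-8s", $bytes;

# One way to keep the order right: format the decoded text first and
# let an encoding layer on the handle do the encoding.
binmode STDOUT, ':encoding(UTF-8)';
print $padded_chars, "|\n";
```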

-Dan
Re: Pre-RFC: Width-aware (s)printf flag
2021-7-2 1:52 Dan Book <grinnz@gmail.com> wrote:

> On Thu, Jul 1, 2021 at 12:46 PM Paul "LeoNerd" Evans <
> leonerd@leonerd.org.uk> wrote:
>
>> Consider
>>
>> printf "%-40s : %s\n", $_->@* for @rows;
>>
> There is one potential problem here: you normally need to encode
> characters to bytes in order to print them. The grapheme determination
> would need to happen before encoding. This would work out if you're
> printing to a handle with an encoding layer, but probably cause confusion
> in the usual case.
>
> -Dan
>

Does the following code work well?

use utf8;
use Encode 'encode';

my $output = '';

for my $row (@rows) {
    $output .= sprintf "????? %-|40s : %s\n", $row->@*;
}

print encode('UTF-8', $output);
Re: Pre-RFC: Width-aware (s)printf flag
On Thu, Jul 1, 2021 at 7:50 PM Yuki Kimoto <kimoto.yuki@gmail.com> wrote:

>
> 2021-7-2 1:52 Dan Book <grinnz@gmail.com> wrote:
>
>> On Thu, Jul 1, 2021 at 12:46 PM Paul "LeoNerd" Evans <
>> leonerd@leonerd.org.uk> wrote:
>>
>>> Consider
>>>
>>> printf "%-40s : %s\n", $_->@* for @rows;
>>>
>> There is one potential problem here: you normally need to encode
>> characters to bytes in order to print them. The grapheme determination
>> would need to happen before encoding. This would work out if you're
>> printing to a handle with an encoding layer, but probably cause confusion
>> in the usual case.
>>
>> -Dan
>>
>
> Does the following code work well?
>
> use utf8;
> use Encode 'encode';
>
> my $output = '';
>
> for my $row (@rows) {
>     $output .= sprintf "????? %-|40s : %s\n", $row->@*;
> }
>
> print encode('UTF-8', $output);
>
>
Yes, encoding after the sprintf operation is done would work fine.
The documentation could explain this.

-Dan
Re: Pre-RFC: Width-aware (s)printf flag
I have no objection.

I always use sprintf and print in combination.

I think it is good that sprintf becomes more convenient, although
there may be a little confusion.
Re: Pre-RFC: Width-aware (s)printf flag
> I propose a new printf flag, perhaps `|`, to tell (s)printf to count these strings by terminal width instead.

It would be very useful if this change also provided a `width` function, as a sister to `length`.

There are many CPAN modules that generate tables, columnar output, etc, which would benefit from such a function. It would be a pain to get this completely right, and keep it right with new emojis every year, but it would be a good Unicode feature to have in Perl’s toolbox.
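
As a very rough illustration of the idea, here is a sketch of what such a function might look like in pure Perl. Both `width` and `pad_right` are hypothetical names, not existing core functions, and the rule used (a grapheme is as wide as its first codepoint, with East Asian Wide and Fullwidth counting as two columns) is a simplifying assumption that will be wrong for some emoji sequences and terminals:

```perl
use strict;
use warnings;

# Hypothetical helper, NOT a core function: estimate terminal columns.
sub width {
    my ($str) = @_;
    my $cols = 0;

    # \X matches one extended grapheme cluster at a time.
    for my $cluster ($str =~ /\X/g) {
        # Assumption: a cluster is as wide as its first codepoint.
        my $base = substr $cluster, 0, 1;

        # Stray marks, control and format characters take no column.
        next if $base =~ /^[\p{Mn}\p{Me}\p{Cc}\p{Cf}]/;

        $cols += $base =~ /^[\p{East_Asian_Width=Wide}\p{East_Asian_Width=Fullwidth}]/
            ? 2
            : 1;
    }
    return $cols;
}

# Hypothetical emulation of a left-justified, column-aware field,
# roughly what the proposed "%-|40s" might do.
sub pad_right {
    my ($str, $target) = @_;
    my $gap = $target - width($str);
    return $gap > 0 ? $str . (' ' x $gap) : $str;
}

binmode STDOUT, ':encoding(UTF-8)';

my @rows = (
    [ "name",                   "plain ASCII"      ],
    [ "\x{65E5}\x{672C}\x{8A9E}", "double-width"     ],
    [ "cafe\x{301}",            "combining accent" ],
);

print pad_right($_->[0], 12), " : $_->[1]\n" for @rows;
```

Keeping this correct as Unicode evolves is exactly the maintenance burden mentioned above, which is one argument for having it in core rather than reimplemented per module.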

Neil
Re: Pre-RFC: Width-aware (s)printf flag
On Fri, 2 Jul 2021 11:24:34 +0100
Neil Bowers <neilb@neilb.org> wrote:

> It would be very useful if this change also provided a `width`
> function, as a sister to `length`.
>
> There are many CPAN modules that generate tables, columnar output,
> etc, which would benefit from such a function. It would be a pain to
> get this completely right, and keep it right with new emojis every
> year, but it would be a good Unicode feature to have in Perl’s
> toolbox.

I too have some precedent here:

https://metacpan.org/pod/Tickit::Utils#textwidth

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Pre-RFC: Width-aware (s)printf flag
We discussed this at last week’s PSC meeting:

• We like the idea of the sprintf capability, and a "width" function analogous to length.
• It needs more thought to write a thorough RFC which covers the various points, and potential gotchas, and that needs someone who groks all the issues / relevant internals, and has both the time and the inclination.

In the meantime we’ve marked this as Parked.

Neil