On Tue, Feb 2, 2021 at 9:40 PM Emily Bowman <silverbacknet@gmail.com> wrote:
>
> On Tue, Feb 2, 2021 at 3:47 AM Inada Naoki <songofacandy@gmail.com> wrote:
>>
>> But when wchar_t* is UTF-16, ucs2_utf8_encoder() can not handle
>> surrogate escape.
>> We need to use a temporary Unicode object. That is what "inefficient" means.
>
>
> Since real UCS-2 is effectively dead, maybe it should be flipped around: Make UTF-16 be the efficient path and UCS-2 be the path that needs to round-trip through Unicode. But I suppose that's out of scope for this PEP.
>
> -Em
Note that ucs2_utf8_encoder() is currently used only for encoding
Python Unicode objects.
A Unicode object is latin1, UCS2, or UCS4. It is never UTF-16.
So if we add UTF-16 support to ucs2_utf8_encoder(), we would be adding
and maintaining code only for PyUnicode_EncodeUTF8 (which encodes
from wchar_t* into char*).
I don't think that is a good deal. As described in the PEP, the encoder
APIs are used very rarely.
We must not add any maintenance cost for them.
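
To illustrate the difference between UCS2 and UTF-16 input, here is a rough
Python sketch. It is not the CPython implementation; the helper names
(utf16_units, encode_per_unit, encode_pair_aware) are made up for this
example. It shows that an encoder which treats every 16-bit unit as a
complete code point (as a UCS2 encoder may) does not produce valid UTF-8
when the input contains a surrogate pair, so a UTF-16 path needs extra
pair-combining logic.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text (hypothetical helper)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def encode_per_unit(units):
    """Encode each 16-bit unit on its own, as if the input were UCS2.

    Illustration only: surrogates are not treated specially here.
    """
    out = bytearray()
    for u in units:
        if u < 0x80:
            out.append(u)
        elif u < 0x800:
            out += bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
        else:
            out += bytes([0xE0 | (u >> 12),
                          0x80 | ((u >> 6) & 0x3F),
                          0x80 | (u & 0x3F)])
    return bytes(out)


def encode_pair_aware(units):
    """Combine surrogate pairs into one code point before encoding."""
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            code_points.append(u)
            i += 1
    # surrogatepass keeps lone surrogates encodable in this sketch.
    return "".join(map(chr, code_points)).encode("utf-8", "surrogatepass")


text = "a\U0001F600"          # one ASCII character plus one non-BMP character
units = utf16_units(text)     # three 16-bit units: 'a' and a surrogate pair

print(len(text), len(units))
print(encode_pair_aware(units) == text.encode("utf-8"))
print(encode_per_unit(units) == text.encode("utf-8"))
```

A Python str never holds the surrogate pair for a non-BMP character, so
the per-unit approach is sufficient for Unicode objects. Only wchar_t*
input on platforms where wchar_t is 16 bits would need the pair-aware
path.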
Regards,
--
Inada Naoki <songofacandy@gmail.com>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/KDYTBQDA4UFE6XWYENOV32ZRTCTAYEPC/
Code of Conduct: http://python.org/psf/codeofconduct/