Mailing List Archive

[Bug 2998] utf8clean should mask surrogate code points (U00D800 to U00DFFFF)
https://bugs.exim.org/show_bug.cgi?id=2998

--- Comment #1 from Jeremy Harris <jgh146exb@wizmail.org> ---
The patch looks simple, but I can't pretend to understand that bit of
RFC 2279. It seems to be talking about UCS-2 rather than UTF-8.
Is a better description possible?

--
You are receiving this mail because:
You are on the CC list for the bug.

--
## subscription configuration (requires account):
## https://lists.exim.org/mailman3/postorius/lists/exim-dev.lists.exim.org/
## unsubscribe (doesn't require an account):
## exim-dev-unsubscribe@lists.exim.org
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: [Bug 2998] utf8clean should mask surrogate code points (U00D800 to U00DFFFF)
On 2023-07-22, Exim Bugzilla via Exim-dev <exim-dev@lists.exim.org> wrote:
> https://bugs.exim.org/show_bug.cgi?id=2998
>
> --- Comment #1 from Jeremy Harris <jgh146exb@wizmail.org> ---
> The patch looks simple, but I can't pretend to understand that bit of
> RFC 2279. It seems to be talking about UCS-2 rather than UTF-8.
> Is a better description possible?

Interestingly, that RFC seems to use UCS-2 interchangeably with UTF-16.


There was an excellent discussion of WTF-8 (like UTF-8, but with
surrogates allowed) somewhere on the internet (I thought Wikipedia,
but I can't find it now).


https://unicodebook.readthedocs.io/unicode_encodings.html
section 7.5. UTF-16 surrogate pairs

This bug is mainly motivated by PostgreSQL only accepting well-formed
UTF-8, so UTF-8 that encodes uFE01 is rejected and leads to
misbehaviour.
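
To make the byte pattern concrete, here is a rough sketch in Python (not
the Exim patch, which is C). It assumes that masking means replacing the
three-byte sequence with a single "?"; the function name mask_surrogates
and the mask parameter are made up for illustration. A surrogate code
point (U+D800 to U+DFFF) pushed through the UTF-8 encoding scheme always
comes out as 0xED, then a byte in 0xA0..0xBF, then a continuation byte.

```python
def mask_surrogates(data: bytes, mask: bytes = b"?") -> bytes:
    """Replace UTF-8-style encodings of U+D800..U+DFFF with `mask`.

    Hypothetical helper for illustration only. It does not validate the
    rest of the input; other malformed sequences pass through untouched.
    """
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        # ED A0..BF 80..BF is the three-byte form of a surrogate code point.
        if (
            data[i] == 0xED
            and i + 2 < n
            and 0xA0 <= data[i + 1] <= 0xBF
            and 0x80 <= data[i + 2] <= 0xBF
        ):
            out += mask
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


# An encoded U+D800 between two ASCII letters.
raw = b"a\xed\xa0\x80b"

# A strict UTF-8 decoder refuses the raw bytes.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

cleaned = mask_surrogates(raw)
```

After masking, the cleaned bytes decode under a strict UTF-8 decoder,
which is the property a consumer that insists on well-formed UTF-8
needs. U+D7FF (ED 9F BF), just below the surrogate range, is left
alone.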


--
Jasen.

[Bug 2998] utf8clean should mask surrogate code points (U00D800 to U00DFFFF)
https://bugs.exim.org/show_bug.cgi?id=2998

Jeremy Harris <jgh146exb@wizmail.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Assignee|unallocated@exim.org |jgh146exb@wizmail.org
Status|NEW |ASSIGNED
