
ATTN: BUG BUMP: [Bug 7656] UTF8 rules, normalize_charset etc overhaul
This bug is part of the complex of issues related to smoothing out all the edge
and corner cases of character set encoding for v4. There is some concern
about changing the default for normalize_charset (to enable it), or even
removing the switch altogether, in order to nail down documentation of how to
match problem characters like the Latin-1 "extended ASCII" range:
basically any 8-bit character >127.
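To make the "problem characters" concrete: the same character is a different byte sequence in each encoding, so a rule that matches raw high-bit bytes only catches one of them. This Python sketch illustrates that general idea only. It is not SpamAssassin code (which is Perl), and the `normalize` helper is hypothetical, standing in for whatever decoding normalize_charset actually performs.

```python
import re

# A byte-level pattern for "any 8-bit character > 127", the kind of
# pattern a rule might use when message text is matched as raw bytes.
HIGH_BIT = re.compile(rb"[\x80-\xff]")

def normalize(raw: bytes, declared_charset: str) -> str:
    """Hypothetical helper: decode raw bytes using the declared charset so
    that patterns match characters, not encoding-specific byte sequences.
    Illustrative only; not SpamAssassin's implementation."""
    return raw.decode(declared_charset, errors="replace")

latin1_body = "café".encode("latin-1")   # b'caf\xe9'
utf8_body = "café".encode("utf-8")       # b'caf\xc3\xa9'

# Byte-level view: the same word yields different high-bit bytes.
print(HIGH_BIT.findall(latin1_body))  # [b'\xe9']
print(HIGH_BIT.findall(utf8_body))    # [b'\xc3', b'\xa9']

# Character-level view after decoding: one form to write rules against.
print(normalize(latin1_body, "latin-1") == normalize(utf8_body, "utf-8"))  # True
```

A rule written against the decoded text only has to describe "é" once, which is the sense in which normalization removes encoding-dependent edge cases.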

Making the change requires some work on rules that look for those
high-bit-set characters, by people who understand encoding issues and
common failings (e.g. using a 1-byte high-bit-set character in a
notionally UTF-8 document). My personal opinion is that the change is
worth the work, but I admit that I've not completely audited the default
rules for problematic cases. I have, however, been writing rules to work
with normalize_charset for many years. With reasonably modern Perl,
there's no strong argument for normalize_charset=0 beyond the technical
debt of code and rules written to accommodate it.
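The common failing mentioned above, a lone Latin-1 byte in a document labeled UTF-8, can also be sketched in Python. The `decode_with_fallback` function is a hypothetical strategy for illustration, not a description of how SpamAssassin or normalize_charset handles such messages.

```python
def decode_with_fallback(raw: bytes, declared: str = "utf-8") -> str:
    """Hypothetical strategy: honor the declared charset when the bytes are
    valid in it; otherwise fall back to Latin-1, which maps every byte
    0x00-0xFF to a character and therefore never raises."""
    try:
        return raw.decode(declared)  # strict decoding is the default
    except UnicodeDecodeError:
        return raw.decode("latin-1")

mislabeled = b"caf\xe9"  # a Latin-1 byte inside a "UTF-8" document

# Lenient UTF-8 decoding turns the stray byte into U+FFFD, so a rule
# looking for "é" would silently miss it.
print(repr(mislabeled.decode("utf-8", errors="replace")))

# The fallback recovers the intended character.
print(decode_with_fallback(mislabeled))
print(decode_with_fallback("café".encode("utf-8")))
```

The Latin-1 fallback always succeeds, but it can still pick the wrong characters (for example, windows-1252 text uses 0x80-0x9F differently), which is why rules for these cases benefit from review by people who know the encodings well.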


On 15 Apr 2021, at 8:55, bugzilla-daemon@spamassassin.apache.org wrote:

> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7656
>
> Bill Cole <billcole@apache.org> changed:
>
> What | Removed | Added
> ----------------------------------------------------------------------------
> CC   |         | billcole@apache.org
>
> --- Comment #15 from Bill Cole <billcole@apache.org> ---
> (In reply to Henrik Krohns from comment #12)
>> Bumping this bug. Comments? Monologs are getting a bit tiresome.. :-)
>
> +1
>
> The minor pain of revamping rules that match non-ASCII characters is
> compensated by the fact that this is a *normalization* and so reduces the
> frequency of edge cases that escape rules written (perhaps inadvertently) to
> depend on a particular subset of possible encodings. My personal experience
> running SA instances that see a lot of non-ASCII messages is that enabling
> normalize_charset is a best practice, and the default is basically tech debt.
>
> As for requiring discussion on-list, these comments are sent to the dev list.
> I'm going to bump it there to get the attention of anyone filtering out
> Bugzilla mail (!? if that's a thing...) and will also post on the Users list
> to get a broader audience.
>
> --
> You are receiving this mail because:
> You are the assignee for the bug.


--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire