On 2023-04-12 20:42, Greg Troxel wrote:
> Alan <spamassassin.twyie@ambitonline.com> writes:
>
>> A lovely message from a reputable sender with a penchant for fancy
>> email formatting has CSS rules expressed in JSON, presumably so it can
>> adjust for the mail client or some such.
>>
>> A segment contains the text:
>>
>> "items":[{"type":"Input.Date","id":"date"}]}
>>
>> The KAM_SOMETLD_ARE_BAD_TLD rule is triggering on Input.Date. The rule
>> is weighted quite high by default (5.0 here). This is pushing messages
>> over the spam threshold. I've adjusted the weight locally but it's
>> probably something that should be tweaked globally.
>
> (The KAM rules are on the aggressive side, and downscoring is appropriate
> for those who like to be a bit less aggressive, especially those who are
> not comfortable with single rules over 4ish. But I am still running
> them, because I think they help a lot more than they hurt.)
>
> You seem to be suggesting reducing the score, but that's not the real
> issue in this case. What you have found, I think, is something being
> treated as a URL when it isn't one. However, that's really hard to fix
> given the MUA so-called feature of treating things that sort of look
> like URLs as URLs.
>
> If you haven't, I would send the message in question to KAM for analysis
> and perhaps rule adjustment.
>
> FWIW, I find that I have adjusted score to 1.5.
KAM is on this list and has replied off-list. I trust him to find the
best way to mitigate the problem.

I just lowered the score, knowing it will take some time for any update
to make it through my upstream. Short of running a headless Chromium,
parsing the entire HTML, and then inspecting the resulting DOM, there are
always going to be issues like this. I've been doing battle with a
particularly persistent spammer (multiple spams per user per day from
different sources) who always used long URLs that followed a specific
format. Now he uses three formats, so I can only match for the handful
of users who I know are on his list, to avoid FPs of my own. With that
one, I really wish I had the DOM, because the [curse words] follows
a format that would be easy to catch with an XPath query.
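
To illustrate the "parse the HTML and look at the structure" idea, here is
a rough sketch in Python using only the standard library. It is not how
SpamAssassin extracts URIs, and the names LinkCollector and extract_links
are made up for this example, as is the sample body (I am assuming the
JSON sits in a script element, which may not match the real message). The
standard-library parser has no XPath support, so this walks start tags
instead; a real XPath query would need a third-party library such as lxml.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags only (hypothetical helper).

    Text that merely looks like a hostname, such as "Input.Date" inside
    embedded JSON, never reaches the result because it is not an attribute
    of an anchor element.
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def extract_links(html_body):
    """Return the href of every <a> element in html_body, in document order."""
    collector = LinkCollector()
    collector.feed(html_body)
    collector.close()
    return collector.hrefs


# Made-up sample: JSON in a script element plus one genuine link.
sample = (
    "<html><body>"
    '<script type="application/json">'
    '{"items":[{"type":"Input.Date","id":"date"}]}'
    "</script>"
    '<a href="https://example.com/offer">Offer</a>'
    "</body></html>"
)

print(extract_links(sample))
```

Only the anchor's href comes back; the Input.Date string is ignored
because it is script data rather than a link. A rule built on that list
would not see the false URL, at the cost of a full HTML parse per message.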
All in a day's work...
--
For SpamAssassin Users List