Mailing List Archive: FP on KAM_SOMETLD_ARE_BAD_TLD

FP on KAM_SOMETLD_ARE_BAD_TLD

Apr 12, 2023, 8:26 AM

Post #1 of 3

A lovely message from a reputable sender with a penchant for fancy email formatting has CSS rules expressed in JSON, presumably so it can adjust for the mail client or some such.

A segment contains the text:

"items":[{"type":"Input.Date","id":"date"}]}

The KAM_SOMETLD_ARE_BAD_TLD rule is triggering on Input.Date. The rule is weighted quite heavily by default (5.0 here).
This is pushing messages over the spam threshold. I've adjusted the weight locally, but it's probably something that should be tweaked globally.

--
For SpamAssassin Users List

Re: FP on KAM_SOMETLD_ARE_BAD_TLD [ In reply to ]

gdt at lexort

Apr 12, 2023, 5:42 PM

Post #2 of 3

Alan <spamassassin.twyie@ambitonline.com> writes:

> A lovely message from a reputable sender with a penchant for fancy
> email formatting has CSS rules expressed in JSON, presumably so it can
> adjust for the mail client or some such.
>
> A segment contains the text:
>
> "items":[{"type":"Input.Date","id":"date"}]}
>
> The KAM_SOMETLD_ARE_BAD_TLD rule is triggering on Input.Date. The rule is weighted quite heavily by default (5.0 here).
> This is pushing messages over the spam threshold. I've adjusted the weight locally, but it's probably something that should be tweaked globally.

(The KAM rules are on the aggressive side, and downscoring is appropriate
for those who like to be a bit less aggressive, especially those who are
not comfortable with single rules over 4ish. But I am still running
them, because I think they help a lot more than they hurt.)

You seem to be suggesting reducing the score, but that's not the real
issue in this case. What you have found, I think, is the rule treating
something as a URL that isn't one. However, that's really hard to fix
given the MUA so-called feature of treating things that sort of look
like URLs as URLs.
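
To illustrate the point above, here is a minimal Python sketch of how a schemeless "looks like a hostname" matcher can end up flagging Input.Date. It is not SpamAssassin's URI parser and not the actual KAM rule; the pattern, the SUSPECT_TLDS set and the suspect_hosts helper are all hypothetical names made up for this example.

```python
import re

# Hypothetical, shortened list of TLDs a rule might treat as suspicious.
# The real rule's list is not reproduced here.
SUSPECT_TLDS = {"date", "top", "xyz"}

# Naive schemeless-hostname pattern (an assumption for illustration):
# one or more dot-terminated labels followed by a final alphabetic label.
HOSTLIKE = re.compile(r"\b((?:[a-z0-9-]+\.)+([a-z]{2,}))\b", re.IGNORECASE)


def suspect_hosts(text):
    """Return host-like tokens whose final label is in SUSPECT_TLDS."""
    return [
        m.group(1)
        for m in HOSTLIKE.finditer(text)
        if m.group(2).lower() in SUSPECT_TLDS
    ]


# The JSON segment quoted in the original post.
segment = '"items":[{"type":"Input.Date","id":"date"}]}'
print(suspect_hosts(segment))  # prints ['Input.Date']
```

Nothing in the segment is a link, but a matcher that works on raw text has no way to know that, which is why lowering the score only hides the symptom.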

If you haven't, I would send the message in question to KAM for analysis
and perhaps rule adjustment.

FWIW, I find that I have adjusted score to 1.5.

Re: FP on KAM_SOMETLD_ARE_BAD_TLD [ In reply to ]

spamassassin.twyie at ambitonline

Apr 13, 2023, 4:39 AM

Post #3 of 3

On 2023-04-12 20:42, Greg Troxel wrote:
> Alan <spamassassin.twyie@ambitonline.com> writes:
>
>> A lovely message from a reputable sender with a penchant for fancy
>> email formatting has CSS rules expressed in JSON, presumably so it can
>> adjust for the mail client or some such.
>>
>> A segment contains the text:
>>
>> "items":[{"type":"Input.Date","id":"date"}]}
>>
>> The KAM_SOMETLD_ARE_BAD_TLD rule is triggering on Input.Date. The rule is weighted quite heavily by default (5.0 here).
>> This is pushing messages over the spam threshold. I've adjusted the weight locally, but it's probably something that should be tweaked globally.
>
> (The KAM rules are on the aggressive side, and downscoring is appropriate
> for those who like to be a bit less aggressive, especially those who are
> not comfortable with single rules over 4ish. But I am still running
> them, because I think they help a lot more than they hurt.)
>
> You seem to be suggesting reducing the score, but that's not the real
> issue in this case. What you have found, I think, is the rule treating
> something as a URL that isn't one. However, that's really hard to fix
> given the MUA so-called feature of treating things that sort of look
> like URLs as URLs.
>
> If you haven't, I would send the message in question to KAM for analysis
> and perhaps rule adjustment.
>
> FWIW, I find that I have adjusted score to 1.5.

KAM is on this list and has replied off list. I trust him to find the
best way to mitigate the problem.

I just lowered the score, knowing it will take some time for any update
to make it through my upstream. Short of running a headless Chromium,
parsing the entire HTML, and then inspecting the resulting DOM, there are
always going to be issues like this. I've been doing battle with a
particularly persistent spammer (multiple spams per user per day from
different sources) who always used long URLs that followed a specific
format. Now he uses three formats, so I have to match only on the
handful of users who I know are on his list to avoid my own FPs. With
that one, I really wish I had the DOM, because the [curse words] follows
a format that would be easy to catch with an XPath query.
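
As a rough illustration of the "inspect the parsed markup instead of the raw text" idea, here is a small Python sketch that collects hostnames only from real href attributes. It uses the standard library html.parser rather than a headless browser or XPath, so it is a much lighter stand-in for what is described above. The sample HTML, the HrefCollector class and the link_hosts helper are hypothetical; how the original message actually embedded its JSON is not known.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags only; text and script data are ignored."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_hosts(html):
    """Return the hostnames of links that are actual anchors in the markup."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    hosts = (urlsplit(href).hostname for href in collector.hrefs)
    return [host for host in hosts if host]


# Hypothetical message body: JSON in a script block plus one real link.
sample = (
    "<html><body>"
    '<script type="application/json">'
    '{"items":[{"type":"Input.Date","id":"date"}]}'
    "</script>"
    '<a href="https://example.com/offer">View</a>'
    "</body></html>"
)
print(link_hosts(sample))  # prints ['example.com']
```

In this sketch Input.Date never reaches the host check because it only ever appears as script data, not as a link target. It would not catch URLs that a mail client auto-links from plain text, which is the harder part of the problem.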

All in a day's work...
