Mailing List Archive

TLD rules catch non-domain data
Hello,

it seems that some TLD rules catch strings that are not domains:

* 2.0 PDS_OTHER_BAD_TLD Untrustworthy TLDs
* [URI: ups.mfr.date (date)]

* 5.0 KAM_SOMETLD_ARE_BAD_TLD .stream, .trade, .pw, .top, .press,
* .guru, .casa, .online, .cam, .shop, .club & .date TLD Abuse



| ups.date | Internal UPS clock date
| ups.mfr.date | UPS manufacturing date
| ups.test.date | Date of last self test
| battery.date | Battery change date (opaque string) | 11/14/00
| battery.mfr.date | Battery manufacturing date

--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Save the whales. Collect the whole set.
Re: TLD rules catch non-domain data
On 8/20/2021 6:23 AM, Matus UHLAR - fantomas wrote:
>
> it seems that some TLD rules catch strings that are not domains:
>
>     *  2.0 PDS_OTHER_BAD_TLD Untrustworthy TLDs
>     *      [URI: ups.mfr.date (date)]
>
>     *  5.0 KAM_SOMETLD_ARE_BAD_TLD .stream, .trade, .pw, .top, .press,
>     *      .guru, .casa, .online, .cam, .shop, .club & .date TLD Abuse

The KAM rule was just recently fixed. If you have an example that's
still tripping it, post it to a pastebin and share the link here.

This version has the revised rule: 1629386681

Look in this file for the version:

/var/lib/spamassassin/3.004004/kam_sa-channels_mcgrail_com.cf
Re: TLD rules catch non-domain data
Kenneth Porter <shiva@sewingwitch.com> writes:

>>     *  5.0 KAM_SOMETLD_ARE_BAD_TLD .stream, .trade, .pw, .top, .press,
>>     *      .guru, .casa, .online, .cam, .shop, .club & .date TLD
>> Abuse
>
> The KAM rule was just recently fixed. If you have an example that's
> still tripping it, post it to a pastebin and share the link here.

I just had it falsely hit, in that it triggered on mail that was ham.
There was a .club URL, but it was to a club website mentioned in mail
that I actually agreed to get and that was on topic.

So I would suggest that rules that do not show actual evidence of spam,
but merely "other people have abused things that seem like you", be
limited to 2 or 3 points.
Re: TLD rules catch non-domain data
On 8/20/2021 1:53 PM, Greg Troxel wrote:
> I just had it falsely hit, in that it triggered on mail that was ham.
> There was a .club URL, but it was to a club website mentioned in mail
> that I actually agreed to get and that was on topic.
>
> So I would suggest that rules that do not show actual evidence of spam,
> but merely "other people have abused things that seem like you", be
> limited to 2 or 3 points.

That's a different issue, a matter of policy. The rule correctly
identified a URI with the "bad" domain, but the score is higher than you
want. I addressed that by adding my own score in
/etc/mail/spamassassin/KAM-tweaks.cf.
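
For anyone who wants to do the same, a local override could look roughly like the sketch below. The file path is the one mentioned above; the value 2.0 is only an illustration taken from the "2 or 3 points" suggestion earlier in the thread, not a recommended score.

```
# /etc/mail/spamassassin/KAM-tweaks.cf
# Local override: lower the score of the combined KAM TLD rule.
# The value 2.0 is an assumption for illustration; pick what fits your mail flow.
score KAM_SOMETLD_ARE_BAD_TLD 2.0
```

Running `spamassassin --lint` afterwards is a quick way to confirm the file still parses.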
Re: TLD rules catch non-domain data
On Fri, 20 Aug 2021 14:16:14 -0700
Kenneth Porter wrote:

> On 8/20/2021 1:53 PM, Greg Troxel wrote:
> > I just had it falsely hit, in that it triggered on mail that was
> > ham. There was a .club URL, but it was to a club website mentioned
> > in mail that I actually agreed to get and that was on topic.
> >
> > So I would suggest that rules that do not show actual evidence of
> > spam, but merely "other people have abused things that seem like
> > you", be limited to 2 or 3 points.
>
> That's a different issue, a matter of policy. The rule correctly
> identified a uri with the "bad" domain but the score is more  than
> you want. I addressed that by adding my own score in
> /etc/mail/spamassassin/KAM-tweaks.cf.

The problem is that overlap between the core and KAM rules can make it
difficult to come up with a sane value. The same applies to the
various TLD core rules too.

The core rules handle TLDs quite badly because they treat the URI and
address versions as independent indicators even though they obviously
aren't. In particular the author domain commonly leaks into the URI
list via a DKIM signature.

The combined URI and address KAM rule is a better approach, but it's
overlapping with the core rules.

Personally I'm not happy about treating URI hits as the equal of
address hits. For one thing the URI list isn't designed to be
reliable. For another, while there's a wide understanding that abused
TLDs shouldn't be used in email addresses, there's less of a consensus
about websites, and email users don't care at all about what TLDs they
link to. My preference would be to score the URI only if there is no
address hit, and at a lower score.
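
A minimal sketch of that preference, assuming the KAM subrules quoted elsewhere in this thread (__KAM_SOMETLD_ARE_BAD_TLD_FROM, __KAM_SOMETLD_ARE_BAD_TLD_URI and __KAM_SOMETLD_ARE_BAD_TLD_URI_NEGATIVE) are available locally. The LOCAL_* rule names and all score values are hypothetical and only illustrate the idea:

```
# Hypothetical local rules; the LOCAL_* names and the scores are illustrative only.

# Address hit: the stronger indicator keeps the higher score.
meta     LOCAL_BAD_TLD_FROM     __KAM_SOMETLD_ARE_BAD_TLD_FROM
describe LOCAL_BAD_TLD_FROM     Sender address uses a frequently abused TLD
score    LOCAL_BAD_TLD_FROM     3.0

# URI hit: counted only when there is no address hit, and at a lower score.
meta     LOCAL_BAD_TLD_URI_ONLY (__KAM_SOMETLD_ARE_BAD_TLD_URI && !__KAM_SOMETLD_ARE_BAD_TLD_URI_NEGATIVE && !__KAM_SOMETLD_ARE_BAD_TLD_FROM)
describe LOCAL_BAD_TLD_URI_ONLY URI uses a frequently abused TLD, no address hit
score    LOCAL_BAD_TLD_URI_ONLY 1.0

# Near-zero score so the combined rule is still reported but adds almost no weight.
score    KAM_SOMETLD_ARE_BAD_TLD 0.001
```

This does nothing about the overlap with the core TLD rules, which would still need their own score adjustments.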
Re: TLD rules catch non-domain data
>On 8/20/2021 6:23 AM, Matus UHLAR - fantomas wrote:
>>it seems that some TLD rules catch strings that are not domains:
>>
>>    *  2.0 PDS_OTHER_BAD_TLD Untrustworthy TLDs
>>    *      [URI: ups.mfr.date (date)]
>>
>>    *  5.0 KAM_SOMETLD_ARE_BAD_TLD .stream, .trade, .pw, .top, .press,
>>    *      .guru, .casa, .online, .cam, .shop, .club & .date TLD Abuse

On 20.08.21 12:56, Kenneth Porter wrote:
>The KAM rule was just recently fixed. If you have an example that's
>still tripping it, post it to a pastebin and share the link here.
>
>This version has the revised rule: 1629386681
>
>Look in this file for the version:
>
>/var/lib/spamassassin/3.004004/kam_sa-channels_mcgrail_com.cf

FYI, I received another mail from the same list, and it doesn't look like the
problem has been solved (I updated KAM to check):

https://alioth-lists.debian.net/pipermail/nut-upsuser/2021-August/012540.html

I will report it to KAM (as documented by kam.cf) later.


header PDS_OTHER_BAD_TLD eval:check_uri_host_listed('SUSP_URI_NTLD')

meta KAM_SOMETLD_ARE_BAD_TLD (__KAM_SOMETLD_ARE_BAD_TLD_FROM) || (__KAM_SOMETLD_ARE_BAD_TLD_URI && !__KAM_SOMETLD_ARE_BAD_TLD_URI_NEGATIVE)

Aug 24 14:33:29.567 [14949] dbg: rules: uri host enlisted (SUSP_URI_NTLD): battery.date (date)
Aug 24 14:33:29.568 [14949] dbg: rules: ran eval rule PDS_OTHER_BAD_TLD ======> got hit (1)

Aug 24 14:33:30.866 [14949] dbg: rules: ran uri rule __KAM_SOMETLD_ARE_BAD_TLD_URI ======> got hit: "://battery.date"

...there's no :// in the original mail, perhaps added by SA preprocessing?

My original intent was only focused on the battery.date{,.maintenance}.


These two are somewhat redundant:

uri __KAM_SOMETLD_ARE_BAD_TLD_URI /:\/{2}([a-z0-9-\.]+)\.(pw|stream|trade|press|top|date|guru|casa|online|cam|shop|club|bar)($|\/|\:)/i

header PDS_OTHER_BAD_TLD eval:check_uri_host_listed('SUSP_URI_NTLD')

enlist_uri_host (SUSP_URI_NTLD) icu
enlist_uri_host (SUSP_URI_NTLD) online
enlist_uri_host (SUSP_URI_NTLD) work
enlist_uri_host (SUSP_URI_NTLD) date
enlist_uri_host (SUSP_URI_NTLD) top
enlist_uri_host (SUSP_URI_NTLD) fun
enlist_uri_host (SUSP_URI_NTLD) life
enlist_uri_host (SUSP_URI_NTLD) review
enlist_uri_host (SUSP_URI_NTLD) xyz
enlist_uri_host (SUSP_URI_NTLD) bid
enlist_uri_host (SUSP_URI_NTLD) stream
enlist_uri_host (SUSP_URI_NTLD) site
enlist_uri_host (SUSP_URI_NTLD) space
enlist_uri_host (SUSP_URI_NTLD) gdn
enlist_uri_host (SUSP_URI_NTLD) click
enlist_uri_host (SUSP_URI_NTLD) world
enlist_uri_host (SUSP_URI_NTLD) fit
enlist_uri_host (SUSP_URI_NTLD) ooo
enlist_uri_host (SUSP_URI_NTLD) faith
enlist_uri_host (SUSP_URI_NTLD) buzz
enlist_uri_host (SUSP_URI_NTLD) trade
enlist_uri_host (SUSP_URI_NTLD) cyou
enlist_uri_host (SUSP_URI_NTLD) vip
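
If the list-based rule keeps hitting strings such as battery.date on a particular site, one possible local workaround is to take the entry off the named list. This is only a sketch: it assumes the installed SpamAssassin version supports the delist_uri_host directive (3.4.x does, as far as I know) and that the local file is parsed after the file that enlists the host.

```
# Local workaround sketch, e.g. in /etc/mail/spamassassin/local.cf
# Removes "date" from the SUSP_URI_NTLD list, so PDS_OTHER_BAD_TLD no longer
# fires on hosts ending in .date.
# Trade-off: genuine .date domains are no longer flagged by that rule either.
delist_uri_host (SUSP_URI_NTLD) date
```

This only affects the list-based PDS_OTHER_BAD_TLD rule; the regex-based __KAM_SOMETLD_ARE_BAD_TLD_URI rule is separate and is not changed by it.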


Re: TLD rules catch non-domain data
> FYI I received another mail from the same list and it doesn't look
> like the
> problem has been solved (updated KAM to check)

Different problem, but a good example. Thanks. The KAM ruleset has been
updated to handle it, and I appreciate you posting about it.


> these two are somehow redundant.
>
> uri __KAM_SOMETLD_ARE_BAD_TLD_URI /:\/{2}([a-z0-9-\.]+)\.(pw|stream|trade|press|top|date|guru|casa|online|cam|shop|club|bar)($|\/|\:)/i
>
> header PDS_OTHER_BAD_TLD eval:check_uri_host_listed('SUSP_URI_NTLD')

They do overlap, but both serve a purpose. I use them in concert
with live mail and handle FPs/FNs based on that, so the overlap is
accounted for in the scoring and impact.

Regards,

KAM

--
Kevin A. McGrail
KMcGrail@Apache.org

Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171