Mailing List Archive

Spamhaus spurious positives - how does SpamAssassin check Spamhaus?
I have set up SpamAssassin with the following in
/etc/spamassassin/mycustomscores.cf:

score RCVD_IN_SBL 10.0
score RCVD_IN_XBL 10.0
score RCVD_IN_PBL 10.0
score RCVD_IN_SBL_CSS 10.0
score URIBL_SBL 10.0
score URIBL_CSS 10.0
score URIBL_CSS_A 10.0
score URIBL_SBL_A 10.0

I do not otherwise block using Spamhaus at the MTA or elsewhere.

I occasionally see false positives because of these scores, and they
occur when a domain is in the body of a message. When I check the Spamhaus
website[1], the domain is not there. Each time this has occurred, it has
been for a website currently in the news and usually something to do
with politics.

A few days ago I happened to be on my computer exactly when one of these
false positives came in[2]. I immediately went and checked the Spamhaus
site and the domain was not listed. I checked several times throughout
the day and never saw the domain there.

So I am trying to figure out why there is a disparity between what
SpamAssassin reports and the Spamhaus website reports, but I'm not clear
how SpamAssassin checks Spamhaus, and since these are usually domains I
rarely have in a message any place, I don't have a good feel for whether
or not this is some regular problem.

If anyone can point me to how this check is performed, that would be
very helpful.

Thank you,


Paul

[1] https://check.spamhaus.org/
[2] Scores:
* 10 URIBL_SBL_A Contains URL's A record listed in the Spamhaus SBL
* blocklist
* [URIs: wikileaksdotorg]
* 10 URIBL_SBL Contains an URL's NS IP listed in the Spamhaus SBL
* blocklist
* [URIs: wikileaksdotorg]
Re: Spamhaus spurious positives - how does SpamAssassin check Spamhaus?
On 2022-05-07 16:42, Paul Pace wrote:
> I have set up SpamAssassin with the following in
> /etc/spamassassin/mycustomscores.cf:

> * 10 URIBL_SBL Contains an URL's NS IP listed in the Spamhaus SBL
> * blocklist
> * [URIs: wikileaksdotorg]

Add to /etc/spamassassin/mycustomskipuribl.cf (with the domain written in
its normal dotted form):

uridnsbl_skip_domain wikileaksdotorg

or reduce the Spamhaus score.
Re: Spamhaus spurious positives - how does SpamAssassin check Spamhaus?
On 2022-05-07 07:53, Benny Pedersen wrote:
> On 2022-05-07 16:42, Paul Pace wrote:
>> I have set up SpamAssassin with the following in
>> /etc/spamassassin/mycustomscores.cf:
>
>> * 10 URIBL_SBL Contains an URL's NS IP listed in the Spamhaus SBL
>> * blocklist
>> * [URIs: wikileaksdotorg]
>
> add to /etc/spamassassin/mycustomskipuribl.cf:
>
> skip_uribl_domains wikileaksdotorg

The problem with this solution is I don't know which domain is going to
be next, plus I'm not so much looking for a solution to this specific
result, but rather I want to understand why there is a disparity between
what SpamAssassin is reporting and what the Spamhaus website is
reporting.

>
> or reduce spamhaus score

With this I will get more spam in my inbox, especially spam sent from
compromised accounts which usually have lots of positive modifiers.
Re: Spamhaus spurious positives - how does SpamAssassin check Spamhaus?
On Sat, May 07, 2022 at 09:35:31AM -0700, Paul Pace wrote:
> On 2022-05-07 07:53, Benny Pedersen wrote:
> > On 2022-05-07 16:42, Paul Pace wrote:
> > > * 10 URIBL_SBL Contains an URL's NS IP listed in the Spamhaus SBL
> > > * blocklist
> > > * [URIs: wikileaksdotorg]
>
> The problem with this solution is I don't know which domain is going to be
> next, plus I'm not so much looking for a solution to this specific result,
> but rather I want to understand why there is a disparity between what
> SpamAssassin is reporting and what the Spamhaus website is reporting.

If you do:

grep -r URIBL_SBL /var/lib/spamassassin/
you'll see it does this:

/var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:uridnssub URIBL_SBL zen.spamhaus.org. A 127.0.0.2
/var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:body URIBL_SBL eval:check_uridnsbl('URIBL_SBL')
/var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:describe URIBL_SBL Contains an URL's NS IP listed in the Spamhaus SBL blocklist

which means that to check (for example) 195.35.109.44 it would do a
DNS A record lookup on "44.109.35.195.zen.spamhaus.org" (note the reversed
quads) and check whether the result is "127.0.0.2" (which happens to be true
in this case at the moment, but might not be some time later):

% host -t a 44.109.35.195.zen.spamhaus.org
44.109.35.195.zen.spamhaus.org has address 127.0.0.2

The same procedure can be used for other RBLs.
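
The reversed-quad name construction described above can be sketched in Python. This is only an illustration, not SpamAssassin's own code: the function name `dnsbl_query_name` is hypothetical, and only the zone `zen.spamhaus.org` and the example address come from this thread. It builds the name and performs no DNS lookup.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name to query for an IPv4 address in a DNSBL zone.

    Hypothetical helper for illustration only; it does not query DNS.
    """
    # Validate the input; raises ValueError if it is not an IPv4 address.
    addr = ipaddress.IPv4Address(ip)
    # Reverse the four octets: 195.35.109.44 -> 44.109.35.195
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    # Drop a trailing dot on the zone so the result has a single form.
    return f"{reversed_octets}.{zone.rstrip('.')}"


if __name__ == "__main__":
    print(dnsbl_query_name("195.35.109.44"))
```

An A-record lookup on the resulting name (for instance with `host -t a`, as shown above) is what gets compared against "127.0.0.2" for the URIBL_SBL rule.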

As to why the web lookup returns a different result, it might be because
the DNS result was cached earlier (maybe by some previous spam message),
and/or because you did not look it up fast enough. Data on RBL
servers changes all the time, and there is usually a delay between
their current database (which is likely what the web interface looks
up directly) and their published DNS records (which would lag behind
it).

Anyway, if you do the DNS check at the same time as SpamAssassin does (or
very close to it; I think the default TTL there is 60 seconds), you should
get the same result. If you do it minutes or hours later, the results
might be different again (how often they change depends on the RBL in
question, as well as your luck).
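
The caching effect described above can be modelled with a toy TTL cache. This is a sketch under stated assumptions, not how any particular resolver is implemented: the class `TtlCache`, its `lookup` method and the `fetch` callback are hypothetical names, and real resolvers handle negative answers and other details differently.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy model of a caching resolver: an answer is reused until its TTL expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # name -> (expiry time, cached answer; None means "not listed")
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def lookup(
        self,
        name: str,
        fetch: Callable[[str], Tuple[Optional[str], float]],
    ) -> Optional[str]:
        """Return the cached answer, or call fetch(name) -> (answer, ttl_seconds)."""
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[0]:
            # Still within the TTL: the upstream data is not consulted at all.
            return cached[1]
        answer, ttl = fetch(name)
        self._entries[name] = (now + ttl, answer)
        return answer
```

With a cache like this in front of the DNSBL, a listing that has already been removed upstream keeps being returned until the TTL of the cached record runs out, which is one way a mail filter and a later manual check can disagree.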

--
Opinions above are GNU-copylefted.
Re: Spamhaus spurious positives - how does SpamAssassin check Spamhaus?
On 2022-05-07 10:37, Matija Nalis wrote:
> On Sat, May 07, 2022 at 09:35:31AM -0700, Paul Pace wrote:
>> On 2022-05-07 07:53, Benny Pedersen wrote:
>> > On 2022-05-07 16:42, Paul Pace wrote:
>> > > * 10 URIBL_SBL Contains an URL's NS IP listed in the Spamhaus SBL
>> > > * blocklist
>> > > * [URIs: wikileaksdotorg]
>>
>> The problem with this solution is I don't know which domain is going
>> to be
>> next, plus I'm not so much looking for a solution to this specific
>> result,
>> but rather I want to understand why there is a disparity between what
>> SpamAssassin is reporting and what the Spamhaus website is reporting.
>
> If you do:
>
> grep -r URIBL_SBL /var/lib/spamassassin/
> you'll see it does this:
>
> /var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:uridnssub
> URIBL_SBL zen.spamhaus.org. A 127.0.0.2
> /var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:body
> URIBL_SBL eval:check_uridnsbl('URIBL_SBL')
> /var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:describe
> URIBL_SBL Contains an URL's NS IP listed in the Spamhaus
> SBL blocklist
>
> which means that to check (for example) 195.35.109.44 it would do a
> DNS A record lookup on "44.109.35.195.zen.spamhaus.org" (note the
> reversed quads) and check whether the result is "127.0.0.2" (which
> happens to be true in this case at the moment, but might not be some
> time later):
>
> % host -t a 44.109.35.195.zen.spamhaus.org
> 44.109.35.195.zen.spamhaus.org has address 127.0.0.2
>
> The same procedure can be used for other RBLs.
>
> As to why the web lookup returns a different result, it might be because
> the DNS result was cached earlier (maybe by some previous spam message),
> and/or because you did not look it up fast enough. Data on RBL
> servers changes all the time, and there is usually a delay between
> their current database (which is likely what the web interface looks
> up directly) and their published DNS records (which would lag behind
> it).
>
> Anyway, if you do the DNS check at the same time as SpamAssassin does (or
> very close to it; I think the default TTL there is 60 seconds), you should
> get the same result. If you do it minutes or hours later, the results
> might be different again (how often they change depends on the RBL in
> question, as well as your luck).

Thank you, this is exactly what I was looking for. Using dig it looks
like the TTL is 2100.
Re: Spamhaus spurious positives - how does SpamAssassin check Spamhaus?
On 2022-05-07 at 10:42:59 UTC-0400 (Sat, 07 May 2022 07:42:59 -0700)
Paul Pace <paul@mostlybsd.com>
is rumored to have said:

> I have set up SpamAssassin with the following in /etc/spamassassin/mycustomscores.cf:
>
> score RCVD_IN_SBL 10.0
> score RCVD_IN_XBL 10.0
> score RCVD_IN_PBL 10.0
> score RCVD_IN_SBL_CSS 10.0

Not entirely unreasonable. Cheaper to do most of that in the MTA, unless you have complex whitelisting needs.

> score URIBL_SBL 10.0
> score URIBL_CSS 10.0
> score URIBL_CSS_A 10.0
> score URIBL_SBL_A 10.0

I'm surprised that this is anywhere near usable.
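
A rough sketch of the arithmetic behind that remark, using the two hits from the report quoted in this thread. It assumes SpamAssassin's stock default `required_score` of 5.0; the thread does not say which threshold is actually configured, and the names `CUSTOM_SCORES`, `total_score` and `is_spam` are hypothetical.

```python
from typing import Dict, Iterable

# Scores copied from the custom configuration quoted in this thread.
CUSTOM_SCORES: Dict[str, float] = {
    "URIBL_SBL": 10.0,
    "URIBL_SBL_A": 10.0,
    "URIBL_CSS": 10.0,
    "URIBL_CSS_A": 10.0,
}

# Assumption: the stock default threshold; a site may configure another value.
REQUIRED_SCORE = 5.0


def total_score(hits: Iterable[str], scores: Dict[str, float]) -> float:
    """Sum the scores of the rules that fired; unknown rules count as 0."""
    return sum(scores.get(rule, 0.0) for rule in hits)


def is_spam(
    hits: Iterable[str],
    scores: Dict[str, float],
    required: float = REQUIRED_SCORE,
) -> bool:
    """A message is tagged as spam once its total reaches the threshold."""
    return total_score(hits, scores) >= required
```

Under that assumption a single URIBL hit already puts a message at twice the threshold, so no combination of other rules with small negative scores is likely to rescue a false positive.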

> I do not otherwise block using Spamhaus at the MTA or elsewhere.
>
> I occasionally see false positives because of these scores and it is when a domain is in the body of a message.

So: the URIBL_* rules.

> When I check the Spamhaus website[1], the domain is not there. Each time this has occurred, it has been for a website currently in the news and usually something to do with politics.
>
> A few days ago I happened to be on my computer exactly when one of these false positives came in[2]. I immediately went and checked the Spamhaus site and the domain was not listed. I checked several times throughout the day and never saw the domain there.

The Spamhaus SBL will never show any domain name as listed because it does not list domain names. It lists IP addresses.

> So I am trying to figure out why there is a disparity between what SpamAssassin reports and the Spamhaus website reports, but I'm not clear how SpamAssassin checks Spamhaus, and since these are usually domains I rarely have in a message any place, I don't have a good feel for whether or not this is some regular problem.
>
> If anyone can point me to how this check is performed, that would be very helpful.
>
> Thank you,
>
>
> Paul
>
> [1] https://check.spamhaus.org/
> [2] Scores:
> * 10 URIBL_SBL_A Contains URL's A record listed in the Spamhaus SBL
> * blocklist
> * [URIs: wikileaksdotorg]
> * 10 URIBL_SBL Contains an URL's NS IP listed in the Spamhaus SBL
> * blocklist
> * [URIs: wikileaksdotorg]


Read the rule descriptions carefully. Also see the rule definitions and `perldoc Mail::SpamAssassin::Plugin::URIDNSBL`.

SBL, including its CSS component, lists IP addresses, NOT domain names. In these cases, as documented, SA looks up a specific record type (A, NS, or MX) for a name extracted from a URL to get one or more IP addresses, and then those IP addresses are checked against the DNSBL.
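
The two-step flow described above (name to IP addresses, then IP addresses to DNSBL) can be sketched as follows. This is not the plugin's code: `listed_ips_for_domain` and both callbacks are hypothetical, and the DNS and DNSBL lookups are passed in as functions so the sketch runs without network access.

```python
from typing import Callable, Iterable, List


def listed_ips_for_domain(
    domain: str,
    resolve_ips: Callable[[str], Iterable[str]],
    is_listed: Callable[[str], bool],
) -> List[str]:
    """Return the IP addresses behind a domain that the blocklist reports as listed.

    resolve_ips: hypothetical callback returning the addresses obtained for
        the domain (for example from its A records, or from its NS hosts).
    is_listed: hypothetical callback that checks one IP against the DNSBL.
    """
    # The domain itself is never checked against the list, only its IPs.
    return [ip for ip in resolve_ips(domain) if is_listed(ip)]


if __name__ == "__main__":
    # Made-up data using documentation-only names and addresses.
    fake_dns = {"example.test": ["192.0.2.10", "192.0.2.20"]}
    fake_listed = {"192.0.2.20"}
    hits = listed_ips_for_domain(
        "example.test",
        lambda name: fake_dns.get(name, []),
        lambda ip: ip in fake_listed,
    )
    print(hits)
```

This is why a rule can fire for a URL whose domain shows as clean in a domain lookup: the listing belongs to an address the name resolves to, not to the name.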

--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: Spamhaus spurious positives - how does SpamAssassin check Spamhaus?
To me it looks like a DNS cache timing issue...
Paul, what resolver are you using?
Is your server under heavy load when this happens? If it is Linux, run "netstat -suna" and check for any errors in the Udp section. On FreeBSD, use "netstat -sa".
----Pedro.

On Saturday, May 7, 2022, 06:36:43 PM GMT+2, Paul Pace <paul@mostlybsd.com> wrote:

>On 2022-05-07 07:53, Benny Pedersen wrote:
> On 2022-05-07 16:42, Paul Pace wrote:
>> I have set up SpamAssassin with the following in
>> /etc/spamassassin/mycustomscores.cf:
>
>>     *  10 URIBL_SBL Contains an URL's NS IP listed in the Spamhaus SBL
>>     *      blocklist
>>     *      [URIs: wikileaksdotorg]
>
> add to /etc/spamassassin/mycustomskipuribl.cf:
>
> skip_uribl_domains wikileaksdotorg

>The problem with this solution is I don't know which domain is going to
>be next, plus I'm not so much looking for a solution to this specific
>result, but rather I want to understand why there is a disparity between
>what SpamAssassin is reporting and what the Spamhaus website is
>reporting.

>
> or reduce spamhaus score

>With this I will get more spam in my inbox, especially spam sent from
>compromised accounts which usually have lots of positive modifiers.