Mailing List Archive: [Bug 7995] New: Add more lookups to properly leverage all Spamhaus DQS potential (and a note about DBL)

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7995

Bug ID: 7995
Summary: Add more lookups to properly leverage all Spamhaus DQS
potential (and a note about DBL)
Product: Spamassassin
Version: 4.0.0
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P2
Component: spamassassin
Assignee: dev@spamassassin.apache.org
Reporter: riccardo.alfieri@spamteq.com
Target Milestone: Undefined

Good morning,

I was happy to read bug 7993, and thanks to Michael Storz for porting our
inefficient plugin code to the core SpamAssassin 4.x codebase and improving it.
I'm not kidding, I'm happy that someone more knowledgeable than me is rewriting
some of our functions!

I'm not really in the loop with all the new SA 4.x features (sorry!), so I'm
going to describe here all the lookups we do for DQS, hoping that someone will
find a way to port them to the official codebase. If those lookups are already
possible with 4.x (or even 3.4.6!), I hope someone can show an example of how
to do them correctly.

Here follows a breakdown of the custom functions we use in our plugin:

Function: check_sh_helo
Explanation: This function checks the domain used in the HELO/EHLO string
against DBL and ZRD.
Example: header SH_HELO_DBL eval:check_sh_helo('<key>.dbl.dq.spamhaus.net', '^127\.0\.1\.[2-6]$')
The regex at the end is the one that is run against the answer and defines when
to trigger a positive match.
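
To illustrate how the two arguments fit together, here is a minimal Python sketch. It is not the plugin's code: the helper names dbl_query_name and is_hit are hypothetical, and it assumes the usual DNSBL convention of prepending the checked name to the zone. The DNS lookup itself is left out; the answers are passed in as plain strings.

```python
import re
from typing import Iterable

def dbl_query_name(domain: str, zone: str) -> str:
    """Build the DNS name to query: the checked domain prepended to the zone."""
    return f"{domain.strip('.').lower()}.{zone.strip('.')}"

def is_hit(answers: Iterable[str], pattern: str) -> bool:
    """Return True if any returned A record matches the rule's regex."""
    rx = re.compile(pattern)
    return any(rx.search(answer) for answer in answers)

# "<key>" is kept as a placeholder for the DQS key, exactly as in the rule.
ZONE = "<key>.dbl.dq.spamhaus.net"
PATTERN = r"^127\.0\.1\.[2-6]$"

print(dbl_query_name("Mail.Example.com", ZONE))
print(is_hit(["127.0.1.2"], PATTERN))    # inside the 127.0.1.[2-6] range
print(is_hit(["127.0.1.102"], PATTERN))  # outside the range, so no hit
```

The anchored regex is what keeps a 127.0.1.102 answer from triggering a rule that is only meant for 127.0.1.2 through 127.0.1.6.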

Function: check_sh_bodyemail
Explanation: This function scans the email body looking for email addresses.
For each email address found, it extracts the domain and checks it against DBL
and ZRD.
Example: body SH_DBL_BODY eval:check_sh_bodyemail('<key>.dbl.dq.spamhaus.net', '^127\.0\.1\.[2-6]$')
The regex at the end is the one that is run against the answer and defines when
to trigger a positive match.
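
The extraction step could look roughly like the Python sketch below. The address pattern is a deliberately simple assumption, not the regex the plugin uses, and email_domains is a hypothetical helper name. Each returned domain would then be looked up in the zone given in the rule.

```python
import re

# Simplified address pattern (an assumption for illustration only).
# Group 1 captures the domain part after the "@".
EMAIL_RX = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")

def email_domains(body: str) -> list:
    """Return the unique, lowercased domains of addresses found in the body,
    in order of first appearance."""
    seen = {}
    for match in EMAIL_RX.finditer(body):
        seen.setdefault(match.group(1).lower(), None)
    return list(seen)

body = ("Reply to sales@Shop.example.com or billing@example.net. "
        "Also sales@shop.example.com.")
print(email_domains(body))
```

Deduplicating before the lookup avoids sending the same DNS query several times for one message.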

Function: check_sh_reverse
Explanation: This function checks the reverse DNS (rDNS) of the last untrusted
relay against both DBL and ZRD.
Example: header SH_REVERSE_DBL eval:check_sh_reverse('<key>.dbl.dq.spamhaus.net', '^127\.0\.1\.[2-6]$')
The regex at the end is the one that is run against the answer and defines when
to trigger a positive match.

Function: check_sh_bodyuri_a
Explanation: This function scans the email body and looks for URLs; when one is
found the hostname is then resolved, and the resulting IP address is checked in
SBL and CSS.
Example: body SH_BODYURI_REVERSE_SBL eval:check_sh_bodyuri_a('<key>.zen.dq.spamhaus.net', '^127\.0\.0\.2$')
The regex at the end is the one that is run against the answer and defines when
to trigger a positive match.
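
Once the hostname has been resolved, the IP-based lookup uses the standard DNSBL form with the octets reversed in front of the zone. A minimal Python sketch of that step, with zen_query_name as a hypothetical helper name and the resolution itself left out:

```python
import ipaddress

def zen_query_name(ip: str, zone: str) -> str:
    """Build an IPv4 DNSBL query name: reversed octets prepended to the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

# 192.0.2.10 is a documentation address; "<key>" stays a placeholder.
print(zen_query_name("192.0.2.10", "<key>.zen.dq.spamhaus.net"))
```

The answer to that query is then matched against the regex from the rule, here '^127\.0\.0\.2$'.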

Function: check_sh_hostname
Explanation: This function extracts whole hostnames starting from URLs in the
email body and is used to check them in the abused-legit component of DBL
Example: body SH_DBL_ABUSED_FULLHOST eval:check_sh_hostname('<key>.dbl.dq.spamhaus.net', '^127\.0\.1\.10[2-6]$')
The regex at the end is the one that is run against the answer and defines when
to trigger a positive match.
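
As a rough illustration of "whole hostname" extraction, the Python sketch below pulls the full host part out of each URL without trimming it to the registered domain. The URL pattern is a simplifying assumption and url_hostnames is a hypothetical helper name; this is not how the plugin or SpamAssassin parses URIs.

```python
import re
from urllib.parse import urlsplit

# Simplified URL pattern (an assumption for illustration only).
URL_RX = re.compile(r"https?://[^\s<>\"']+")

def url_hostnames(body: str) -> list:
    """Return the unique full hostnames of URLs in the body, in order found."""
    hosts = []
    for url in URL_RX.findall(body):
        host = urlsplit(url).hostname  # lowercased, without port or path
        if host and host not in hosts:
            hosts.append(host)
    return hosts

body = ("Visit https://subsite1.wordpress.com/offer and "
        "http://WWW.Example.org:8080/x?y=1")
print(url_hostnames(body))
```

Keeping "subsite1.wordpress.com" intact, rather than reducing it to "wordpress.com", is what makes the abused-legit check possible.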

Function: check_sh_crypto
Explanation: This function looks for cryptowallet addresses in the email body
and checks them in HBL.
Example: body SH_HBL_CW_BTC eval:check_sh_crypto('_cw.<key>.hbl.dq.spamhaus.net.', '^127\.0\.3\.20$', '\b(?:bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}\b', 'BTC')
The two regexes are, respectively, the one that is run against the DNS response
and the one that matches the particular cryptocurrency we are looking for. In
this case it's Bitcoin, and the last parameter is the name of the
cryptocurrency we want in the logs.
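
The two regexes from the example can be tried in isolation with a short Python sketch. Only the matching steps are shown; how a wallet address is turned into an HBL query name is not described here, so that part is left out. find_wallets is a hypothetical helper name, and the sample address is the well-known Bitcoin genesis-block address.

```python
import re

# Both patterns are copied from the rule above.
BTC_RX = re.compile(r"\b(?:bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}\b")
ANSWER_RX = re.compile(r"^127\.0\.3\.20$")

def find_wallets(body: str) -> list:
    """Return every substring of the body that looks like a Bitcoin address."""
    return BTC_RX.findall(body)

body = "Send 0.1 BTC to 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa within 24 hours."
print(find_wallets(body))
print(bool(ANSWER_RX.match("127.0.3.20")))
```

The word boundaries and the 25 to 39 character length requirement keep short numbers such as "0.1" or "24" from being picked up as candidates.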

I think some of those could be replaced by AskDNS, but I'm not really sure how.

Lastly, about DBL lookups. For all intents and purposes, all DBL lookups should
be done at the full hostname level instead of the domain level. All
spam/phish/malware domains are wildcarded, so a lookup for, e.g.,
"www.spamsite.com" or "www2.spamsite.com" will always return a positive match,
because the whole *.spamsite.com domain is listed. Abused-legit websites, on
the other hand, are listed at the hostname level, so, e.g.,
"subsite1.wordpress.com" could be listed while "subsite2.wordpress.com" is not.
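
The difference can be seen with a toy in-memory model in Python. The two sets and the function toy_dbl_listed are made up for illustration and only mimic the wildcard versus exact-hostname behaviour described above; the real data is only reachable through DNS queries.

```python
# Hypothetical listings, taken from the examples in the paragraph above.
WILDCARD_LISTED = {"spamsite.com"}            # behaves like *.spamsite.com
HOSTNAME_LISTED = {"subsite1.wordpress.com"}  # abused-legit, exact host only

def toy_dbl_listed(hostname: str) -> bool:
    """Toy model: exact match for abused-legit hosts, suffix match for
    wildcarded domains."""
    host = hostname.strip(".").lower()
    if host in HOSTNAME_LISTED:
        return True
    return any(host == d or host.endswith("." + d) for d in WILDCARD_LISTED)

for name in ("www.spamsite.com", "www2.spamsite.com",
             "subsite1.wordpress.com", "subsite2.wordpress.com",
             "wordpress.com"):
    print(name, toy_dbl_listed(name))
```

In this model, a lookup reduced to the domain level ("wordpress.com") misses the listed "subsite1.wordpress.com", while a full-hostname lookup still catches every host under a wildcarded domain.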

I'm available either here or offlist for any clarification needed. Our final
goal is to make SpamAssassin work as efficiently as possible with our data,
ensuring the best possible protection for all SpamAssassin users.
