Mailing List Archive

Managing long welcome_senders list
I have a score-reducing algorithm for SA based on known 'good' senders.
A local_welcoming.cf file is built from a simple one-address-per-line
file (which can easily be edited manually or automatically) and is then
used by SA, with lines like this:

score LOCAL_WELCOMING_4 -4
header LOCAL_WELCOMING_4 From =~ /(\@myfriend\.com|jennifer_smith\@btinternet\.com|fred321\@gmail\.com)>?\s*$/i

But this is just a short example with 3 addresses. In reality I have a
single line with c. 2000 addresses all concatenated like this, and it is
growing. It works fine, but I suspect it is sub-optimal, i.e. horrible to
read and perhaps slow to parse. Is there a line length limit in SA? Is
there a better way? Most of the listed items are full email addresses
but some are domains only.

Thanks for any suggestions.
Re: Managing long welcome_senders list
You can break up the rule into multiple sub-rules and combine them with
a meta rule. That is definitely more readable, and it gives you some
flexibility as well as more information when debugging rules and timing
things.

header __WELCOMING_LIST1 From =~ ...
header __WELCOMING_LIST2 From =~ ...

score LOCAL_WELCOMING_4 -4
meta LOCAL_WELCOMING_4 __WELCOMING_LIST1 || __WELCOMING_LIST2 ...
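
Since the .cf file is already generated from a one-address-per-line
file, the split into sub-rules could be generated the same way. What
follows is only a minimal sketch in Python, not a tested SA tool. The
chunk size of 50 alternates per sub-rule is an arbitrary assumption, and
build_rules and escape are hypothetical helper names; the rule names
are the ones used above.

```python
import re


def escape(entry):
    """Backslash-escape every character except letters, digits, '_' and '-'."""
    return re.sub(r"([^A-Za-z0-9_-])", r"\\\1", entry)


def build_rules(entries, chunk_size=50, score=-4, name="LOCAL_WELCOMING_4"):
    """Return rule text: one sub-rule per chunk of addresses plus a meta rule."""
    alternates = [escape(e.strip()) for e in entries if e.strip()]
    if not alternates:
        return ""  # nothing to match, so emit no rules at all
    lines, subrules = [], []
    for i in range(0, len(alternates), chunk_size):
        sub = f"__WELCOMING_LIST{i // chunk_size + 1}"
        subrules.append(sub)
        pattern = "|".join(alternates[i:i + chunk_size])
        # Same anchoring as the original rule: optional '>' then end of header
        lines.append(f"header {sub} From =~ /({pattern})>?\\s*$/i")
    lines.append(f"score {name} {score}")
    lines.append(f"meta {name} " + " || ".join(subrules))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = ["@myfriend.com", "jennifer_smith@btinternet.com", "fred321@gmail.com"]
    print(build_rules(sample, chunk_size=2), end="")
```

Smaller chunks give finer-grained debug output at the cost of more
sub-rules; the right size is something to measure rather than assume.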
Re: Managing long welcome_senders list
On 2021-12-02 14:42, Dominic Raferd wrote:

> Thanks for any suggestions.

+1

https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html

see enlist_addrlist, easy to maintain, does imho what you want
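
As a rough illustration of how the existing one-address-per-line file
could be turned into enlist_addrlist entries, here is a short Python
sketch. It rests on several assumptions: the list name WELCOME is made
up, domain-only entries are assumed to be written as '@domain' in the
input file, and the check_from_in_list() eval from the WLBLEval plugin
is assumed to be available in your SA version. Check the documentation
linked above before relying on any of it.

```python
def build_addrlist(entries, listname="WELCOME", rule="LOCAL_WELCOMING_4", score=-4):
    """Return SA config text with one enlist_addrlist line per entry."""
    lines = []
    for entry in entries:
        entry = entry.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments in the source file
        if entry.startswith("@"):
            # Domain-only entry: use a glob that matches any local part
            entry = "*" + entry
        lines.append(f"enlist_addrlist ({listname}) {entry}")
    # One eval rule consults the whole named list
    lines.append(f"header {rule} eval:check_from_in_list('{listname}')")
    lines.append(f"score {rule} {score}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_addrlist(["@myfriend.com", "fred321@gmail.com"]), end="")
```

The generated file stays one address per line, so it remains as easy to
read and diff as the source list.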
Re: Managing long welcome_senders list
On Thu, 2 Dec 2021, Dominic Raferd wrote:

> I have a score-reducing algorithm for SA based on known 'good' senders. From
> a simple one-address-per-line file (which can easily be manually or
> automatically edited) is built a local_welcoming.cf file which is used by SA
> - with lines like this:
>
> score LOCAL_WELCOMING_4 -4
> header LOCAL_WELCOMING_4 From =~
> /(\@myfriend\.com|jennifer_smith\@btinternet\.com|fred321\@gmail\.com)>?\s*$/i
>
> But this is a just a short example with 3 addresses. In reality I have a
> single line with c.2000 addresses all concatenated like this, and it is
> growing.

The tools available in the MTA may be easier to leverage for this than SA
- for example, something like matching the envelope sender to a pattern or
list in a dynamic database and modifying the message if it hits.

In that case you have the option of conditionally adding a custom header
to the message prior to passing it off to SA for scanning. Then you could
have an SA rule that hits on something like "header
X-LOCAL-WELCOME-SENDER-salt exists".
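
The tagging logic itself is small. The Python sketch below only
illustrates the idea; in practice the MTA or a filter attached to it
would do this work. The header name, the tag_if_welcome function, and
the convention that the welcome set holds lowercase addresses plus
'@domain' entries are all assumptions made for the example. Removing any
incoming copy of the header first is an added precaution, on the
assumption that "salt" stands for a string senders should not be able
to guess.

```python
from email import message_from_string

# Hypothetical header name; replace "salt" with a string known only to you
HEADER = "X-Local-Welcome-Sender-salt"


def tag_if_welcome(raw_message, envelope_sender, welcome):
    """Return the message text, tagged only if the envelope sender is listed."""
    msg = message_from_string(raw_message)
    # Drop any copy of the header that arrived with the message
    del msg[HEADER]
    sender = envelope_sender.strip().lower()
    domain = "@" + sender.rpartition("@")[2]
    if sender in welcome or domain in welcome:
        msg[HEADER] = "yes"
    return msg.as_string()


# The matching SA rule would then be something like (rule name is made up):
#   header LOCAL_WELCOME_SENDER exists:X-Local-Welcome-Sender-salt
#   score  LOCAL_WELCOME_SENDER -4
```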

You could also potentially hard-whitelist those senders in the MTA and
just bypass SA scanning for them entirely, but that does have the downside
of accepting spam from them if their account gets hacked, for example.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
Re: Managing long welcome_senders list
On Thu, 2021-12-02 at 13:42 +0000, Dominic Raferd wrote:
> I have a score-reducing algorithm for SA based on known 'good' senders.
>  From a simple one-address-per-line file (which can easily be manually
> or automatically edited) is built a local_welcoming.cf file which is
> used by SA - with lines like this:
>
> score LOCAL_WELCOMING_4 -4
> header LOCAL_WELCOMING_4 From =~
> /(\@myfriend\.com|jennifer_smith\@btinternet\.com|fred321\@gmail\.com)>?\s*$/i
>
I ran into this problem quite some time ago and wrote 'portmanteau', a
tool that eases the maintenance of SpamAssassin rules which consist of
very large lists of alternates. It does this by storing the rule
definition in a form that's much easier to edit than it would be if it
were written as a single very long line or split into a set of subrules
plus a meta rule to combine them. It is a bash shell script that uses a
gawk script to do the heavy lifting.

The elements of each SA rule it constructs are held in an easily edited
'.def' file which, among other things, expects each regex in a list of
alternates to be on a separate line. Normally, you'd hold the set of
rule definitions in the same directory. Running the 'portmanteau' script
constructs one valid SA rule, which may be built from subrules and
metas, from each .def file in the directory and writes the complete set
of generated rules to a single .cf file. The rule building process is
fast enough that it's not worthwhile building them separately.

The rule constructor is an awk script, so there's nothing exotic in it
and no external dependencies, always assuming you have awk or gawk
installed.

I use 'portmanteau' rules for everything from maintaining personal
blacklists to constructing complex rules that do things like recognising
toxic attachment types, or sets of phrases that, when found together in
several headers and/or the body text, identify specific spam types, and
then score the message accordingly.

You can find the 'portmanteau' tool here:
https://www.libelle-systems.com/free/portmanteau/portmanteau.tgz

Martin
Re: Managing long welcome_senders list
On 02/12/2021 16:26, Martin Gregorie wrote:
> On Thu, 2021-12-02 at 13:42 +0000, Dominic Raferd wrote:
>> I have a score-reducing algorithm for SA based on known 'good' senders.
>>  From a simple one-address-per-line file (which can easily be manually
>> or automatically edited) is built a local_welcoming.cf file which is
>> used by SA - with lines like this:
>>
>> score LOCAL_WELCOMING_4 -4
>> header LOCAL_WELCOMING_4 From =~
>> /(\@myfriend\.com|jennifer_smith\@btinternet\.com|fred321\@gmail\.com)>?\s*$/i
>>
> I ran into this problem quite some time ago and wrote 'portmanteau'...
Thanks to all for the suggestions and comments. I am looking into
enlist_addrlist and portmanteau.
Re: Managing long welcome_senders list
For Dominic Raferd:

Another approach also works for me: if you can automatically capture the
addresses you've sent mail to, these addresses make a perfect, self-
maintaining whitelist.

If you're running Postfix then you can use its automatic BCC option to
feed a copy of all mail, including outbound messages, to whatever process
you use to build a list of your mail recipients. Other MTAs probably
have a similar ability, but I don't use them, so can't comment further.

A database makes a convenient place to keep your correspondent list
because discarding duplicate addresses then becomes a built-in facility,
and writing an SA plugin plus associated rule to interrogate the list
and add negative points to the message is simple.
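
To show what "built-in" duplicate handling can look like, here is a
minimal Python sketch. It uses SQLite from the standard library purely
to stay self-contained; it is not the PostgreSQL archive described in
this message. The table name, column name and function names are all
invented for the example.

```python
import sqlite3


def open_list(path=":memory:"):
    """Open (or create) the correspondent list; the primary key rejects duplicates."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS correspondent (address TEXT PRIMARY KEY)")
    return db


def record_recipients(db, addresses):
    """Store recipient addresses, silently ignoring ones already present."""
    rows = [(a.strip().lower(),) for a in addresses if a.strip()]
    with db:  # commits on success, rolls back on error
        db.executemany("INSERT OR IGNORE INTO correspondent (address) VALUES (?)", rows)


def is_correspondent(db, sender):
    """Return True if mail has previously been sent to this address."""
    row = db.execute(
        "SELECT 1 FROM correspondent WHERE address = ?", (sender.strip().lower(),)
    ).fetchone()
    return row is not None
```

A plugin would call something like is_correspondent() with the message's
sender and let the rule apply the negative score when it returns True.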

My correspondent list is part of my mail archive, which is held as a
PostgreSQL database. The associated functions I use to maintain and
interrogate the correspondent list are:

a) a BCC directive added to the Postfix configuration or the equivalent
if you use a different MTA
b) a Java application run each night to load the previous day's mail,
both received and sent, into the database
c) an SQL view that selects any message(s) in the archive that were sent
to the address being checked
d) a Perl plugin to execute the view using the message's sender as its
search key and return TRUE if any messages were selected
e) an SA rule to trigger the Perl plugin and add a negative score
if the Perl plugin returns TRUE

You'd need code to implement all five functions, but if you store your
correspondent address list as a sorted text file, then all the code
would be much simplified: 

- 'b' could be a Perl or awk script run as an additional 'logwatch'
report that scans the previous day's part of the mail log and adds any
new addresses to the sorted list

- 'c' and 'd' could be combined as a single Perl plugin.
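
The sorted-text-file variant can be sketched in a few lines. The Python
below is an illustration only, not the Perl or awk code suggested above.
It assumes the new addresses have already been extracted from the mail
log (log formats vary, so that step is left out), and the file layout
and function names are invented for the example.

```python
import bisect


def merge_addresses(path, new_addresses):
    """Add new addresses to the file, keeping it sorted and free of duplicates."""
    try:
        with open(path) as fh:
            known = {line.strip() for line in fh if line.strip()}
    except FileNotFoundError:
        known = set()  # first run: start with an empty list
    known.update(a.strip().lower() for a in new_addresses if a.strip())
    with open(path, "w") as fh:
        fh.writelines(addr + "\n" for addr in sorted(known))


def load_sorted(path):
    """Read the sorted list back into memory."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def is_known(sorted_list, sender):
    """Binary search the sorted list for the sender's address."""
    key = sender.strip().lower()
    i = bisect.bisect_left(sorted_list, key)
    return i < len(sorted_list) and sorted_list[i] == key
```

Keeping the file sorted is what makes the binary search valid; for a few
thousand addresses a plain set lookup would serve just as well.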

Martin
Re: Managing long welcome_senders list
Shameless plug:

https://mailfud.org/postpals/



On Fri, Dec 03, 2021 at 12:55:59PM +0000, Martin Gregorie wrote:
> For Dominic Raferd:
>
> Another approach also works for me: if you can automatically capture the
> addresses you've sent mail to, these addresses make a perfect, self-
> maintaining whitelist.
>
> If you're running Postfix then you can use its automatic BCC option to
> feed a copy of all mail, including outbound messages, to whatever process
> you use to build a list of your mail recipients. Other MTAs probably
> have a similar ability, but I don't use them, so can't comment further.
>
> A database makes a convenient place to keep your correspondent list
> because discarding duplicate addresses then becomes a built-in facility,
> and writing an SA plugin plus associated rule to interrogate the list
> and add negative points to the message is simple.
>
> My correspondent list is part of my mail archive, which is held as a
> PostgreSQL database. The associated functions I use to maintain and
> interrogate the correspondent list are:
>
> a) a BCC directive added to the Postfix configuration or the equivalent
> if you use a different MTA
> b) a Java application run each night to load the previous day's mail,
> both received and sent, into the database
> c) an SQL view that selects any message(s) in the archive that were sent
> to the address being checked
> d) a Perl plugin to execute the view using the message's sender as its
> search key and return TRUE if any messages were selected
> e) an SA rule to trigger the Perl plugin and add a negative score
> if the Perl plugin returns TRUE
>
> You'd need code to implement all five functions, but if you store your
> correspondent address list as a sorted text file, then all the code
> would be much simplified:
>
> - 'b' could be a Perl or awk script run as an additional 'logwatch'
> report that scans the previous day's part of the mail log and adds any
> new addresses to the sorted list
>
> - 'c' and 'd' could be combined as a single Perl plugin.
>
> Martin
>