
Pre-processor for spamassassin
Hi,

I am in the process of writing a pre-processor for SpamAssassin. It would
be a pre-processor because I do not read or write Perl.

The idea would be to analyse each email and, based on the analysis, add
extra fields to the email header before passing the email to SpamAssassin
to do its thing.

My first question is: will SA detect these new headers and use them as
part of its analysis?

Assuming the above is true, I have a couple of options:

1) Always add each new header with a score (which I do not think would
be very effective).
2) Only add a new header if the detected feature is probably spam.


Any hints would be appreciated.

Cheers,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
Re: Pre-processor for spamassassin
On 2023-10-08 at 03:38:00 UTC-0400 (Sun, 8 Oct 2023 18:38:00 +1100)
Erik de Castro Lopo <mle+tools@mega-nerd.com>
is rumored to have said:

> Hi,
>
> I am in the process of writing a pre-processor for SpamAssassin. It
> would be a pre-processor because I do not read or write Perl.

That would be a solid reason not to attempt an SA 'plugin', but you
should really consider whether the analysis you are trying to do can be
implemented as local custom rules. SA rules do not require knowledge of
Perl.

> The idea would be to analyse each email and, based on the analysis,
> add extra fields to the email header before passing the email to
> SpamAssassin to do its thing.
>
> My first question is: will SA detect these new headers and use them
> as part of its analysis?

SA has access to the entire message. SA rules can examine any header,
but SA won't do anything more than treat an arbitrary header as a series
of meaningless tokens for Bayesian classification unless you add rules
that specifically interpret those headers.

> Assuming the above is true, I have a couple of options:
>
> 1) Always add each new header with a score (which I do not think would
> be very effective).

SA does not look for 'scores' in headers: it has no way to know what a
score might look like or mean, and it can't *generally* trust anything
inside a message to mean what it claims to mean. For example, you can't
just send mail with an "X-Spam-Score: -200" header and expect SA to treat
that as a score. For SA to interpret the content of a header as justifying
a score, it needs rules that interpret it.
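One way to give a pre-processor's result that kind of meaning, sketched here under the assumption that the analysis yields a spam probability between 0 and 1, is to write one of a small fixed set of labels instead of a raw number. Each label can then be matched by its own SA header rule carrying its own score, much as SA exposes its own Bayes result as separate rules such as BAYES_00, BAYES_50 and BAYES_99, each with its own score. The header name `X-Preproc-Bucket` and the bucket edges below are hypothetical.

```python
# Hypothetical bucket boundaries as (lower edge, label); tune them to your own data.
BUCKETS = [
    (0.0, "00_10"),
    (0.1, "10_50"),
    (0.5, "50_90"),
    (0.9, "90_100"),
]


def bucket_label(p: float) -> str:
    """Return the label of the highest bucket whose lower edge is <= p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    label = BUCKETS[0][1]
    for lower, name in BUCKETS:
        if p >= lower:
            label = name
    return label


def header_line(p: float, name: str = "X-Preproc-Bucket") -> str:
    """Format the header line a pre-processor could add (header name is hypothetical)."""
    return f"{name}: {bucket_label(p)}"
```

With a handful of discrete labels, the weight given to each one lives in the SA configuration, where it can be tuned without touching the pre-processor.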

> 2) Only add a new header if the detected feature is probably spam.

If you are absolutely set on the design concept of putting something in
front of SA, that's probably better. OR, if this is intended mainly to
protect non-spam, only tag that. In any case, you'll need custom SA
rules to understand any sort of meaning in what you add.

Depending on the specific sort of analysis you are doing, it may be
feasible to do it with a set of SA rules, and that would avoid the
housekeeping issues of how to integrate a 'preprocessor' with your
existing MTA and whatever you're using as 'glue' for SA
(content_filter script, spamass-milter, MIMEDefang, etc.).


--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: Pre-processor for spamassassin
Hi Bill, thanks for your reply.

Bill Cole wrote:

> Depending on the specific sort of analysis you are doing, it may be
> feasible to do it with a set of SA rules, and that would avoid the
> housekeeping issues of how to integrate a 'preprocessor' with your
> existing MTA and whatever you're using as 'glue' for SA
> (content_filter script, spamass-milter, MIMEDefang, etc.).

I am running SA from Procmail, so integrating my analysis is rather easy
in this case. I also think it's not possible to do this purely with SA
rules, because my analysis is statistical and requires an external
database.

> If you are absolutely set on the design concept of putting something in
> front of SA, that's probably better. OR, if this is intended mainly to
> protect non-spam, only tag that. In any case, you'll need custom SA
> rules to understand any sort of meaning in what you add.

It's not so much that I am set on a design concept; it's that there is
a learning step that looks at spam and ham (much like SA itself) and
then builds a database, which is used during the analysis of each input
email.

I have used SA rules before, and I think the best option is to add
headers to the email (tagging both spam and ham) and then use SA rules
to detect these new headers.
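Since SA runs from Procmail here, a minimal sketch of such a tagging filter in Python might look like the following. The header name `X-Preproc-Class`, the 0.5 threshold and `score_message` are hypothetical; the stub stands in for the statistical lookup against the external database, which the thread does not describe. The new header is inserted at the end of the header block rather than by re-serialising the whole message, so existing headers and the body pass through byte for byte.

```python
#!/usr/bin/env python3
"""Hypothetical procmail filter: tag each message as spam or ham in a new header."""
import sys

HEADER_NAME = "X-Preproc-Class"  # hypothetical header name


def score_message(raw: bytes) -> float:
    """Placeholder for the statistical analysis against an external database.

    Should return a spam probability in [0, 1]; this stub always returns 0.0.
    """
    return 0.0


def verdict(p: float, threshold: float = 0.5) -> str:
    """Map a spam probability to a label that an SA header rule can match."""
    return "spam" if p >= threshold else "ham"


def add_header(raw: bytes, name: str, value: str) -> bytes:
    """Insert 'name: value' as the last header line, leaving the body untouched."""
    line = f"{name}: {value}".encode("ascii")
    crlf = raw.find(b"\r\n\r\n")
    lf = raw.find(b"\n\n")
    if crlf != -1 and (lf == -1 or crlf < lf):
        pos, eol = crlf, b"\r\n"  # CRLF message: header block ends at the first CRLF CRLF
    elif lf != -1:
        pos, eol = lf, b"\n"  # LF message: header block ends at the first blank line
    else:
        # No blank line at all: treat the whole input as header and append.
        eol = b"\r\n" if raw.endswith(b"\r\n") else b"\n"
        if raw and not raw.endswith(eol):
            raw += eol
        return raw + line + eol
    return raw[:pos] + eol + line + raw[pos:]


def main() -> None:
    raw = sys.stdin.buffer.read()
    label = verdict(score_message(raw))
    sys.stdout.buffer.write(add_header(raw, HEADER_NAME, label))


if __name__ == "__main__":
    main()
```

In Procmail this would run as a filter recipe (`:0 fw` followed by `| /path/to/preproc.py`, with a hypothetical path) placed before the recipe that calls SA. An SA rule such as `header LOCAL_PREPROC_SPAM X-Preproc-Class =~ /^spam/`, plus a matching `score` line, would then give the tag its meaning. Because incoming mail could already carry a forged header with the same name, a real version should strip any existing copy before adding its own; this sketch does not.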

Thanks Bill, your response helped me clarify my direction.

Cheers,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/