Mailing List Archive

spamtrap strategies
Hi all,

I want to set up a spam trap that should receive nothing but actual
spam, and feed that into SpamAssassin in some way. I'm wondering what
the best way is to automate feeding that data back to the system.

Would it be best used for bayes tuning? It seems not, because it would
be 100% spam. Would it be better to use it for mass-check and contribute
some to the overall rule scoring? Or would it be better to just build
some kind of RBL out of whatever it receives?

Thanks for any ideas/suggestions!

--
micah
Re: spamtrap strategies
On Fri, 15 May 2020 17:58:17 -0400
micah anderson wrote:

> Hi all,
>
> I'm wanting to setup a spam trap, that should receive nothing but
> actual spam, and feed that into spamassassin in some way. I'm
> wondering the best way to automate feeding that data back to the
> system.
>
> Would it be best used for bayes tuning? It seems not, because it would
> be 100% spam.

As long as there is ham from other sources and it doesn't ruin
token retention, it shouldn't be a problem. Ideally you would only
feed spam that doesn't reach BAYES_99 and is low-scoring.
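
A minimal sketch of that selection step, assuming the trap mail has
already been scanned and carries an X-Spam-Status header with a score=
field and a tests= list. The function name worth_training and the 8.0
cutoff are illustrative assumptions, not anything SpamAssassin defines.

```python
import email
import re


def worth_training(raw_message: bytes, max_score: float = 8.0) -> bool:
    """Return True if a trap message is low-scoring and did not hit BAYES_99."""
    msg = email.message_from_bytes(raw_message)
    status = str(msg.get("X-Spam-Status", ""))
    if not status:
        # Assumption: unscanned mail is skipped because we cannot judge it.
        return False

    # The header is usually folded after commas; rejoin the tests list.
    status = re.sub(r",\s+", ",", status)

    score = re.search(r"score=(-?\d+(?:\.\d+)?)", status)
    tests = re.search(r"tests=(\S*)", status)
    hits = set(tests.group(1).split(",")) if tests else set()

    if "BAYES_99" in hits:
        return False  # Bayes is already confident about this one
    if score is None:
        return False  # no usable score, so skip (assumption)
    return float(score.group(1)) < max_score
```

Messages for which this returns True could then be handed to
sa-learn --spam; the rest are skipped so the trap does not flood the
database with tokens it already knows.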

> Would it be better to use it for mass-check and
> contribute some to the overall rule scoring?

If you use it for Bayes or mass-checks I'd suggest not relaxing any
pre-SpamAssassin checks. Some people do that to keep the numbers up,
but optimizing around spam that doesn't reach SpamAssassin seems like a
bad idea to me.
Re: spamtrap strategies
RW <rwmaillists@googlemail.com> writes:

>> I'm wanting to setup a spam trap, that should receive nothing but
>> actual spam, and feed that into spamassassin in some way. I'm
>> wondering the best way to automate feeding that data back to the
>> system.
>>
>> Would it be best used for bayes tuning? It seems not, because it would
>> be 100% spam.
>
> As long as there is ham from other sources and it doesn't ruin
> token retention, it shouldn't be a problem. Ideally you would only
> feed spam that doesn't reach BAYES_99 and is low-scoring.

That is the problem: our Bayes database is not well fed. It's a global
database, and even with trusted 'feeders' it would drift the wrong way,
because people usually trained it only with spam that did not get
caught, and didn't feel comfortable submitting their ham.

I've considered the idea of creating per-user Bayes dbs, but then I
couldn't use a spam trap's caught spam to train all of those dbs,
because I wouldn't really have a clear idea of whether those individual
Bayes dbs were getting any ham.

>> Would it be better to use it for mass-check and contribute some to
>> the overall rule scoring?
>
> If you use it for Bayes or mass-checks I'd suggest not relaxing any
> pre-SpamAssassin checks. Some people do that to keep the numbers up,
> but optimizing around spam that doesn't reach SpamAssassin seems like a
> bad idea to me.

Each of the mails is 100% spam, so what I'd like to do is have an
automated way to tune my rule scoring, or improve/add rules based on
what gets sent there.

If I have to inspect each message by hand and manually craft rules,
then it doesn't seem like this will scale very well at all.

--
micah
Re: spamtrap strategies
On Sat, 16 May 2020 09:26:23 -0400
micah anderson wrote:

> RW <rwmaillists@googlemail.com> writes:

> >> Would it be better to use it for mass-check and contribute some to
> >> the overall rule scoring?
> >
> > If you use it for Bayes or mass-checks I'd suggest not relaxing any
> > pre-SpamAssassin checks. Some people do that to keep the numbers up,
> > but optimizing around spam that doesn't reach SpamAssassin seems
> > like a bad idea to me.
>
> Each of the mails is 100% spam,

or backscatter presumably.

Just to be clear, by "pre-SpamAssassin checks" I meant checks that
would be run on the MTA before mail is passed to SpamAssassin, e.g.
IP blocklists, rDNS etc.
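
For illustration, an IP blocklist check of that kind reverses the
octets of the connecting address and looks the result up under the
list's DNS zone; a listed address resolves, an unlisted one does not.
A minimal sketch of the name construction only (no lookup is made).
The zone dnsbl.example.org is a placeholder, not a real list.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name an MTA would query for an IPv4 address."""
    # Validates the input; raises ValueError on anything but IPv4.
    addr = ipaddress.IPv4Address(ip)
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# Placeholder address from the documentation range and a placeholder zone.
print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
```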

> so what I'd like to do is have an
> automated way to tune my rule scoring, or improve/add rules based on
> what gets sent there.

You can't expect to do that without a large ham corpus.
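
To illustrate why: a rule is only useful if it hits spam much more
often than ham, and a trap can measure only the first half of that. A
minimal sketch, assuming a spam-over-overall ratio computed from hit
rates, similar in spirit to the figures mass-check tooling reports. The
function name and the numbers are made up.

```python
from typing import Optional


def so_ratio(spam_hits: int, spam_total: int,
             ham_hits: int, ham_total: int) -> Optional[float]:
    """Return spam_rate / (spam_rate + ham_rate), or None if undefined."""
    if spam_total == 0 or ham_total == 0:
        return None  # without both corpora the ratio means nothing
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        return None  # the rule never fired at all
    return spam_rate / (spam_rate + ham_rate)


# Trap only: the rule hits 90% of spam, but there is no ham to compare.
print(so_ratio(900, 1000, 0, 0))

# Same spam hit rate, two different ham corpora, opposite conclusions.
print(so_ratio(900, 1000, 10, 1000))   # rarely hits ham: a strong rule
print(so_ratio(900, 1000, 900, 1000))  # hits ham just as often: useless
```

With trap data alone, the last two cases are indistinguishable.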