txrep_autolearn range - how does the range influence autolearning
Hi guys,

I’m currently configuring a new setup for passing through all emails, and I opted for SA as my filter – one thing I also configured is txrep ( https://cwiki.apache.org/confluence/display/SPAMASSASSIN/TxRep )

One thing I saw in the docs is that “txrep_autolearn” is a range between 0 and 5 – 0 meaning it’s disabled.

Now, my question is, what effect does the number have? I’d first have thought that it was simply a Boolean to turn it on or off.
It (sadly) doesn’t seem to be really documented what a higher or lower value results in (other than 0 disables it).

I’ve trained my filter with sa-learn on quite a large chunk of emails (both spam and ham), which is why I also want to enable autolearning for txrep. Before doing that, I’d ideally like to figure out what effect the given numbers have on the autolearning process.

I’d be very grateful if anyone has experience with it!

Just for the sake of it, I’m using SA 3.4.6 on Debian 10 currently (Not that I think it really matters in this case).


Thanks in advance!

Best Regards,
Lucas Rolff
Re: txrep_autolearn range - how does the range influence autolearning
Lucas Rolff <lucas@lucasrolff.com> writes:

> I’m currently configuring a new setup for passing through all emails,
> and I opted for SA as my filtering – one thing I also configured are
> txrep ( https://cwiki.apache.org/confluence/display/SPAMASSASSIN/TxRep
> )
>
> One thing I saw in the docs is that “txrep_autolearn” is a range between 0 and 5 – 0 meaning it’s disabled.
>
> Now, my question is, what effect does the number have? I’d first have thought that it was simply a Boolean to turn it on or off.
> It (sadly) doesn’t seem to be really documented what a higher or lower value results in (other than 0 disables it).

Unfortunately my suggestion is to read the sources.

> I’ve trained my filter with sa-learn with a quite large chunk of
> emails (both spam and ham), which is why I also want to enable
> autolearning of txrep – I just ideally want to figure out prior to
> doing that, what effect the given numbers have on the autolearning
> process.

* Make sure to use -L with sa-learn if you are using txrep, because
otherwise there is a full evaluation, including DNSBL queries. Do not
rely on the text in sa-learn(1) on this point; AIUI it was written
with only the bayes module in mind.

* sa-learn will train txrep

* txrep outgoing is really useful

* Conventional wisdom seems to be that autolearn is dangerous in terms
of getting things wrong, and if you are running sa-learn on ham/spam
folders, I don't see much point. However people need to refile
mis-filed spam into a spam folder and mis-filed ham back into a ham
folder, and you need to be clear on what Trash means. In my world,
Trash is ham, and any spam that squeaks by is put into spam.manual
from whence it is learned.
Re: txrep_autolearn range - how does the range influence autolearning
On Sun, 16 May 2021 15:28:43 +0000
Lucas Rolff wrote:

> Hi guys,
>
> I’m currently configuring a new setup for passing through all emails,
> and I opted for SA as my filtering – one thing I also configured are
> txrep (
> https://cwiki.apache.org/confluence/display/SPAMASSASSIN/TxRep )
>
> One thing I saw in the docs is that “txrep_autolearn” is a range
> between 0 and 5 – 0 meaning it’s disabled.
>
> Now, my question is, what effect does the number have? I’d first have
> thought that it was simply a Boolean to turn it on or off. It (sadly)
> doesn’t seem to be really documented what a higher or lower value
> results in (other than 0 disables it).

I think it probably is just a boolean. The documentation for TxRep is
less accurate than Braveheart.


> I’ve trained my filter with sa-learn with a quite large chunk of
> emails (both spam and ham), which is why I also want to enable
> autolearning of txrep

That's why you shouldn't use autolearning. Autolearning is something
that should be used as a last resort. The effect of mistraining on
TxRep is potentially much worse than its effect on Bayes.

TxRep will still do its score averaging without it.
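
For reference, a minimal sketch of that setup: TxRep enabled, with the auto-learn adjustment left off. It assumes the plugin is loaded from a .pre file (the file name and location depend on the installation), and the comments reflect a reading of the plugin documentation rather than a verified description of the code.

```
# In a .pre file (location is installation-specific; an assumption here):
loadplugin Mail::SpamAssassin::Plugin::TxRep

# In local.cf:
# Turn reputation tracking on (it is off by default, as far as I know)
use_txrep 1
# 0 = no extra reputation adjustment when auto-learn fires
txrep_autolearn 0
```

With this, reputations should still be updated from each scanned message's score; only the additional auto-learn adjustment is skipped.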
Re: txrep_autolearn range - how does the range influence autolearning
Thanks to both of you!

> Unfortunately my suggestion is to read the sources.

My Perl skills sadly aren't great, but what I gathered from the source is that it doesn't really seem to use the various values for anything. It checks that the configured value is between 0 and 5, but beyond that it doesn't seem to do anything other than act as a Boolean. Obviously I might be reading it wrong.

Thanks for the notes about sa-learn, txrep outgoing and the autolearn itself.
In my particular case, I'll only use it as an inbound filter, since I handle outbound very differently (I let other people take care of the filtering using an external relay). For inbound I've used a commercial solution for years; they sadly decided to 5x the cost starting 2022, which means it isn't really worth it anymore, so time to change!

I may even just leave autolearn off, and adjust things if I see false positives.

To my surprise, the number of false positives is already rather low (and those false positives are often things that the senders should fix anyway).

Cheers!

- Lucas Rolff

On 16/05/2021, 19.41, "RW" <rwmaillists@googlemail.com> wrote:

> [...]
Re: txrep_autolearn range - how does the range influence autolearning
On Sun, 16 May 2021 13:36:34 -0400
Greg Troxel wrote:

>
> * txrep outgoing is really useful

Did you find a reason why that's right?

As I said before, my understanding is that it updates a reputation that
only gets used on incoming mail that passes neither spf nor dkim.

In other words it adds a negative score on condition the mail does
*not* authenticate. IMO that makes it somewhere between useless and
dangerous.
Re: txrep_autolearn range - how does the range influence autolearning
Lucas Rolff <lucas@lucasrolff.com> writes:

> Thanks for the notes about sa-learn, txrep outgoing and the autolearn itself.
> In my particular case, I'll only use it as an inbound filter, since I
> handle outbound very differently (I let other people take care of the
> filtering using an external relay); For inbound I've used a commercial
> solution for years, they sadly decided to 5x the cost starting 2022,
> which then doesn't really make it worth it anymore, so time to change!

It's unfortunate that you can't use it on tx. On outbound, all it does is
keep track of who mail was sent to, and that causes mail from those
addresses to get better treatment on inbound.

So probably, if you can arrange to get a log feed of the outbound mail
somehow and write something to adjust the database, you can get better
results.

In particular, as you give negative points to mail that is a reverse
match to outbound mail, you can turn up the aggressiveness knob with
somewhat less trouble.
Re: txrep_autolearn range - how does the range influence autolearning
On Sun, 16 May 2021 16:50:57 -0400
Greg Troxel wrote:

> Lucas Rolff <lucas@lucasrolff.com> writes:
>
> > Thanks for the notes about sa-learn, txrep outgoing and the
> > autolearn itself. In my particular case, I'll only use it as an
> > inbound filter, since I handle outbound very differently (I let
> > other people take care of the filtering using an external relay);
> > For inbound I've used a commercial solution for years, they sadly
> > decided to 5x the cost starting 2022, which then doesn't really
> > make it worth it anymore, so time to change!
>
> It's unfortunate that you can't use it on tx. On outbound, all it
> does is keep track of who mail was sent to, and that causes it to get
> better treatment on inbound.
...
> In particular, as you give negative points to mail that is a reverse
> match to outbound mail, you can turn up the aggressiveness knob with
> somewhat less trouble.

As I said I don't think this will work well.

If you really want to do it you would need to set

auto_whitelist_distinguish_signed 0
txrep_spf 0

Turning off SPF support really should be the default until TxRep's
broken handling of it is fixed.

If I used TxRep I'd rather have DKIM support than outgoing learning.

It's not a particularly good way of doing whitelisting anyway because
it doesn't involve any authentication.
Re: txrep_autolearn range - how does the range influence autolearning
Even for inbound only, do you suggest disabling txrep_spf as well, or is that only particularly important for outbound?

- Lucas


On 17/05/2021, 17.14, "RW" <rwmaillists@googlemail.com> wrote:

> [...]
Re: txrep_autolearn range - how does the range influence autolearning
On Mon, 17 May 2021 15:32:48 +0000
Lucas Rolff wrote:

> Even for only inbound, do you suggest disabling txrep_spf there as
> well, or only particularly important for outbound?

For anything.

TxRep treats the header "From" address as having been authenticated by
an SPF pass even if the pass came from an envelope address with
a completely different domain.
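
For an inbound-only setup, that advice could be written as the local.cf fragment below. It is only a sketch: txrep_spf comes from the settings quoted earlier in the thread, while leaving auto_whitelist_distinguish_signed untouched is an assumption based on the stated preference for DKIM support over outgoing learning.

```
# Do not let an SPF pass count as authentication of the header From address
txrep_spf 0

# Only relevant if outgoing learning is wanted, per the earlier message;
# left commented out here so the DKIM distinction stays in place
# auto_whitelist_distinguish_signed 0
```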