Mailing List Archive

Disabling autolearn on given rule
Hi!

I recently noticed my bayes was rarely matching any spam, and it turns out this was due to
autolearn=ham'ing occurring on lots of list traffic that I only occasionally read, some of which was
blatant spam. Sadly, list traffic can be pretty hard to categorize and ends up getting through due
to good sending IP and domain reputation.

While correcting the filter through sa-learn solves this issue temporarily, I don't want to have to
always read lists that I previously only occasionally read just to re-classify spam. Thus, I'd like
to disable autolearn entirely for mails that match a given rule (eg MAILING_LIST_MULTI).

"tflags MAILING_LIST_MULTI noautolearn" doesn't seem like quite what I want, it just reduces the
score used to decide whether to learn. There are some old Bugzilla mentions asking for this feature,
but it seems the response was "write a plugin". Is there a plugin available for this, or how would
one go about writing one?

Thanks,
Matt
Re: Disabling autolearn on given rule [ In reply to ]
On 2021-09-21 22:11, Matt Corallo wrote:

> "tflags MAILING_LIST_MULTI noautolearn" doesn't seem like quite what I
> want, it just reduces the score used to decide whether to learn.
> There's some old bugzilla mentions asking for this feature, but it
> seems the response was "write a plugin". Is there a plugin available
> for this or how would one go about writing one?

https://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

Set the ham learning threshold lower than the default with that plugin:

# bayes_auto_learn_threshold_nonspam n.nn (default: 0.1)

bayes_auto_learn_threshold_nonspam -10

# bayes_auto_learn_threshold_spam n.nn (default: 12.0)

bayes_auto_learn_threshold_spam 7.5

# bayes_auto_learn_on_error (0 | 1) (default: 0)

bayes_auto_learn_on_error 1
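
For readers unfamiliar with these settings, here is a minimal Python model of how the two thresholds partition messages. It is a sketch, not SpamAssassin code: the function name `autolearn_decision` is made up for illustration, the comparison operators at the exact boundary values are an assumption, and it ignores the plugin's documented extra requirement of at least 3 header points and 3 body points before learning as spam.

```python
from typing import Optional


def autolearn_decision(score: float,
                       nonspam_threshold: float = 0.1,
                       spam_threshold: float = 12.0) -> Optional[str]:
    """Classify a message's learning score as 'ham', 'spam', or None (skip)."""
    if score < nonspam_threshold:
        return "ham"    # low enough to be learned as non-spam
    if score >= spam_threshold:
        return "spam"   # high enough to be learned as spam
    return None         # between the thresholds: not learned at all


# A list message scoring -1.9 is learned as ham with the default threshold,
# but is left alone once the nonspam threshold is lowered to -10.
print(autolearn_decision(-1.9))                           # ham
print(autolearn_decision(-1.9, nonspam_threshold=-10.0))  # None
```

Note that lowering the nonspam threshold this way applies to all mail, not only to list traffic.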
Re: Disabling autolearn on given rule [ In reply to ]
On 9/21/21 15:53, Benny Pedersen wrote:
> On 2021-09-21 22:11, Matt Corallo wrote:
>
>> "tflags MAILING_LIST_MULTI noautolearn" doesn't seem like quite what I
>> want, it just reduces the score used to decide whether to learn.
>> There's some old bugzilla mentions asking for this feature, but it
>> seems the response was "write a plugin". Is there a plugin available
>> for this or how would one go about writing one?
>
> https://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

None of these seem to accomplish disabling learning for a specific rule - I don't particularly want
to change the bayes learn thresholds, as I think they seem to work quite well for non-list mail. For
list mail, I'd prefer to disable the bayes learning entirely (though I suppose somehow magically
forcing it in between the bayes thresholds would work too, if there were a way to do that without
impacting non-bayes scoring).

Matt
Re: Disabling autolearn on given rule [ In reply to ]
> None of these seem to accomplish disabling learning for a specific rule

I think the problem is that I believe Bayes works off of the total score,
and probably only sees rule names as more tokens, if it sees them at all. If
it indeed works off the total score, about all you can do is somehow tweak
that score for a given rule or rule combination.

Loren


Re: Disabling autolearn on given rule [ In reply to ]
On 9/21/21 18:01, Loren Wilton wrote:
>> None of these seem to accomplish disabling learning for a specific rule
>
> I think the problem is that I believe Bayes works off of the total score, and probably only sees
> rule names as more tokens, if it sees them at all. If it indeed works off the total score, about all
> you can do is somehow tweak that score for a given rule or rule combination.

Right, I expected roughly as much from the docs I could find. Two things, then:

(1) maybe time to revisit the old discussions of providing this as a default feature?,
(2) where would I go to look at building a plugin for this? Ideally something that ends up upstream,
but though I can write code, I know no perl :).

Matt
Re: Disabling autolearn on given rule [ In reply to ]
> (2) where would I go to look at building a plugin for this? Ideally
> something that ends up upstream, but though I can write code, I know no
> perl :).

Well, from the few I've seen, they all seem to have a relatively constant
structure. Someone pointed you to a plugin that is at least dealing in this
general area, that might be a good starting point, barring anyone else
having a better suggestion.

While I wrote a little Perl a decade ago I've forgotten many of the
peculiarities, but there are some good web sites out there, and there is one
of the animal books on the subject. Perl is a bit peculiar in syntax and
function compared to the C/C++ I did much of my career, but I didn't have
much trouble picking up enough to make some local SA hacks long ago, so if
you can program in most anything it probably won't be too much trouble.

I don't recall if Bayes itself is called from a plugin or from the main SA
code, but I'm pretty sure it is only called if an internal 'autolearn' token
is true for the message. If you make a plugin that runs late in the rule
evaluation it should be able to look at the score and rule hits and items in
the message header and body and decide if it wants to turn off the autolearn
flag for the message. Hopefully there isn't something in main SA code that
determines the value of this flag after all of the rules have run.

I guess one thing you might be able to do is implement a tflags flag of
absolutely_no_autolearn or some such that would force-disable the autolearn
decision if the rule had hit, but that might be something that would have to
be put into the main SA code itself. Maybe Henrik will chime in here. This
may be really trivial if you know where to look.
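
The kind of per-rule veto discussed here can be sketched in a few lines. The following is a Python model of the logic only, not a SpamAssassin plugin: `apply_autolearn_veto` and the `veto_rules` set are hypothetical names, and which internal flag or hook a real Perl plugin would have to use is exactly the open question in this thread.

```python
from typing import Iterable, Optional, Set


def apply_autolearn_veto(decision: Optional[str],
                         rules_hit: Iterable[str],
                         veto_rules: Set[str]) -> Optional[str]:
    """Cancel an autolearn decision if any veto rule hit on the message.

    decision is 'ham', 'spam', or None, as chosen by the normal threshold logic.
    """
    if veto_rules.intersection(rules_hit):
        return None  # a vetoing rule hit: never autolearn this message
    return decision  # otherwise leave the original decision untouched


veto = {"MAILING_LIST_MULTI"}
print(apply_autolearn_veto("ham", ["MAILING_LIST_MULTI", "HTML_MESSAGE"], veto))  # None
print(apply_autolearn_veto("ham", ["HTML_MESSAGE"], veto))                        # ham
```

Keeping the veto separate from the threshold check means the thresholds for non-list mail stay untouched.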

Loren


Re: Disabling autolearn on given rule [ In reply to ]
On Tue, Sep 21, 2021 at 06:57:22PM -0700, Loren Wilton wrote:
>
> I guess one thing you might be able to do is implement a tflags flag of
> absolutely_no_autolearn or some such that would force-disable the autolearn
> decision if the rule had hit, but that might be something that would have to
> be put into the main SA code itself. Maybe Henrik will chime in here. This
> may be really trivial if you know where to look.

There is only "autolearn_force", though even that is not an absolute force.

https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

I guess one could force shortcircuiting, so no learning will be done.
Re: Disabling autolearn on given rule [ In reply to ]
I think having a look at the code itself is a good idea. I'm not sure if
it's up-to-date but you can find some information on
https://cwiki.apache.org/confluence/display/SPAMASSASSIN/DevelopmentStuff

I've found that just reporting issues on SA's bugzilla is completely
useless since it's just used as a fancy interface to display email
conversations of the development list. Newly reported bugs or issues
often go ignored by email and their status is never changed since no one
uses the interface to manage bugs, this means that bugzilla is filled to
the brim with hundreds of bugs marked as new, of which some are actual
bugs and large parts are just questions or fixed problems that were
never closed. Bugzilla is also very buggy, for example when I press "my
bugs", I get a list of 373 bugs, some predating the existence of my
account, and obviously I didn't take part in the discussion of almost
all of them. So keep in mind that Bugzilla can be untrustworthy and that
the dev mailing list mentioned on
https://cwiki.apache.org/confluence/display/SPAMASSASSIN/mailinglists is
connected to that.

If you're planning to work on the Bayes plugin, I can tell you there are
several problems with it I've reported in the past that have gone ignored:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7904
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7905
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7906
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7907
I assume many others have also reported valid bugs, but they can be hard
to find between the many questions that have been asked on
https://bz.apache.org/SpamAssassin/buglist.cgi?quicksearch=bayes&list_id=34478
and I'm also not too sure we can trust the search functionality.

I hope I'm not passing on too much of a negative message. It would be
great if someone had a look at the Bayes autolearn code. I think it
would be a great service to the community!

Bert

On 22/09/2021 03:29, Matt Corallo wrote:
>
>
> On 9/21/21 18:01, Loren Wilton wrote:
>>> None of these seem to accomplish disabling learning for a specific rule
>>
>> I think the problem is that I believe Bayes works off of the total
>> score, and probably only sees rule names as more tokens, if it sees
>> them at all. If it indeed works off the total score, about all you
>> can do is somehow tweak that score for a given rule or rule combination.
>
> Right, I expected roughly as much from the docs I could find. Two
> things, then:
>
> (1) maybe time to revisit the old discussions of providing this as a
> default feature?,
> (2) where would I go to look at building a plugin for this? Ideally
> something that ends up upstream, but though I can write code, I know
> no perl :).
>
> Matt
Re: Disabling autolearn on given rule [ In reply to ]
On Wed, Sep 22, 2021 at 10:45:43AM +0200, Bert Van de Poel wrote:
>
> I hope I'm not passing on too much of a negative message. It would be great
> if someone had a look at the Bayes autolearn code. I think it would be a
> great service to the community!

The fact is that there really aren't any active developers around these
days. We are no different from any other semi-active open source project.
I can only give so much of personal free time to "service the community".
The community is supposed to try to take care of itself, so where are all
the volunteers? :-) Doing Perl is not rocket science, but getting familiar
with SA internals can be daunting. I can help with that, but someone needs
to step up with a decent effort.
Re: Disabling autolearn on given rule [ In reply to ]
This is complete news to me! Based on the activity on the dev list, I
had assumed there were still 10-20 people devoting some of their time to
developing SA. If you are the only one, that of course changes my view
very much, and would be something worth communicating in some spot. When
I asked about my Bayes bugs in this list a long time ago, I also got
very mixed responses on whether my suggested solutions to the bugs I
found through discussion on the list were actually the right ones, so I
filed those bugs specifically to get feedback on whether my solutions
were deemed acceptable by SA developers (assuming there was a whole team
working on SA either in the evenings or as part of their job at a
company that heavily uses SA). If the idea is that bugs will most
probably never get resolved except if you write and submit patches to
solve them, that's completely understandable if there are barely any
developers or maintainers, but then people have to be told of course.

Maybe it would then also be a good idea to start some kind of bug review
project, similar to how projects like Inkscape have been asking their
community to retest *all* bugs, where members from the mailing list and
other SA users are encouraged to go through a few bugs at a time,
starting with the very oldest ones, to check whether they're still valid
and otherwise close them. There are currently 373 unresolved bugs on
bugzilla (if that counter can be trusted, it's the same amount of bugs I
get under "my bugs", which seems suspicious), I wouldn't be surprised if
over half of those were questions or about things that have long been
resolved or become irrelevant. For example, I'm guessing
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=5679 can be closed
since if this problem had persisted, there would be a ton of reports of
those still ongoing.

What do you think?

I would also like to point out, as a sort of PS, that while I do
understand that Perl isn't rocket science, there is quite a barrier due
to Perl's reputation and the decreasing number of people with experience
in Perl. If I'm brutally honest, I would have probably already fixed
those 4 bugs I reported myself if SA was on GitHub and written in
Python, since I could most probably read the code more easily, and
especially submit my changes more easily. I do understand that SA is
like that for historic reasons, and I don't think a rewrite would be
sensible at all, but I wouldn't underestimate how much of a deterrent
the combination of Perl, Bugzilla, SVN and email patch submission is for
new FOSS developers used to the newer languages and GitHub. I for one
have no idea how I would submit a fix to SA once I've written it, to
give a concrete example. I'm guessing I just paste the patch to a
Bugzilla comment and hope someone merges it?

Anyway, this is way offtopic for Matt's initial issue, but probably
still relevant since he's hoping to fix it himself.

On 22/09/2021 10:54, Henrik K wrote:
> On Wed, Sep 22, 2021 at 10:45:43AM +0200, Bert Van de Poel wrote:
>> I hope I'm not passing on too much of a negative message. It would be great
>> if someone had a look at the Bayes autolearn code. I think it would be a
>> great service to the community!
> The fact is that there really aren't any active developers around these
> days. We are no different from any other semi-active open source project.
> I can only give so much of personal free time to "service the community".
> The community is supposed to try to take care of itself, so where are all
> the volunteers? :-) Doing Perl is not rocket science, but getting familiar
> with SA internals can be daunting. I can help with that, but someone needs
> to step up with a decent effort.
>
Re: Disabling autolearn on given rule [ In reply to ]
On Tue, 2021-09-21 at 18:57 -0700, Loren Wilton wrote:
>
> Well, from the few I've seen, they all seem to have a relatively
> constant structure. Someone pointed you to a plugin that is at least
> dealing in this general area, that might be a good starting point,
> barring anyone else having a better suggestion.
>
> While I wrote a little Perl a decade ago I've forgotten many of the
> peculiarities, but there are some good web sites out there, and there
> is one of the animal books on the subject. Perl is a bit peculiar in
> syntax and function compared to the C/C++ I did much of my career, but
> I didn't have much trouble picking up enough to make some local SA
> hacks long ago, so if you can program in most anything it probably
> won't be too much trouble.
>
What Loren said. The book you need is "The Camel Book". It's an O'Reilly
publication, "Programming Perl" by Larry Wall, Tom Christiansen & Jon
Orwant. My copy is the 3rd edition, dated 2000, so there are probably
more recent editions. It's well written and organised and, equally
important, has a whole chapter on Perl regular expressions, which are
not the same as, e.g., C or Java regexes.

I also know very little Perl, but this book, together with an example SA
plugin, was enough to let me write an SA plugin for doing lookups on a
PostgreSQL database containing my mail archive (I use this plugin to
whitelist mail from anywhere I've previously sent mail to).

Martin
Re: Disabling autolearn on given rule [ In reply to ]
Morning all,

So I'd recommend a different take. Autolearn is an abomination we never
should have published. It is, in effect, a switch to allow an inherent bias
in the modelling to grow and continue.

Disable autolearn, wipe your Bayes store, and manually train from hand
classified ham and spam. Oh, and use Redis for the backend store. The
difference is usually night and day.

Regards, KAM

On Wed, Sep 22, 2021, 06:18 Martin Gregorie <martin@gregorie.org> wrote:

> On Tue, 2021-09-21 at 18:57 -0700, Loren Wilton wrote:
> >
> > Well, from the few I've seen, they all seem to have a relatively
> > constant structure. Someone pointed you to a plugin that is at least
> > dealing in this general area, that might be a good starting point,
> > barring anyone else having a better suggestion.
> >
> > While I wrote a little Perl a decade ago I've forgotten many of the
> > peculiarities, but there are some good web sites out there, and there
> > is one of the animal books on the subject. Perl is a bit peculiar in
> > syntax and function compared to the C/C++ I did much of my career, but
> > I didn't have much trouble picking up enough to make some local SA
> > hacks long ago, so if you can program in most anything it probably
> > won't be too much trouble.
> >
> What Loren said. The book you need is "The Camel Book". It's an O'Reilly
> publication, "Programming Perl" by Larry Wall, Tom Christiansen & Jon
> Orwant. My copy is the 3rd edition, dated 2000, so there are probably
> more recent editions. It's well written and organised and, equally
> important, has a whole chapter on Perl regular expressions, which are
> not the same as, e.g., C or Java regexes.
>
> I also know very little Perl, but this book, together with an example SA
> plugin, was enough to let me write an SA plugin for doing lookups on a
> PostgreSQL database containing my mail archive (I use this plugin to
> whitelist mail from anywhere I've previously sent mail to).
>
> Martin
Re: Disabling autolearn on given rule [ In reply to ]
On 2021-09-22 14:11, Kevin A. McGrail wrote:
> Morning all,
>
> So I'd recommend a different take. Autolearn is an abomination we
> never should have published. It is, in effect, a switch to allow an
> inherent bias in the modelling to grow and continue.
>
> Disable autolearn, wipe your Bayes store, and manually train from hand
> classified ham and spam. Oh, and use Redis for the backend store. The
> difference is usually night and day.

tflags nice

should not be used on negative scores

if that rule is part of the problem with too much autolearn :/

we all have to live with badness sometimes, but I have posted how to
reduce the madness; users can listen, or come up with another solution other
than disabling autolearn?

it's 2021 and we still need things solved in spamassassin, or was it
mimedefang? :=)

I don't agree that redis is better than postgresql, btw

remember spamassassin is open source, and that means anyone can do what
they like with it, perfect bummer :)
Re: Disabling autolearn on given rule [ In reply to ]
On 21.09.21 13:11, Matt Corallo wrote:
>I recently noticed my bayes was rarely matching any spam, and it turns
>out this was due to autolearn=ham'ing occurring on lots of list
>traffic that I only occasionally read, some of which was blatant spam.
>Sadly, list traffic can be pretty hard to categorize and ends up
>getting through due to good sending IP and domain reputation.
>
>While correcting the filter through sa-learn solves this issue
>temporarily, I don't want to have to always read lists that I
>previously only occasionally read just to re-classify spam. Thus, I'd
>like to disable autolearn entirely for mails that match a given rule
>(eg MAILING_LIST_MULTI).

unfortunately there are no common rules designed to autolearn ham (wonder
why? :-), thus ham autolearning depends on a few negative scores, of which
most are DNS allowlists.

I usually mark them all as noautolearn, because many business notifications
that were too close to spam got autolearned.

>"tflags MAILING_LIST_MULTI noautolearn" doesn't seem like quite what I
>want, it just reduces the score used to decide whether to learn.

"tflags MAILING_LIST_MULTI noautolearn" means that the score of
"MAILING_LIST_MULTI" won't be used for the autolearn decision.
It does not mean that mail hitting MAILING_LIST_MULTI won't be used for
autolearn.
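
The distinction can be shown with a small Python model (not SpamAssassin code). The rule `SOME_DNS_ALLOWLIST` and all scores below are made up for illustration, and the model leaves out the other adjustments SpamAssassin makes when computing the autolearn score.

```python
from typing import Dict, Set


def learning_score(hits: Dict[str, float], noautolearn: Set[str]) -> float:
    """Sum the scores of the rules that hit, skipping noautolearn rules."""
    return sum(s for rule, s in hits.items() if rule not in noautolearn)


# Illustrative scores only.
hits = {"MAILING_LIST_MULTI": -1.0, "SOME_DNS_ALLOWLIST": -2.0}
NONSPAM_THRESHOLD = 0.1  # default bayes_auto_learn_threshold_nonspam

plain = learning_score(hits, set())                      # -3.0
flagged = learning_score(hits, {"MAILING_LIST_MULTI"})   # -2.0

# The flagged rule no longer contributes, yet the message still falls below
# the threshold and is still autolearned as ham.
print(plain < NONSPAM_THRESHOLD, flagged < NONSPAM_THRESHOLD)  # True True
```

So the flag only helps when the flagged rule was the one pushing the message across a threshold.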

>There's some old bugzilla mentions asking for this feature, but it
>seems the response was "write a plugin". Is there a plugin available
>for this or how would one go about writing one?



--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Despite the cost of living, have you noticed how popular it remains?
Re: Disabling autolearn on given rule [ In reply to ]
On 9/22/2021 8:11 AM, Kevin A. McGrail wrote:
> Morning all,
>
> So I'd recommend a different take. Autolearn is an abomination we
> never should have published. It is, in effect, a switch to allow an
> inherent bias in the modelling to grow and continue.
>

Agreed, predictable Garbage Out (FP) becomes Cascading Garbage Out.

> Disable autolearn, wipe your Bayes store, and manually train from hand
> classified ham and spam.

1000% Correct, IMO.  If you must run Bayes, train it once and leave it
be.  Repeat as needed.

> Regards, KAM
>
-- Jared Hall

Re: Disabling autolearn on given rule [ In reply to ]
>On 9/22/2021 8:11 AM, Kevin A. McGrail wrote:
>>So I'd recommend a different take. Autolearn is an abomination we
>>never should have published. It is, in effect, a switch to allow an
>>inherent bias in the modelling to grow and continue.

On 22.09.21 10:39, Jared Hall wrote:
>Agreed, predictable Garbage Out (FP) becomes Cascading Garbage Out.

>>Disable autolearn, wipe your Bayes store, and manually train from
>>hand classified ham and spam.

>1000% Correct, IMO. If you must run Bayes, train it once and leave it
>be. Repeat as needed.

I have noticed a few times that repeated spam finally gets trained and hits BAYES_99.

the main problem is lack of safe rules with negative scores.

of course, nothing defeats manual training.
--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
(R)etry, (A)bort, (C)ancer
Re: Disabling autolearn on given rule [ In reply to ]
On 2021-09-22 at 05:19:48 UTC-0400 (Wed, 22 Sep 2021 11:19:48 +0200)
Bert Van de Poel <bert@ulyssis.org>
is rumored to have said:

> I for one have no idea how I would submit a fix to SA once I've
> written it, to give a concrete example. I'm guessing I just paste the
> patch to a Bugzilla comment and hope someone merges it?

Attaching actual patch files to a bug report is vastly preferred
over pasting into a BZ comment. There is some ASF overhead, as anything
significant requires a contributor to submit a standard "Individual
Contributor License Agreement" to the ASF Secretary, which takes all of
about 15 minutes. Contributions of larger functional enhancements that
are not just to address specific bugs can also be discussed and
submitted via the dev list, which is entirely open to the public just
like this list. Anyone making ongoing contributions (code or otherwise)
is likely to be invited to become a committer. We work on a 'commit then
review' model except when in the last stage of release prep, so if you
don't watch the commit stream you won't see much of the activity that
isn't discussed actively.

In my opinion, the low pace of activity in the SA project is organic. SA
is mature software whose core code has been "good enough" for widespread
use for a long time. As a result there is not a lot of quick-hit
development work to be done on it. There are not a lot of places for
people to get started working on the SA code where one can see
meaningful improvement in a short time, outside of rule development.
Henrik is by far the most active member of the project as far as
non-rule code contributions in the recent past, but he is not alone.
John is doing rule commits daily. There are about a half-dozen
committers who have made commits in 2021. SA is (and always has been) a
*community* project without a major corporate backer. As such, it is
fully dependent on the capacity of *the community* to maintain it.
Everyone reading this is potentially part of that capacity. Anyone who
wants something fixed in or added to SA needs to be involved in getting
it done, even if all that means is poking us here or opening a bug
report and bumping it as needed if everyone ignores it. MAYBE it also
means designing and implementing a fix. That's the nature of the
project. No one (as far as I know) is funded specifically to work on it
on an ongoing basis.


--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire