Mailing List Archive: auto-learn? no: too few head hits

Feb 16, 2004, 9:31 AM

Post #1 of 3

I found that many of my 20+ spams don't get auto-learned, so I ran a debug
on them. This is what makes SA not auto-learn them:

debug: auto-learn? ham=0.1, spam=12, body-hits=10.755, head-hits=2.5
debug: auto-learn: currently using scoreset 3. recomputing score based on scoreset 1.
debug: Score set 1 chosen.
debug: auto-learn: original score: 18.746, recomputed score: 18.941
debug: Score set 3 chosen.
debug: auto-learn? no: too few head hits (2.5 < 3)
debug: is spam? score=24.146 required=5

Did I miss a discussion? It's sometimes hard to follow the list because of
the many postings ;-) Can I adjust that? What's the reason for skipping
auto-learn if there are only a few head (I assume header?) hits?

Kai

--

Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org

Re: auto-learn? no: too few head hits

mkettler at evi-inc

Feb 16, 2004, 9:49 AM

Post #2 of 3

At 11:31 AM 2/16/2004, Kai Schaetzl wrote:
>Did I miss a discussion? It's sometimes hard to follow the list because of
>the many postings ;-) Can I adjust that?

No, it's hard-coded in PerMsgStatus.pm

>What's the reason for skipping
>auto-learn if there are only a few head (I assume header?) hits?

This follows the "only auto-learn if you're fairly sure" doctrine.
Skipping autolearning of a message is better than autolearning
something that you're unsure of.

It's also documented in the man pages:

bayes_auto_learn_threshold_spam n.nn (default: 12.0)
The score threshold above which a mail has to score, to be fed into
SpamAssassin's learning systems automatically as a spam message.

Note: SpamAssassin requires at least 3 points from the header, and 3 points
from the body to auto-learn as spam. Therefore, the minimum working value
for this option is 6.

Overall, SA will only auto-learn as spam if:
1) the score without bayes, whitelists, or userconf rules is over
bayes_auto_learn_threshold_spam
2) at least 3.0 of the score comes from the body
3) at least 3.0 of the score comes from the headers.
4) (SA 2.61 and higher) the bayes score doesn't contradict
previous learning about this message.
(i.e. the score of the BAYES_* rule matched must not be
less than -1.0)
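
The four conditions above can be sketched roughly as follows. This is only
an illustrative Python sketch, not the actual Perl code in PerMsgStatus.pm:
the function name, argument names, and reason strings are made up for the
example, and whether a score exactly equal to the threshold qualifies is an
assumption. The sample call uses the numbers from the debug output in the
original post.

```python
# Illustrative sketch only; names and boundary handling are assumptions,
# not SpamAssassin's real implementation.
MIN_BODY_POINTS = 3.0        # condition 2: minimum score from body rules
MIN_HEAD_POINTS = 3.0        # condition 3: minimum score from header rules
MIN_BAYES_RULE_SCORE = -1.0  # condition 4 (SA 2.61 and higher)


def spam_autolearn_decision(score_without_bayes, body_points, head_points,
                            bayes_rule_score=0.0, threshold_spam=12.0):
    """Return (learn_as_spam, reason) for the spam side of auto-learning.

    score_without_bayes is the score recomputed without bayes, whitelist,
    or userconf rules (the "recomputed score" in the debug output).
    """
    # Condition 1: the recomputed score must reach the configured threshold.
    if score_without_bayes < threshold_spam:
        return False, "score below bayes_auto_learn_threshold_spam"
    # Condition 2: enough of the score must come from body rules.
    if body_points < MIN_BODY_POINTS:
        return False, f"too few body hits ({body_points:g} < {MIN_BODY_POINTS:g})"
    # Condition 3: enough of the score must come from header rules.
    if head_points < MIN_HEAD_POINTS:
        return False, f"too few head hits ({head_points:g} < {MIN_HEAD_POINTS:g})"
    # Condition 4: the matched BAYES_* rule must not contradict learning as spam.
    if bayes_rule_score < MIN_BAYES_RULE_SCORE:
        return False, "bayes score contradicts learning as spam"
    return True, "all conditions met"


# Numbers from the debug output: recomputed score 18.941,
# body-hits=10.755, head-hits=2.5
print(spam_autolearn_decision(18.941, 10.755, 2.5))
# prints: (False, 'too few head hits (2.5 < 3)')
```

So a message can score well over 20 and still be skipped, because the
header portion alone falls short of 3 points.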

Re: auto-learn? no: too few head hits

maillists at conactive

Feb 16, 2004, 12:31 PM

Post #3 of 3

Matt Kettler wrote on Mon, 16 Feb 2004 11:49:48 -0500:

> It's also documented in the man pages:
>

Ah, thanks!

Kai
