Mailing List Archive

When does autolearn kick in (REALLY)?
I have set the following bayes settings in /etc/mail/spamassassin/local.cf

bayes_auto_learn 1
bayes_path /var/local/spamassassin/bayes
bayes_auto_learn_threshold_spam 9.0

As I understand it, any message scoring above 9 should be learned into Bayes
as spam. However, some high-scoring messages are marked for training and
others are not, so the score alone does not seem to decide it. Here are two
examples; the one with the lower score is the one marked for training:

X-Spam-Status: Yes, hits=17.8 required=5.0 tests=BAYES_99,CLICK_BELOW,
DCC_CHECK,HTML_60_70,HTML_FONTCOLOR_RED,HTML_FONT_INVISIBLE,
HTML_IMAGE_ONLY_04,HTML_MESSAGE,HTML_TITLE_EMPTY,MIME_HTML_NO_CHARSET,
MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,OBFUSCATING_COMMENT,
PRIORITY_NO_NAME autolearn=spam version=2.63

X-Spam-Status: Yes, hits=26.3 required=5.0 tests=ADVERT_CODE2,BAYES_99,
BIZ_TLD,CLICK_BELOW,DATE_SPAMWARE_Y2K,DCC_CHECK,HTML_FONT_BIG,
HTML_IMAGE_ONLY_06,HTML_IMAGE_RATIO_10,HTML_LINK_CLICK_HERE,
HTML_MESSAGE,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,MISSING_MIMEOLE,
MISSING_OUTLOOK_NAME,RATWARE_EGROUPS,RAZOR2_CF_RANGE_51_100,
RAZOR2_CHECK autolearn=no version=2.63

What am I missing?

Thanks,
Danie
RE: When does autolearn kick in (REALLY)?
This bit in the SpamAssassin wiki is informative:

http://wiki.spamassassin.org/w/AutolearningNotWorking

And from the docs

"Note that certain tests are ignored when determining whether a message
should be trained upon:
 - auto-whitelist (AWL)
 - rules with tflags set to 'learn' (the Bayesian rules)
 - rules with tflags set to 'userconf' (user white/black-listing rules, etc)

Also note that auto-training occurs using scores from either scoreset 0 or
1, depending on what scoreset is used during message check. It is likely
that the message check and auto-train scores will be different."

and

"bayes_auto_learn_threshold_nonspam n.nn (default: 0.1)
The score threshold below which a mail has to score, to be fed into
SpamAssassin's learning systems automatically as a non-spam message.

bayes_auto_learn_threshold_spam n.nn (default: 12.0)
The score threshold above which a mail has to score, to be fed into
SpamAssassin's learning systems automatically as a spam message.

Note: SpamAssassin requires at least 3 points from the header, and 3
points from the body to auto-learn as spam. Therefore, the minimum working
value for this option is 6."

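So the upshot is that you can get fairly high-scoring spam which isn't
auto-learnt: the score used for the auto-learn decision leaves out the
Bayes rules (among others), and the message still needs at least 3 points
each from header and body tests.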
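
To make the quoted rules a bit more concrete, here is a rough Python sketch of
the decision as the docs describe it. It is not SpamAssassin's actual code. The
Hit class, the autolearn_decision function, the "header"/"body" labels, and the
handling of scores that land exactly on a threshold are all assumptions made
for illustration. Each score is assumed to already be the value from the
scoreset used for learning.

```python
from dataclasses import dataclass

# Hypothetical representation of one rule hit; not a SpamAssassin structure.
@dataclass(frozen=True)
class Hit:
    name: str
    score: float                    # assumed: score from the learning scoreset
    area: str                       # assumed label: "header" or "body"
    tflags: frozenset = frozenset()

# Per the quoted docs: rules flagged 'learn' or 'userconf' are ignored.
IGNORED_TFLAGS = {"learn", "userconf"}


def autolearn_decision(hits, spam_threshold=12.0, nonspam_threshold=0.1,
                       min_area_points=3.0):
    """Return "spam", "ham" or "no", following the documented rules."""
    # Drop AWL and any rule carrying an ignored tflag (e.g. the BAYES_* rules).
    counted = [h for h in hits
               if h.name != "AWL" and not (h.tflags & IGNORED_TFLAGS)]

    total = sum(h.score for h in counted)
    header = sum(h.score for h in counted if h.area == "header")
    body = sum(h.score for h in counted if h.area == "body")

    # "below which" / "above which" are read as strict comparisons here;
    # the exact boundary behaviour is an assumption.
    if total < nonspam_threshold:
        return "ham"
    if (total > spam_threshold
            and header >= min_area_points
            and body >= min_area_points):
        return "spam"
    return "no"
```

With made-up scores, a message whose counted points come almost entirely from
body rules returns "no" even when its total is well above
bayes_auto_learn_threshold_spam, which matches the autolearn=no seen on the
26.3-point message if its header tests contributed less than 3 points.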

Cheers,

Phil

---------------------------------------------
Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK

> -----Original Message-----
> From: Danie Marais [mailto:danie.marais@attix5.com]
> Sent: 10 February 2004 11:39
> To: spamassassin-users@incubator.apache.org
> Subject: When does autolearn kick in (REALLY)?
Re: When does autolearn kick in (REALLY)?
Ah - that makes more sense!

Thanks, Phil


