Mailing List Archive

Question regarding Bayes from a newbie
Hi,

I apologize if this is a duplicate post. In my
/etc/mail/spamassassin/local.cf file I have only one entry, which is:

bayes_auto_learn 1

which I imagine means that Bayes is learning? Any idea how to 'turn Bayes
on'? From what I see, this is constantly updating the files in
~/.spamassassin/bayes_*, but I don't think that spamassassin is
utilizing the rules it learned from these files.

Any help clarifying this would be greatly appreciated.

Best Regards,

Jason
Re: Question regarding Bayes from a newbie
I did, in fact, notice that you need 200 mails each of spam and ham. But how
do you know where Bayes is at with the amount of spam/ham it's analyzed?

Thanks,

Jason

Jean-Christophe Valiere wrote:

> Jason Novak wrote:
>
>> Hi,
>>
>> I apologize if this is a duplicate post. In my
>> /etc/mail/spamassassin/local.cf file I have only one entry, which is:
>>
>> bayes_auto_learn 1
>>
>> which I imagine means that Bayes is learning? Any idea how to 'turn Bayes
>> on'? From what I see, this is constantly updating the files in
>> ~/.spamassassin/bayes_*, but I don't think that spamassassin is
>> utilizing the rules it learned from these files.
>>
>> Any help clarifying this would be greatly appreciated.
>>
>> Best Regards,
>>
>> Jason
>
>
> By default, mail that scores more than 12 points or less than 0.1 points is
> learned as spam or ham, respectively.
> But to use the Bayes database you need to have learned at least 200 spam
> and 200 ham messages (by default).
>
>
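
To make the two thresholds above concrete, here is a minimal Python sketch of the auto-learn decision. It is a simplification, not SpamAssassin's actual code: the function name autolearn_decision is made up for illustration, the values 12.0 and 0.1 are the ones quoted in this thread, and the real auto-learner applies further conditions that are ignored here. As far as I know, the matching local.cf settings are bayes_auto_learn_threshold_spam and bayes_auto_learn_threshold_nonspam, but check the documentation for your version.

```python
from typing import Optional

# Thresholds as quoted in this thread (assumed defaults; verify for your version).
SPAM_THRESHOLD = 12.0
HAM_THRESHOLD = 0.1


def autolearn_decision(score: float,
                       spam_threshold: float = SPAM_THRESHOLD,
                       ham_threshold: float = HAM_THRESHOLD) -> Optional[str]:
    """Return "spam", "ham", or None when the message is not auto-learned."""
    if score > spam_threshold:
        return "spam"   # scored strongly enough to be learned as spam
    if score < ham_threshold:
        return "ham"    # scored low enough to be learned as ham
    return None         # in between: auto-learn leaves the message alone


if __name__ == "__main__":
    for s in (15.3, 5.0, -1.2):
        print(s, autolearn_decision(s))
```

Messages that land between the two thresholds are never auto-learned, which is why a database trained only this way tends to fill slowly.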
Re: Question regarding Bayes from a newbie
At 10:35 AM 2/17/2004, Jason Novak wrote:
>which I imagine means that Bayes is learning? Any idea how to 'turn Bayes
>on'? From what I see, this is constantly updating the files in
>~/.spamassassin/bayes_*, but I don't think that spamassassin is utilizing
>the rules it learned from these files.
>
>Any help clarifying this would be greatly appreciated.

The bayes system must accrue 500 spam and 500 ham (nonspam) emails in its
learning before it gets used in message scoring.

Via autolearn alone this will take quite some time. It also produces a
sub-optimal bayes database (at least a small amount of manual learning is
desirable to produce a well-balanced bayes db).

I'd recommend getting some emails and feeding them to sa-learn to manually
train it... You can get to 500 of each much faster this way, and since you
are manually classifying them, you can feed many kinds of spam and ham that
SA ordinarily would not autolearn from because they didn't score strongly
enough to cross the learning threshold.
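
As a rough illustration of the manual training suggested above, the Python sketch below builds the sa-learn command lines for a folder of hand-sorted spam and one of ham. The --spam and --ham options are sa-learn's own; the helper names (build_sa_learn_command, train) and the folder paths are hypothetical, and by default the script only prints the commands instead of running them.

```python
import shlex
import subprocess
from typing import List


def build_sa_learn_command(kind: str, path: str) -> List[str]:
    """Build the argv for training sa-learn on a file or directory of messages."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", f"--{kind}", path]


def train(kind: str, path: str, dry_run: bool = True) -> List[str]:
    """Print the command; execute it only when dry_run is False."""
    cmd = build_sa_learn_command(kind, path)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # requires sa-learn on the PATH
    return cmd


if __name__ == "__main__":
    # Hypothetical folders of manually classified mail; substitute your own.
    train("spam", "/path/to/sorted/spam")
    train("ham", "/path/to/sorted/ham")
```

Run it as the same user whose ~/.spamassassin/bayes_* files should be updated, and pass dry_run=False once the printed commands look right.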
Re: Question regarding Bayes from a newbie
At 11:05 AM 2/17/2004, Jason Novak wrote:
>I did, in fact, notice that you need 200 mails each of spam and ham. But how do
>you know where Bayes is at with the amount of spam/ham it's analyzed?

sa-learn --dump magic
Re: Question regarding Bayes from a newbie
Hi Matt,

I ran that...

What number(s) am I looking at?

0.000          0          2          0  non-token data: bayes db version
0.000          0         16          0  non-token data: nspam
0.000          0        232          0  non-token data: nham
0.000          0      13091          0  non-token data: ntokens
0.000          0 1076618946          0  non-token data: oldest atime
0.000          0 1077035539          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

Thanks,

Jason

Matt Kettler wrote:

> At 11:05 AM 2/17/2004, Jason Novak wrote:
>
>> I did, in fact, notice that you need 200 mails each of spam and ham. But
>> how do you know where Bayes is at with the amount of spam/ham it's
>> analyzed?
>
>
> sa-learn --dump magic
>
>
>
Re: Question regarding Bayes from a newbie
200 not 500.
{o.o}
----- Original Message -----
From: "Matt Kettler" <mkettler@evi-inc.com>


> At 10:35 AM 2/17/2004, Jason Novak wrote:
> >which I imagine means that Bayes is learning? Any idea how to 'turn Bayes
> >on'? From what I see, this is constantly updating the files in
> >~/.spamassassin/bayes_*, but I don't think that spamassassin is utilizing
> >the rules it learned from these files.
> >
> >Any help clarifying this would be greatly appreciated.
>
> The bayes system must accrue 500 spam and 500 ham (nonspam) emails in its
> learning before it gets used in message scoring.
>
> Via autolearn alone this will take quite some time. It also produces a
> sub-optimal bayes database (at least a small amount of manual learning is
> desirable to produce a well-balanced bayes db).
>
> I'd recommend getting some emails and feeding them to sa-learn to manually
> train it... You can get to 500 of each much faster this way, and since you
> are manually classifying them, you can feed many kinds of spam and ham that
> SA ordinarily would not autolearn from because they didn't score strongly
> enough to cross the learning threshold.
RE: Question regarding Bayes from a newbie
> -----Original Message-----
> From: Jason Novak [mailto:jason@sheffieldave.com]
> Sent: Tuesday, February 17, 2004 10:56 AM
> To: Matt Kettler
> Cc: Jean-Christophe Valiere; spamassassin-users@incubator.apache.org
> Subject: Re: Question regarding Bayes from a newbie
>
> What number(s) am I looking at?
>
> 0.000          0          2          0  non-token data: bayes db version
> 0.000          0         16          0  non-token data: nspam
> 0.000          0        232          0  non-token data: nham
> 0.000          0      13091          0  non-token data: ntokens
> 0.000          0 1076618946          0  non-token data: oldest atime
> 0.000          0 1077035539          0  non-token data: newest atime
> 0.000          0          0          0  non-token data: last journal sync atime
> 0.000          0          0          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire reduction count

nspam = number of spam messages
nham = number of ham messages

-Joe
Re: Question regarding Bayes from a newbie
At 11:55 AM 2/17/2004, Jason Novak wrote:
>What number(s) am I looking at?

>0.000 0 16 0 non-token data: nspam
>0.000 0 232 0 non-token data: nham

Those two... this means that 232 ham messages have been trained, and 16
spam messages.
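
For anyone who wants to check this automatically, here is a small Python sketch that reads the output of sa-learn --dump magic and compares nspam and nham against the 200-message minimum discussed in this thread. It assumes the counts sit in the third column, as in the dump posted above; the names parse_magic, bayes_ready and MIN_MESSAGES are invented for this example, and the sample text is a subset of Jason's output.

```python
from typing import Dict

# A few lines from the dump posted in this thread.
SAMPLE_DUMP = """\
0.000          0          2          0  non-token data: bayes db version
0.000          0         16          0  non-token data: nspam
0.000          0        232          0  non-token data: nham
0.000          0      13091          0  non-token data: ntokens
"""

MIN_MESSAGES = 200  # minimum of each kind, per this thread (assumed default)


def parse_magic(dump: str) -> Dict[str, int]:
    """Map each 'non-token data' label to the integer in the third column."""
    values: Dict[str, int] = {}
    for line in dump.splitlines():
        if "non-token data:" not in line:
            continue
        columns, label = line.split("non-token data:", 1)
        fields = columns.split()
        if len(fields) >= 3:
            values[label.strip()] = int(fields[2])
    return values


def bayes_ready(values: Dict[str, int], minimum: int = MIN_MESSAGES) -> bool:
    """True once both the spam and ham counts have reached the minimum."""
    return values.get("nspam", 0) >= minimum and values.get("nham", 0) >= minimum


if __name__ == "__main__":
    counts = parse_magic(SAMPLE_DUMP)
    print("nspam:", counts["nspam"], "nham:", counts["nham"])
    print("Bayes in use:", bayes_ready(counts))
```

With the numbers above, the ham count is already past 200 but the spam count is only 16, so Bayes would not yet be used for scoring.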