Mailing List Archive: Question regarding Bayes from a newbie

Feb 17, 2004, 8:35 AM

Post #1 of 8

Hi,

I apologize if this is a duplicate post. In my
/etc/mail/spamassassin/local.cf file I have only one entry, which is:

bayes_auto_learn 1

which I imagine means that Bayes is learning? Any idea how to 'turn Bayes
on'? From what I see, this is constantly updating the files in
~/.spamassassin/bayes_*, but I don't think that spamassassin is
utilizing what it learned from these files.

Any help clarifying this would be greatly appreciated.

Best Regards,

Jason

Re: Question regarding Bayes from a newbie [ In reply to ]

jason at sheffieldave

Feb 17, 2004, 9:05 AM

Post #2 of 8

I did, in fact, notice that you need 200 mails each of spam and ham. But
how do you know how many spam and ham messages Bayes has analyzed so far?

Thanks,

Jason

Jean-Christophe Valiere wrote:

> Jason Novak wrote:
>
>> Hi,
>>
>> I apologize if this is a duplicate post. In my
>> /etc/mail/spamassassin/local.cf file I have only one entry, which is:
>>
>> bayes_auto_learn 1
>>
>> which I imagine that Bayes is learning? Any idea how to 'turn Bayes
>> on'? From what I see, this is constantly updating the files in
>> ~/.spamassassin/bayes_*, but I don't think that spamassassin is
>> utilizing the rules it learned from these files.
>>
>> Any help clarifying this would be greatly appreciated.
>>
>> Best Regards,
>>
>> Jason
>
>
> By default, mail that scores more than 12 pts or less than 0.1 pts is
> learned as spam or ham, respectively.
> But to use the bayes database you need to have 200 mails each of spam
> and ham (by default).
>
>
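
[Editor's sketch] The auto-learn rule described in the quoted reply can be illustrated roughly as follows. This is a simplified sketch, not SpamAssassin's actual code: the function name and constants are made up for illustration, the comparisons follow the wording of the post ("more than" / "less than"), and any additional checks SpamAssassin performs before auto-learning are not modeled.

```python
from typing import Optional

# Default thresholds as quoted in the post above (both are configurable).
SPAM_LEARN_THRESHOLD = 12.0
HAM_LEARN_THRESHOLD = 0.1


def autolearn_decision(score: float,
                       spam_threshold: float = SPAM_LEARN_THRESHOLD,
                       ham_threshold: float = HAM_LEARN_THRESHOLD) -> Optional[str]:
    """Return "spam", "ham", or None when the message is not auto-learned.

    Hypothetical helper: behavior for a score exactly equal to a
    threshold is an assumption, not taken from SpamAssassin itself.
    """
    if score > spam_threshold:
        return "spam"
    if score < ham_threshold:
        return "ham"
    # Scores between the two thresholds are too ambiguous to learn from.
    return None


if __name__ == "__main__":
    for s in (15.2, 6.0, -1.3):
        print(s, autolearn_decision(s))
```

Most everyday mail scores somewhere between the two thresholds, which is why a database fed by auto-learning alone tends to grow slowly.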

Re: Question regarding Bayes from a newbie [ In reply to ]

mkettler at evi-inc

Feb 17, 2004, 9:06 AM

Post #3 of 8

At 10:35 AM 2/17/2004, Jason Novak wrote:
>which I imagine that Bayes is learning? Any idea how to 'turn Bayes
>on'? From what I see, this is constantly updating the files in
>~/.spamassassin/bayes_*, but I don't think that spamassassin is utilizing
>the rules it learned from these files.
>
>Any help clarifying this would be greatly appreciated.

The bayes system must accrue 500 spam, and 500 ham (nonspam) emails in its
learning before it gets used in message scoring.

Via autolearn alone this will take quite some time. It also produces a
sub-optimal bayes database (at least a small amount of manual learning is
desirable to produce a well-balanced bayes db).

I'd recommend getting some emails and feeding them to sa-learn to manually
train it... You can get to 500 of each much faster this way, and since you
are manually classifying them, you can feed many kinds of spam and ham that
SA ordinarily would not autolearn from because it didn't score strongly
enough to cross the learning threshold.
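
[Editor's sketch] Manual training as suggested above could be scripted along these lines. The helper name and the folder paths are hypothetical; the sketch only builds the sa-learn command lines (using the --spam, --ham and --mbox options) and prints them, so nothing is trained until you run them yourself.

```python
import shlex
from typing import List


def sa_learn_command(kind: str, path: str, mbox: bool = False) -> List[str]:
    """Build the argument list for one sa-learn training run.

    kind is "spam" or "ham"; path is a message file, a directory of
    messages, or (with mbox=True) an mbox file.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    argv = ["sa-learn", f"--{kind}"]
    if mbox:
        argv.append("--mbox")
    argv.append(path)
    return argv


if __name__ == "__main__":
    # Hypothetical locations of hand-sorted mail; adjust to your setup.
    folders = (("spam", "/path/to/sorted/spam"), ("ham", "/path/to/sorted/ham"))
    for kind, path in folders:
        print(shlex.join(sa_learn_command(kind, path)))
        # To actually train, one option would be:
        # subprocess.run(sa_learn_command(kind, path), check=True)
```

Run the training as the same user that SpamAssassin runs as, so that the messages land in the bayes_* files that are actually used for scoring.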

Re: Question regarding Bayes from a newbie [ In reply to ]

mkettler at evi-inc

Feb 17, 2004, 9:32 AM

Post #4 of 8

At 11:05 AM 2/17/2004, Jason Novak wrote:
>I did, in fact notice that you need 200 mails of spam and ham. But how do
>you know where Bayes is at with the amount of spam/ham it's analyzed?

sa-learn --dump magic

Re: Question regarding Bayes from a newbie [ In reply to ]

jason at sheffieldave

Feb 17, 2004, 9:55 AM

Post #5 of 8

Hi Matt,

I ran that...

What number(s) am I looking at?

0.000          0          2          0  non-token data: bayes db version
0.000          0         16          0  non-token data: nspam
0.000          0        232          0  non-token data: nham
0.000          0      13091          0  non-token data: ntokens
0.000          0 1076618946          0  non-token data: oldest atime
0.000          0 1077035539          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

Thanks,

Jason

Matt Kettler wrote:

> At 11:05 AM 2/17/2004, Jason Novak wrote:
>
>> I did, in fact notice that you need 200 mails of spam and ham. But
>> how do you know where Bayes is at with the amount of spam/ham it's
>> analyzed?
>
>
> sa-learn --dump magic
>
>
>

Re: Question regarding Bayes from a newbie [ In reply to ]

jdow at earthlink

Feb 17, 2004, 10:08 AM

Post #6 of 8

200 not 500.
{o.o}
----- Original Message -----
From: "Matt Kettler" <mkettler@evi-inc.com>

> At 10:35 AM 2/17/2004, Jason Novak wrote:
> >which I imagine that Bayes is learning? Any idea how to 'turn Bayes
> >on'? From what I see, this is constantly updating the files in
> >~/.spamassassin/bayes_*, but I don't think that spamassassin is utilizing
> >the rules it learned from these files.
> >
> >Any help clarifying this would be greatly appreciated.
>
> The bayes system must accrue 500 spam, and 500 ham (nonspam) emails in its
> learning before it gets used in message scoring.
>
> Via autolearn alone this will take quite some time. It also produces a
> sub-optimal bayes database (at least a small amount of manual learning is
> desirable to produce a well balanced bayes db)
>
> I'd recommend getting some emails and feeding them to sa-learn to manually
> train it... You can get to 500 of each much faster this way, and since you
> are manually classifying them, you can feed many kinds of spam and ham that
> SA ordinarily would not autolearn from because it didn't score strongly
> enough to cross the learning threshold.

RE: Question regarding Bayes from a newbie [ In reply to ]

joe.kang at netex

Feb 17, 2004, 10:29 AM

Post #7 of 8

> -----Original Message-----
> From: Jason Novak [mailto:jason@sheffieldave.com]
> Sent: Tuesday, February 17, 2004 10:56 AM
> To: Matt Kettler
> Cc: Jean-Christophe Valiere; spamassassin-users@incubator.apache.org
> Subject: Re: Question regarding Bayes from a newbie
>
> What number(s) am I looking at?
>
> 0.000          0          2          0  non-token data: bayes db version
> 0.000          0         16          0  non-token data: nspam
> 0.000          0        232          0  non-token data: nham
> 0.000          0      13091          0  non-token data: ntokens
> 0.000          0 1076618946          0  non-token data: oldest atime
> 0.000          0 1077035539          0  non-token data: newest atime
> 0.000          0          0          0  non-token data: last journal sync atime
> 0.000          0          0          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire reduction count

nspam = number of spam messages
nham = number of ham messages

-Joe

Re: Question regarding Bayes from a newbie [ In reply to ]

mkettler at evi-inc

Feb 17, 2004, 11:11 AM

Post #8 of 8

At 11:55 AM 2/17/2004, Jason Novak wrote:
>What number(s) am I looking at?

>0.000 0 16 0 non-token data: nspam
>0.000 0 232 0 non-token data: nham

Those two... this means that 232 ham messages have been trained, and 16
spam messages.
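
[Editor's sketch] Putting the thread together, the nspam and nham counters from "sa-learn --dump magic" can be compared against the 200-message minimums mentioned earlier. The following is a minimal sketch, assuming the output layout shown in this thread (third column is the count, label after "non-token data:"); the function names are hypothetical, and the sample text is copied from the output posted above.

```python
import re
from typing import Dict

MIN_SPAM = 200  # default minimum spam count, as stated in this thread
MIN_HAM = 200   # default minimum ham count, as stated in this thread

# Excerpt of the output posted earlier in the thread.
SAMPLE = """\
0.000 0 2 0 non-token data: bayes db version
0.000 0 16 0 non-token data: nspam
0.000 0 232 0 non-token data: nham
0.000 0 13091 0 non-token data: ntokens
"""

# Four whitespace-separated columns, then "non-token data: <label>".
LINE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s+(.+?)\s*$")


def parse_magic(text: str) -> Dict[str, int]:
    """Map each non-token label (e.g. "nspam") to its integer value."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def bayes_ready(counts: Dict[str, int],
                min_spam: int = MIN_SPAM,
                min_ham: int = MIN_HAM) -> bool:
    """True once both counters have reached their minimums."""
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham


if __name__ == "__main__":
    counts = parse_magic(SAMPLE)
    print(counts["nspam"], counts["nham"], bayes_ready(counts))
```

With the numbers from this thread (16 spam, 232 ham), bayes_ready returns False: the ham minimum is met, but the spam count is still well short of 200, so the Bayes rules are not yet used in scoring.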
