Mailing List Archive

making bayes king
Hi all,

I am running a front-end gateway that scans all incoming
mail for spam and viruses before sending it off to various internal mail
servers. There are no mailboxes on the gateway, so there is a single
Bayes engine rather than one per user. I'm not sure how effective
that is (i.e. having it learn from *all* incoming mail, as
opposed to being tailored by individual users). Right now the Bayes
engine has learned from 367,209 spam and 42,487 ham. I don't know
if that is too much. To be honest, SA has run so well that I haven't
really been paying attention to it and have been too busy to keep up
with the mailing list. Is there a way to trim that down, or is that a
solid number to work with?

Anyway, I have SpamAssassin set to alter the Subject: of
a message, adding "***SPAM***" if the score is 7.0 or higher.

Here is my question: Sometimes I'll get a just-over-the-threshold
score of, say, 8.3, so it gets flagged OK as spam, but I notice that
BAYES_90 was triggered with a probability like 0.988, and
SpamAssassin gives that rule a score of only 3.0. Can I up that value?
Can I make it 7.0 or so, so that such a message is guaranteed to be flagged
as spam? I know SA recommends you don't tinker with their scores,
but isn't a BAYES_90 a sure thing?

TIA - jim -
RE: making bayes king
Sure, make the changes in your local.cf to override the Bayes setting.

Change: score BAYES_99 0 0 5.400 5.400
to whatever you want, e.g.:
score BAYES_99 0 0 10.99 10.99

Correct me if I'm wrong (which I'm sure this group will :)), but the local.cf rules take precedence over the stock rules.

Gary Smith

-----Original Message-----
From: Jim Savoy [mailto:savoy@uleth.ca]
Sent: Thu 3/11/2004 3:32 PM
To: spamassassin-users@incubator.apache.org
Cc:
Subject: making bayes king




Re: making bayes king
Gary Smith wrote:

>Sure, make the changes in your local.cf to override the Bayes setting.
>
>Change: score BAYES_99 0 0 5.400 5.400
>to whatever you want, e.g.:
>score BAYES_99 0 0 10.99 10.99
>
>Correct me if I'm wrong (which I'm sure this group will :)), but the local.cf rules take precedence over the stock rules.
>
Thanks Gary. Although I'm no SA wizard, I do know about the local.cf
file, and that its contents override the stuff in the default .cf files.
I wasn't asking so much *how* to make this change, but rather if it is a
good idea to make Bayes the King (i.e. can I trust the BAYES_90 and BAYES_99
rules to make the call on my email by upping their values to something much
higher?). Do others on this group do this after they've learned from enough
messages?

Side-question: I was looking at the 50_scores.cf file and I have this in
there:

score BAYES_60 0 0 1.997 1.101
score BAYES_70 0 0 2.593 2.310
score BAYES_80 0 0 5.300 2.862
score BAYES_90 0 0 4.027 3.002
score BAYES_99 0 0 5.200 3.008

(I am still using SA 2.55, btw.) Can someone tell me what these values are?
There seems to be a steady progression there, but BAYES_80 breaks the pattern
slightly: its 5.300 is higher than the 4.027 for BAYES_90. (I am assuming the
first number on each line is the high value and the second, e.g. 3.002, is the
low value.) The reason I ask is that I was considering upping the score for
BAYES_80 as well.

Thanks!
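
A note on the four columns, for anyone reading along. As far as I can tell from the Mail::SpamAssassin::Conf documentation, they are not a high/low range. In releases that use four "score sets", each column is the score for one combination of Bayes and network tests being enabled. Whether 2.55 reads the columns exactly this way is an assumption on my part, so check the documentation shipped with your version. A commented sketch using the BAYES_90 line quoted above:

```
# score SYMBOL   set0   set1   set2   set3
#   set0: Bayes off, network tests off
#   set1: Bayes off, network tests on
#   set2: Bayes on,  network tests off
#   set3: Bayes on,  network tests on
score BAYES_90   0      0      4.027  3.002
```

Read this way, only one column applies to a given installation. A BAYES_90 hit worth about 3.0 would correspond to the last column (Bayes and network tests both on), and the first two columns are 0 because a BAYES_* rule cannot fire when Bayes is disabled.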
Re: making bayes king
Oops - I see that the question about why BAYES_80 breaks the progression
was asked just two days ago, and a link was provided to explain how that works.
I'll read that first. Sorry!
Re: making bayes king


Jim Savoy writes:
>Thanks Gary. Although I'm no SA wizard, I do know about the local.cf
>file, and that its contents override the stuff in the default .cf files.
>I wasn't
>asking so much *how* to make this change, but rather if it is a good idea
>to make Bayes the King (ie can I trust the bayes_90 and bayes_99 rules to
>make the call on my email by upping their values to something much higher?).
>Do others on this group do this after they've learned from enough messages?

Yeah, this is definitely a good idea if you want to trust Bayes more
than the other rules.

Note that using chi-squared combining, most "certain" results from
bayes will come out at BAYES_00 and BAYES_99 -- so if you like,
just set those scores and leave the "noise" in the middle alone.

--j.
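
The suggestion above (touch only the "certain" ends) might look roughly like this in local.cf. This is a sketch, not a recommendation: the 7.0 is borrowed from the threshold mentioned at the start of the thread, and the BAYES_00 value is purely hypothetical.

```
# local.cf: illustrative values only
# Raise only the "certain" spam end; BAYES_60..BAYES_90 keep their stock scores.
score BAYES_99 0 0 7.0 7.0

# Optional and hypothetical: let a confident ham verdict pull the total down.
# score BAYES_00 0 0 -5.0 -5.0
```

After editing, running `spamassassin --lint` is a cheap way to catch typos in the file before mail starts flowing through it.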
Re: making bayes king
Justin writes:

>Note that using chi-squared combining, most "certain" results from
>bayes will come out at BAYES_00 and BAYES_99 -- so if you like,
>just set those scores and leave the "noise" in the middle alone.
>
>
>
...and I could always just upgrade my SpamAssassin to the latest version
(I'm still running SA 2.55), which I will do in a few weeks btw.

The reason I bring this whole Bayes thing up is because I have manually fed
many thousands of spam and ham into the db, and it has auto-learned from many
hundreds of thousands of messages since then. It is doing a great job. Hardly
any false-positives at all this past year.

However, when I check out some of the stuff that is not getting marked as spam,
I see there aren't too many rules hit, so the scores are fairly low (maybe 4.5
or 5.0 or so). But in these messages BAYES_90 or BAYES_80 are often hit. If I
could make these two scores higher, that would put these over the top. I just
wanted to know if others have done this, or should I not be messing with these
very well-thought out algorithms?

- jim -
Re: making bayes king
I've bumped my BAYES_90 and 99 up to the point where they alone will
cause the email to be regarded as spam. I've bumped BAYES_80 up to
almost the threshold so that it would only take one or two additional
rules to put it over.
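
For a site using the 7.0 threshold mentioned at the start of the thread, that arrangement could be sketched as below. The actual numbers used were not posted, so every value here is an assumption chosen only to illustrate the shape of the setup.

```
# local.cf: hypothetical numbers for a 7.0 threshold
# (required_hits is the 2.x name; later releases call it required_score)
required_hits 7.0

# Either rule alone reaches the threshold.
score BAYES_99 0 0 7.0 7.0
score BAYES_90 0 0 7.0 7.0

# Just under the threshold: one or two more rule hits tip it over.
score BAYES_80 0 0 6.0 6.0
```

The trade-off is that a single Bayes misjudgement in the 0.90+ range now becomes a false positive by itself, so this only makes sense when the database has been trained well.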


Re: making bayes king
Kevin Peuhkurinen wrote:

> I've bumped my BAYES_90 and 99 up to the point where they alone will
> cause the email to be regarded as spam. I've bumped BAYES_80 up to
> almost the threshold so that it would only take one or two additional
> rules to put it over.
>
Thanks. I read the link posted by Theo (HowScoresAreAssigned) and I didn't
understand it (duh). Can you tell me what you have for BAYES_80, 90 and 99?
I am not clear on the progression:

score BAYES_80 0 0 5.300 2.862
score BAYES_90 0 0 4.027 3.002
score BAYES_99 0 0 5.200 3.008

and what these ranges imply (e.g. for BAYES_90, does this mean that something
that is evaluated as 90% gets a score of 3.002 and something at 98.9% gets a
score of 4.027?). The explanation for why BAYES_80 is out of whack did not
make any sense to me:

"This lets the GA lower the "learn" rule score due to the inevitable false
positive, while also still marking the message as spam via the other rule
scores".

In my case, though, the other rules are not combining for a high enough score
to put the message over the top, even though BAYES_80 or BAYES_90 is often
hit in these messages.

- jim -