Mailing List Archive

Bayes scores
Hello

Same score for BAYES_20 and BAYES_40 seems a bit strange

50_scores.cf:score BAYES_20 0 0 -0.001 -0.001
50_scores.cf:score BAYES_40 0 0 -0.001 -0.001



--
Best regards,
Niamh mailto:niamh@fullbore.co.uk
Re: Bayes scores
On 5/8/2013 3:07 AM, Niamh Holding wrote:
> Hello
>
> Same score for BAYES_20 and BAYES_40 seems a bit strange
>
> 50_scores.cf:score BAYES_20 0 0 -0.001 -0.001
> 50_scores.cf:score BAYES_40 0 0 -0.001 -0.001

A score of -0.001 basically means that the rule is active, but has no
real effect on the final score. In this case, Bayes thinks the message
is ham, but is not confident enough to merit a negative score. Take a
look at the full list of Bayes scores and it makes a bit more sense.

score BAYES_00 0 0 -1.5 -1.9
score BAYES_05 0 0 -0.3 -0.5
score BAYES_20 0 0 -0.001 -0.001
score BAYES_40 0 0 -0.001 -0.001
score BAYES_50 0 0 2.0 0.8
score BAYES_60 0 0 2.5 1.5
score BAYES_80 0 0 2.7 2.0
score BAYES_95 0 0 3.2 3.0
score BAYES_99 0 0 3.8 3.5

--
Bowie
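
A note on the four numbers on each score line: per the Mail::SpamAssassin::Conf documentation, they correspond to the four score sets (Bayes off/net off, Bayes off/net on, Bayes on/net off, Bayes on/net on), which is why the first two columns are 0 for every Bayes rule. The following is only a minimal Python sketch of that lookup using the list quoted above; the helper names parse_scores and active_score are made up for illustration and are not part of SpamAssassin.

```python
# Stock Bayes score lines as quoted in the message above.
STOCK_LINES = """\
score BAYES_00 0 0 -1.5 -1.9
score BAYES_05 0 0 -0.3 -0.5
score BAYES_20 0 0 -0.001 -0.001
score BAYES_40 0 0 -0.001 -0.001
score BAYES_50 0 0 2.0 0.8
score BAYES_60 0 0 2.5 1.5
score BAYES_80 0 0 2.7 2.0
score BAYES_95 0 0 3.2 3.0
score BAYES_99 0 0 3.8 3.5
"""


def parse_scores(text):
    """Map each rule name to its four per-score-set values."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        # Only handle the four-value form: "score NAME s0 s1 s2 s3".
        if len(parts) == 6 and parts[0] == "score":
            table[parts[1]] = tuple(float(v) for v in parts[2:])
    return table


def active_score(table, rule, use_bayes, use_net):
    """Pick the column for the current score set (0 to 3)."""
    index = (2 if use_bayes else 0) + (1 if use_net else 0)
    return table[rule][index]


table = parse_scores(STOCK_LINES)
# With Bayes and network tests both enabled, the fourth column applies.
print(active_score(table, "BAYES_20", use_bayes=True, use_net=True))
print(active_score(table, "BAYES_40", use_bayes=True, use_net=True))
```

With Bayes and network tests both enabled, the last column is the one in effect, and BAYES_20 and BAYES_40 both come out at -0.001.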
Re: Bayes scores
Hello Bowie,

Wednesday, May 8, 2013, 4:35:35 PM, you wrote:

BB> makes a bit more sense.

Not a lot, though, to have BAYES_20 and BAYES_40 scoring the same.

--
Best regards,
Niamh mailto:niamh@fullbore.co.uk
Re: Bayes scores
On 5/8/2013 12:06 PM, Niamh Holding wrote:
> Hello Bowie,
>
> Wednesday, May 8, 2013, 4:35:35 PM, you wrote:
>
> BB> makes a bit more sense.
>
> Not a lot, though, to have BAYES_20 and BAYES_40 scoring the same.

A 20-40% confidence level is not high enough to have a significant
positive score nor is it low enough to have a significant negative
score, so they are both given a neutral score of -0.001. The score
can't actually be 0, or the rule hit would not show in the reports.

--
Bowie
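
To make the "neutral bucket" idea concrete, here is a rough Python sketch of how a Bayes probability might map to one of these rules and its score. The bucket boundaries are an assumption taken from the stock rule descriptions (for example "20 to 40%"), the handling of exact boundary values is a guess, and bucket_for is a hypothetical helper, not SpamAssassin code. Scores are the Bayes-plus-network column from the list posted earlier in the thread.

```python
# (upper bound of bucket, rule name, score with Bayes and net tests on)
# Boundaries are assumed from the stock rule descriptions.
BUCKETS = [
    (0.01, "BAYES_00", -1.9),
    (0.05, "BAYES_05", -0.5),
    (0.20, "BAYES_20", -0.001),
    (0.40, "BAYES_40", -0.001),
    (0.60, "BAYES_50", 0.8),
    (0.80, "BAYES_60", 1.5),
    (0.95, "BAYES_80", 2.0),
    (0.99, "BAYES_95", 3.0),
    (1.00, "BAYES_99", 3.5),
]


def bucket_for(prob):
    """Return (rule, score) for a Bayes probability between 0.0 and 1.0."""
    for upper, rule, score in BUCKETS:
        if prob < upper:
            return rule, score
    # A probability of exactly 1.0 falls in the top bucket.
    return BUCKETS[-1][1], BUCKETS[-1][2]


# Two different probabilities, two different rules, same tiny score:
# the hit is still reported, but it barely moves the total.
print(bucket_for(0.15))
print(bucket_for(0.30))
```

Under these assumptions a message at 15% and one at 30% hit different rules, so the report still shows which bucket matched, while the contribution to the total is the same -0.001 in both cases.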
Re: BAYES scores
> From: joe a <joea-lists@j4computers.com>
> Date: Tue, 28 Feb 2023 11:37:34 -0500
>
> Curious as to why these scores, apparently "stock" are what they are.
> I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
>
> Noted in a header this morning:
>
> * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
> * [score: 1.0000]
> * 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
> * [score: 1.0000]
>
> Was this discussed recently? I added a local score to mollify my sense
> of propriety.

Those two rules overlap. A message with bayes >= 99.9% hits both
rules. BAYES_99 ends at 1.00 not .999.
-jeff
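
The overlap can be sketched in a few lines of Python. This is an illustration only: the thresholds come from the rule descriptions in the quoted header, the scores are the 3.5 and 0.2 shown there, 5.0 is SpamAssassin's default required_score, and bayes_rules_hit and bayes_contribution are hypothetical helper names.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score

# Scores as shown in the quoted report header.
STOCK = {"BAYES_99": 3.5, "BAYES_999": 0.2}


def bayes_rules_hit(prob):
    """Both rules run up to 100%, so the top range hits both of them."""
    hits = []
    if prob >= 0.99:
        hits.append("BAYES_99")
    if prob >= 0.999:
        hits.append("BAYES_999")
    return hits


def bayes_contribution(prob, scores):
    """Sum the scores of every Bayes rule that the probability hits."""
    return sum(scores[rule] for rule in bayes_rules_hit(prob))


total = bayes_contribution(1.0, STOCK)
print(bayes_rules_hit(1.0), round(total, 3), total >= REQUIRED_SCORE)
```

Seen this way, the 0.2 on BAYES_999 is a supplement added on top of BAYES_99, not a standalone score for the 99.9% case, and the two together still stay below the default threshold.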
Re: BAYES scores
On 2/28/2023 12:05 PM, Jeff Mincy wrote:
> > From: joe a <joea-lists@j4computers.com>
> > Date: Tue, 28 Feb 2023 11:37:34 -0500
> >
> > Curious as to why these scores, apparently "stock" are what they are.
> > I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
> >
> > Noted in a header this morning:
> >
> > * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
> > * [score: 1.0000]
> > * 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
> > * [score: 1.0000]
> >
> > Was this discussed recently? I added a local score to mollify my sense
> > of propriety.
>
> Those two rules overlap. A message with bayes >= 99.9% hits both
> rules. BAYES_99 ends at 1.00 not .999.
> -jeff
>

I get that they overlap. I guess my thinker gets in a knot wondering
why there is so little weight given to the more certain determination.

In my narrow view, anything that is 99.9% certain is probably worth a 5
on its own. Or at least should, when summed with BAYES_99, equal 5,
as that is what the default "SPAM flag" is.

Appears more experienced or thoughtful persons think otherwise.

Yes, it did snow heavily overnight. Yes, I am looking for excuses not
to visit that issue.
Re: BAYES scores
In my limited experience, I score BAYES_999 at 2.00, as was suggested
to me months ago.

But nowadays I'd be more careful and do some more testing: I'd check which
messages hit only BAYES_99 and which hit BAYES_999. If you are
absolutely certain that BAYES_999 hits are only and definitively spam, go
with 2 or more; if you have several false positives, keep the score low.

I learnt the hard way that BAYES depends on the corpus used to grow the
database.
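
The check described above could be tallied roughly as follows. This is a sketch under stated assumptions: the input format (a hand-verified spam/ham flag plus the set of rules each message hit) and the name tally_bayes_hits are hypothetical, and the sample data is invented purely to show the shape of the result.

```python
from collections import Counter


def tally_bayes_hits(messages):
    """Count BAYES_99-only and BAYES_999 hits, split by true class.

    `messages` is an iterable of (is_spam, rules_hit) pairs, where
    is_spam is a hand-verified label and rules_hit is a set of names.
    """
    counts = Counter()
    for is_spam, rules in messages:
        if "BAYES_999" in rules:
            bucket = "BAYES_999"
        elif "BAYES_99" in rules:
            bucket = "BAYES_99 only"
        else:
            continue  # not in the top Bayes range, ignore
        counts[(bucket, "spam" if is_spam else "ham")] += 1
    return counts


# Invented sample data, for illustration only.
SAMPLE = [
    (True, {"BAYES_99", "BAYES_999"}),
    (True, {"BAYES_99", "BAYES_999"}),
    (True, {"BAYES_99"}),
    (False, {"BAYES_99"}),
    (False, {"BAYES_00"}),
]

counts = tally_bayes_hits(SAMPLE)
# Ham that hit BAYES_999 would be false positives for a high score.
print(counts[("BAYES_999", "ham")], counts[("BAYES_99 only", "ham")])
```

If the ham count for BAYES_999 stays at zero over a reasonably large, hand-checked sample of your own mail, a higher local score is easier to justify; any ham in that bucket argues for keeping it low.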

On Tue, Feb 28, 2023 at 7:39 PM joe a <joea-lists@j4computers.com> wrote:

> On 2/28/2023 12:05 PM, Jeff Mincy wrote:
> > > From: joe a <joea-lists@j4computers.com>
> > > Date: Tue, 28 Feb 2023 11:37:34 -0500
> > >
> > > Curious as to why these scores, apparently "stock" are what they are.
> > > I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
> > >
> > > Noted in a header this morning:
> > >
> > > * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
> > > * [score: 1.0000]
> > > * 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
> > > * [score: 1.0000]
> > >
> > > Was this discussed recently? I added a local score to mollify my
> > > sense of propriety.
> >
> > Those two rules overlap. A message with bayes >= 99.9% hits both
> > rules. BAYES_99 ends at 1.00 not .999.
> > -jeff
> >
>
> I get that they overlap. I guess my thinker gets in a knot wondering
> why there is so little weight given to the more certain determination.
>
> In my narrow view, anything that is 99.9% certain is probably worth a 5
> on its own. Or at least should, when summed with BAYES_99, equal 5,
> as that is what the default "SPAM flag" is.
>
> Appears more experienced or thoughtful persons think otherwise.
>
> Yes, it did snow heavily overnight. Yes, I am looking for excuses not
> to visit that issue.
>
Re: BAYES scores
On 2023-02-28 at 13:38:35 UTC-0500 (Tue, 28 Feb 2023 13:38:35 -0500)
joe a <joea-lists@j4computers.com>
is rumored to have said:

> On 2/28/2023 12:05 PM, Jeff Mincy wrote:
>> > From: joe a <joea-lists@j4computers.com>
>> > Date: Tue, 28 Feb 2023 11:37:34 -0500
>> >
>> > Curious as to why these scores, apparently "stock" are what they
>> > are.
>> > I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
>> >
>> > Noted in a header this morning:
>> >
>> > * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
>> > * [score: 1.0000]
>> > * 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
>> > * [score: 1.0000]
>> >
>> > Was this discussed recently? I added a local score to mollify my
>> > sense of propriety.
>>
>> Those two rules overlap. A message with bayes >= 99.9% hits both
>> rules. BAYES_99 ends at 1.00 not .999.
>> -jeff
>>
>
> I get that they overlap. I guess my thinker gets in a knot wondering
> why there is so little weight given to the more certain determination.

It is my understanding that an automated rescoring job was run quite
some time ago (before I was on the PMC) to generate the Bayes scores,
which determined that to be the best supplemental score to give to the
greater certainty. Bayes rules are not rescored routinely in the daily
rescoring task because those hits are inherently different at every
site. If you wish to determine the ideal scores for YOUR mix of ham and
spam, I believe all the tools for doing so are in the SA code tree, but
they may not be well-documented.

That's likely to not be a satisfying answer, but as a volunteer project
we have no funding for Customer Satisfaction, so the bare unsatisfying
truth will have to do.

> In my narrow view, anything that is 99.9% certain is probably worth a
> 5 on its own. Or at least should, when summed with BAYES_99, equal 5,
> as that is what the default "SPAM flag" is.
>
> Appears more experienced or thoughtful persons think otherwise.

I don't know that I'd go that far. Rescoring is not done based on simple
clear reason, but on numbers. I'm not sure whether any currently active
SA developers are able to explain exactly how the rescoring works.

> Yes, it did snow heavily overnight. Yes, I am looking for excuses not
> to visit that issue.

I vehemently recommend reading all of Justin's scripts and documentation
(I think it's all in the 'build' sub-directory) and figuring out how to
rescore based on your own mail. That's MUCH less unpleasant than dealing
with the snow.


--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: BAYES scores
joe a wrote on 2023-02-28 17:37:
> Curious as to why these scores, apparently "stock" are what they are.
> I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
>
> Noted in a header this morning:
>
> * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
> * [score: 1.0000]
> * 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
> * [score: 1.0000]
>
> Was this discussed recently? I added a local score to mollify my
> sense of propriety.

What does it solve for you?

Maybe the rules could be changed so that their scores do not overlap,
but what should the scores change to?

The tags can be split so that the hits do not overlap, but what should
the scores then be changed to?
Re: BAYES scores
> From: "Bill Cole" <sausers-20150205@billmail.scconsult.com>
>
> It is my understanding that an automated rescoring job was run quite some
> time ago (before I was on the PMC) to generate the Bayes scores, which
> determined that to be the best supplemental score to give to the greater
> certainty.

I was around in those days. My memory isn't the greatest anymore, but what I
recall was that they did automatic rescoring, and then manually tweaked a
few of the values, basically to make them look pretty by rounding off long
fractions. BAYES_999 may have been scored almost completely manually, I
can't quite recall.

Loren