Feb 28, 2023, 12:14 PM
On 2023-02-28 at 13:38:35 UTC-0500 (Tue, 28 Feb 2023 13:38:35 -0500)
joe a <joea-lists@j4computers.com>
is rumored to have said:
> On 2/28/2023 12:05 PM, Jeff Mincy wrote:
>> > From: joe a <joea-lists@j4computers.com>
>> > Date: Tue, 28 Feb 2023 11:37:34 -0500
>> >
>> > Curious as to why these scores, apparently "stock" are what they
>> > are.
>> > I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
>> >
>> > Noted in a header this morning:
>> >
>> > * 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
>> > * [score: 1.0000]
>> > * 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
>> > * [score: 1.0000]
>> >
>> > Was this discussed recently? I added a local score to mollify my
>> > sense of propriety.
>>
>> Those two rules overlap. A message with bayes >= 99.9% hits both
>> rules. BAYES_99 ends at 1.00 not .999.
>> -jeff
>>
>
> I get that they overlap. I guess my thinker gets in a knot wondering
> why there is so little weight given to the more certain determination.
It is my understanding that an automated rescoring job was run quite
some time ago (before I was on the PMC) to generate the Bayes scores,
and that it determined 0.2 to be the best supplemental score to give to
the greater certainty. Bayes rules are not rescored routinely in the
daily rescoring task because those hits are inherently different at
every site. If you wish to determine the ideal scores for YOUR mix of
ham and spam, I believe all the tools for doing so are in the SA code
tree, but they may not be well-documented.
That's likely to not be a satisfying answer, but as a volunteer project
we have no funding for Customer Satisfaction, so the bare unsatisfying
truth will have to do.
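
To make the overlap concrete, here is a minimal Python sketch of how the
two rules add up. It is an illustration only, not SpamAssassin code: the
scores and ranges are copied from the header quoted above, while the
inclusive bounds and the names RULES, SPAM_THRESHOLD, bayes_hits and
bayes_score are assumptions invented for this example.

```python
# Ranges and scores come from the header quoted in this thread.
# Treating both bounds as inclusive is an assumption for illustration.
RULES = {
    "BAYES_99": (0.99, 1.00, 3.5),
    "BAYES_999": (0.999, 1.00, 0.2),
}
SPAM_THRESHOLD = 5.0  # the default "SPAM flag" level mentioned in the thread


def bayes_hits(probability):
    """Return the names of the rules whose range contains probability."""
    return [
        name
        for name, (low, high, _score) in RULES.items()
        if low <= probability <= high
    ]


def bayes_score(probability):
    """Sum the scores of every rule that hits; overlapping rules add up."""
    return sum(RULES[name][2] for name in bayes_hits(probability))


if __name__ == "__main__":
    for p in (0.995, 1.0):
        print(p, bayes_hits(p), round(bayes_score(p), 1))
```

Under those assumptions, a message at 99.5% hits only BAYES_99, while a
message at 99.9% or above hits both rules, so the 0.2 is a supplement on
top of the 3.5 rather than a standalone score. The combined Bayes
contribution still stays below the 5.0 threshold.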
> In my narrow view, anything that is 99.9% certain is probably worth a
> 5 on its own. Or, at least, it should equal 5 when summed with
> BAYES_99, as that is what the default "SPAM flag" is.
>
> Appears more experienced or thoughtful persons think otherwise.
I don't know that I'd go that far. Rescoring is not done based on
simple, clear reasoning, but on numbers. I'm not sure whether any
currently active SA developers are able to explain exactly how the
rescoring works.
> Yes, it did snow heavily overnight. Yes, I am looking for excuses not
> to visit that issue.
I vehemently recommend reading all of Justin's scripts and documentation
(I think it's all in the 'build' sub-directory) and figuring out how to
rescore based on your own mail. That's MUCH less unpleasant than dealing
with the snow.
--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire