Mailing List Archive: bayes 0.5

bayes 0.5

Mar 15, 2004, 1:57 PM

Anyone familiar with the Bayes engine in spamassassin? What causes
it to return a value of exactly 0.5? (I'm the one who's getting
either 0.5 or negative values or exponential values from bayes).

Below is the debug output of the bayes section for one message.
Does it look at all correct? First of all, it reports most token
scores as many many digits... is that normal?

Second, it looks like even though almost every single token scores
0.8 or above (many 0.9 or above), the final score is exactly 0.5.

Am I misreading the debug output?

Thanks in advance...

debug: bayes token 'H*c:html' => 0.999800947867298578199052132701422
debug: bayes token 'glittle' => 0.999301059001512859304084720121029
debug: bayes token 'N:NNucsd.edu' => 0.999301059001512859304084720121029
debug: bayes token '40ucsd.edu' => 0.999301059001512859304084720121029
debug: bayes token '120th' => 0.999256038647342995169082125603865
debug: bayes token 'Westminster' => 0.999256038647342995169082125603865
debug: bayes token '80234' => 0.999256038647342995169082125603865
debug: bayes token 'cpaempire.com' => 0.998902612826603325415676959619952
debug: bayes token 'auctionsnap.com' => 0.998412371134020618556701030927835
debug: bayes token 'sk:optinre' => 0.997701492537313432835820895522388
debug: bayes token 'N:sk:email.N' => 0.997701492537313432835820895522388
debug: bayes token 'sk:email.0' => 0.997701492537313432835820895522388
debug: bayes token 'sk:r.email' => 0.997701492537313432835820895522388
debug: bayes token 'N:NNNth' => 0.997559165127640907214618681167051
debug: bayes token 'H*m:tekmailer' => 0.997447513812154696132596685082873
debug: bayes token 'UD:tekmailer.com' => 0.997447513812154696132596685082873
debug: bayes token 'D*tekmailer.com' => 0.997447513812154696132596685082873
debug: bayes token 'H*F:D*tekmailer.com' => 0.997447513812154696132596685082873
debug: bayes token 'H*r:69.6.6' => 0.995837837837837837837837837837838
debug: bayes token 'leroy' => 0.99349695842365065557925335961098
debug: bayes token 'H*r:sk:glittle' => 0.992698772527436476319004436505232
debug: bayes token 'unsubscribe.asp' => 0.00757746478873239436619718309859155
debug: bayes token '1333' => 0.992320390967375632508575050900851
debug: bayes token 'H*F:U*email' => 0.991069979168274052928014813671785
debug: bayes token 'HTo:U*glittle' => 0.987967094733877059780034057590618
debug: bayes token '291' => 0.0131219512195121951219512195121951
debug: bayes token 'N:CDNNNN' => 0.985749834652679084307681041013001
debug: bayes token 'UD:jpg' => 0.982864008485058609247385413513806
debug: bayes token 'Loan' => 0.964844625700920644420244922470696
debug: bayes token 'LLC' => 0.961155288924073763884327020122858
debug: bayes token 'H*r:sk:daemon@' => 0.960220738426547302334562071766433
debug: bayes token 'sk:b10.tek' => 0.958
debug: bayes token 'N:H*F:D*bNN.tekmailer.com' => 0.958
debug: bayes token 'H*m:b10' => 0.958
debug: bayes token 'CD1091' => 0.958
debug: bayes token 'N:sk:bNN.tek' => 0.958
debug: bayes token 'HX-Spamscanner:7.2' => 0.958
debug: bayes token 'N:H*m:bNN' => 0.958
debug: bayes token 'D*b10.tekmailer.com' => 0.958
debug: bayes token 'H*r:sk:b10.tek' => 0.958
debug: bayes token 'N:H*r:sk:bNN.tek' => 0.958
debug: bayes token 'H*F:D*b10.tekmailer.com' => 0.958
debug: bayes token '101' => 0.956073649389416267190889653016379
debug: bayes token 'H*r:8.8.8' => 0.937479505926449451950590519661523
debug: bayes token 'Need' => 0.0654801029661087568160503497605387
debug: bayes token 'URI' => 0.920995680112160652062040112507882
debug: bayes token 'Colorado' => 0.91863105079109250622627542927571
debug: bayes token 'offers' => 0.911251270499804056813577796647629
debug: bayes token 'Ave' => 0.906186595297191649271043983166339
debug: bayes token 'HX-Spamscanner:v1.4' => 0.889181253091756966814099107013495
debug: bayes token 'N:HX-Spamscanner:N.NN' => 0.889181253091756966814099107013495
debug: bayes token 'HX-Spamscanner:sk:mailbox' => 0.889181253091756966814099107013495
debug: bayes token 'N:HX-Spamscanner:N.N' => 0.889181253091756966814099107013495
debug: bayes token 'HX-Spamscanner:5.0' => 0.889181253091756966814099107013495
debug: bayes token 'N:HX-Spamscanner:vN.N' => 0.889181253091756966814099107013495
debug: bayes token 'N:HX-Spamscanner:NNNN' => 0.889181253091756966814099107013495
debug: bayes token 'HX-Spamscanner:2.63' => 0.889025143633905081164154735638867
debug: bayes token 'HX-Spamscanner:Mar' => 0.883223296181928707187780217469972
debug: bayes token 'HX-Spamscanner:2004' => 0.883223296181928707187780217469972
debug: bayes token 'blank' => 0.877738869044784984176037978108126
debug: bayes token 'click' => 0.870277235737138322357951015199984
debug: bayes token 'remove' => 0.859414160596227324415700400764569
debug: bayes: score = 0.5
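
For a listing this long, it can be easier to tally the token probabilities than to read them by eye. The following is a minimal sketch, assuming the debug lines have exactly the shape shown above; the function name `parse_bayes_debug` and the 0.8 / 0.2 cut-offs are illustrative choices, not anything defined by SpamAssassin.

```python
import re

# Matches lines such as: debug: bayes token 'Loan' => 0.9648...
TOKEN_LINE = re.compile(r"^debug: bayes token '(.*)' => (\S+)$")

def parse_bayes_debug(text):
    """Return a list of (token, probability) pairs and the final score, if any."""
    tokens, score = [], None
    for line in text.splitlines():
        line = line.strip()
        match = TOKEN_LINE.match(line)
        if match:
            tokens.append((match.group(1), float(match.group(2))))
        elif line.startswith("debug: bayes: score ="):
            score = float(line.split("=", 1)[1])
    return tokens, score

# A few lines copied from the debug output above.
sample = """\
debug: bayes token 'H*c:html' => 0.999800947867298578199052132701422
debug: bayes token 'unsubscribe.asp' => 0.00757746478873239436619718309859155
debug: bayes token 'Loan' => 0.964844625700920644420244922470696
debug: bayes token 'remove' => 0.859414160596227324415700400764569
debug: bayes: score = 0.5
"""

tokens, score = parse_bayes_debug(sample)
spammy = [tok for tok, p in tokens if p >= 0.8]  # illustrative threshold
hammy = [tok for tok, p in tokens if p <= 0.2]   # illustrative threshold
print(len(tokens), "tokens;", len(spammy), "spammy;", len(hammy), "hammy;",
      "final score", score)
```

Feeding the whole debug section through the same function gives the counts for the full message.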

Re: bayes 0.5

mkettler at evi-inc

Mar 15, 2004, 3:02 PM

At 03:57 PM 3/15/2004, little@cs.ucsd.edu wrote:

>Anyone familiar with the Bayes engine in spamassassin? What causes
>it to return a value of exactly 0.5? (I'm the one who's getting
>either 0.5 or negative values or exponential values from bayes).

I'd also suggest you read Bayes.pm. If you need to understand SA's bayes
in-depth, that's the best place to start.

In general, exact 0.50 only happens if bayes is disabled, or no tokens are
found.

It can happen if the chi-squared combine results in 0.50, but that's quite
rare.

>Below is the debug output of the bayes section for one message.
>Does it look at all correct? First of all, it reports most token
>scores as many many digits... is that normal?

Yes. A large number of digits is perfectly normal.

>Second, it looks like even though almost every single token scores
>0.8 or above (many 0.9 or above), the final score is exactly 0.5.

That's a little bit odd to me. Usually if the score is forced to 0.5, it's
because bayes got skipped, or very few tokens were found.
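
To see why 0.5 looks odd for these tokens, here is a rough sketch of chi-squared combining in the Robinson/Fisher style. It is an assumption that this matches Bayes.pm closely enough to be useful; the real code may differ in details such as which tokens it keeps. The names `chi2q` and `combine` are illustrative, and the token values are the ones from the debug output rounded to four places.

```python
import math

def chi2q(x2, df):
    """P(X > x2) for a chi-squared variable with an even number of degrees of freedom."""
    assert df % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    # Rounding can push the sum slightly above 1, so clamp it.
    return min(total, 1.0)

def combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5  # assumption: no tokens means no evidence either way
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0

# The 62 token probabilities from the posted debug output, rounded.
TOKENS = (
    [0.9998] + [0.9993] * 6 + [0.9989, 0.9984] + [0.9977] * 4
    + [0.9976] + [0.9974] * 4
    + [0.9958, 0.9935, 0.9927, 0.0076, 0.9923, 0.9911, 0.9880, 0.0131]
    + [0.9857, 0.9829, 0.9648, 0.9612, 0.9602]
    + [0.958] * 11
    + [0.9561, 0.9375, 0.0655, 0.9210, 0.9186, 0.9113, 0.9062]
    + [0.8892] * 7
    + [0.8890, 0.8832, 0.8832, 0.8777, 0.8703, 0.8594]
)

print(len(TOKENS), "tokens, combined score:", combine(TOKENS))
print("no tokens:", combine([]))
```

Under these assumptions the posted tokens combine to a value very close to 1, while an empty token list gives exactly 0.5. If a combiner like this one is what runs, a result of 0.5, a negative number, or a value outside the 0 to 1 range from this input would suggest that something in the arithmetic is going wrong rather than that the tokens are balanced.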

>Am I misreading the debug output?

No...

However, going back through your posts, I see you are running on a Solaris
box, but you've never indicated which version of SA you are using. Are you
on 2.63?

Re: bayes 0.5

little at cs

Mar 15, 2004, 3:03 PM

Matt Kettler wrote:
> At 03:57 PM 3/15/2004, little@cs.ucsd.edu wrote:
>
>> Anyone familiar with the Bayes engine in spamassassin? What causes
>> it to return a value of exactly 0.5? (I'm the one who's getting
>> either 0.5 or negative values or exponential values from bayes).
>
>
> I'd also suggest you read Bayes.pm. If you need to understand SA's bayes
> in-depth, that's the best place to start.
>
> In general, exact 0.50 only happens if bayes is disabled, or no tokens
> are found.
>
> It can happen if the chi-squared combine results in 0.50, but that's
> quite rare.
>
>
>> Below is the debug output of the bayes section for one message.
>> Does it look at all correct? First of all, it reports most token
>> scores as many many digits... is that normal?
>
>
> Yes. A large number of digits is perfectly normal.
>
>
>> Second, it looks like even though almost every single token scores
>> 0.8 or above (many 0.9 or above), the final score is exactly 0.5.
>
>
> That's a little bit odd to me. Usually if the score is forced to 0.5,
> it's because bayes got skipped, or very few tokens were found.
>
>> Am I misreading the debug output?
>
>
> No...
>
> However, going back through your posts, I see you are running on a Solaris
> box, but you've never indicated which version of SA you are using.
> Are you on 2.63?

Yes, we are running 2.63, and bayes is enabled. Am I correct in assuming
(from the debug output) that the bayes engine got enough tokens to not
force a 0.5?

Another thing from my other posts: if we don't get a 0.5 from bayes, we
get weird numbers (numbers with huge exponents, negative numbers, etc.).
Does this narrow down where the problem could be? Seems like
a hideous miscalculation :-)

-glenn