Mailing List Archive

BAYES_00 BODY. Negative score?
Have some annoying SPAM that consistently shows a negative score on
BAYES. Is that the default scoring, or is it influenced by BAYES in some way?

*-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
* [score: 0.0000]

SpamAssassin 3.4.5

Thanks for any pointers.
Re: BAYES_00 BODY. Negative score? [ In reply to ]
joe a wrote on 2023-02-13 23:42:
> Have some annoying SPAM that consistently shows a negative score on
> BAYES. Is the default scoring or influenced by BAYES in some way?
>
> *-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> * [score: 0.0000]
>
> SpamAssassin 3.4.5

time to upgrade imho :=)

or train bayes so it knows what is spam and what is not; if that fails,
turn off autolearn, or tighten the limits on what gets autolearned

in local.cf

bayes_auto_learn_threshold_nonspam n.nn (default: 0.1)
The score threshold below which a mail has to score, to be fed into
SpamAssassin's learning systems automatically as a non-spam message.
bayes_auto_learn_threshold_spam n.nn (default: 12.0)
The score threshold above which a mail has to score, to be fed into
SpamAssassin's learning systems automatically as a spam message.

I have changed the values of these two :)

now I don't need manual training

the above belongs to a plugin that needs to be enabled for this to work

remember to run spamassassin --lint after changing config files
Re: BAYES_00 BODY. Negative score? [ In reply to ]
So, what did you change them to, may I ask? Not sure I really
understand those limits.

In any case, I feed new SPAM and HAM into BAYES twice a day via
scripts, etc., so I really should have autolearn off, yes?

Maybe I need to retrain BAYES? IIRC last time took "a long time".
Re: BAYES_00 BODY. Negative score? [ In reply to ]
On 2/13/2023 5:51 PM, Benny Pedersen wrote:
> joe a wrote on 2023-02-13 23:42:
>> Have some annoying SPAM that consistently shows a negative score on
>> . . .
>
> time to upgrade imho :=)
> . . .

And, yes, I should upgrade.
Re: BAYES_00 BODY. Negative score? [ In reply to ]
joe a wrote on 2023-02-14 00:12:
> So, what did you change them to, may I ask? Not sure I really
> understand those limits.

bayes_auto_learn_threshold_nonspam -5
bayes_auto_learn_threshold_spam 5

This means everything scoring below -5 is autolearned as ham (non-spam),
and everything scoring above 5 is autolearned as spam.

But this is just a suggestion, not a recommendation; spam and ham differ
per recipient.
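
The effect of the two thresholds can be pictured with a small Python sketch. This is only an illustration of the comparison described above, not SpamAssassin's actual code: the function name `autolearn_decision` is made up here, and the real autolearn logic applies further conditions that this sketch ignores.

```python
from typing import Optional


def autolearn_decision(score: float,
                       nonspam_threshold: float = 0.1,
                       spam_threshold: float = 12.0) -> Optional[str]:
    """Hypothetical helper: say how a message score would be autolearned.

    Defaults mirror the documented defaults quoted in this thread
    (0.1 for non-spam, 12.0 for spam). Returns "ham", "spam", or None
    when the score falls between the two thresholds and nothing is learned.
    """
    if score < nonspam_threshold:
        return "ham"
    if score > spam_threshold:
        return "spam"
    return None


# With the values suggested above (-5 and 5):
for score in (-6.0, 0.0, 7.2):
    print(score, autolearn_decision(score, nonspam_threshold=-5, spam_threshold=5))
```

With the defaults, a message scoring 0.0 would be learned as ham; with the -5/5 pair it is left alone, which is the point of widening the gap.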

>
> In any case, I feed new SPAM and HAM into BAYES twice a day. via
> scripts, etc. so I really should have autolearn off, yes?
>
> Maybe I need to retrain BAYES? IIRC last time took "a long time".

yes
Re: BAYES_00 BODY. Negative score? [ In reply to ]
> Have some annoying SPAM that consistently shows a negative score on BAYES.
> Is the default scoring or influenced by BAYES in some way?
>
> *-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> * [score: 0.0000]

The score is reasonable for guaranteed ham, which is what your Bayes thinks
this spam email is.
Of course the score isn't reasonable for spam, but Bayes thinks it is ham.

In addition to being cautious of autolearn as Benny described, yes, you need
to retrain your Bayes, because it is very clearly confused on this point.

Loren
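
To make the relationship between the Bayes probability and the rule name concrete, here is a rough Python sketch. Only the BAYES_00 range (0 to 1%) is confirmed by the report quoted in this thread; the other ranges are my recollection of the stock rule set and the boundary handling is an assumption, so check the rules shipped with your installation before relying on it. The names `BAYES_BUCKETS` and `bayes_rule` are invented for this example.

```python
# (low, high, rule name); ranges other than BAYES_00 are assumptions
# based on the stock rule descriptions, not taken from this thread.
BAYES_BUCKETS = [
    (0.00, 0.01, "BAYES_00"),
    (0.01, 0.05, "BAYES_05"),
    (0.05, 0.20, "BAYES_20"),
    (0.20, 0.40, "BAYES_40"),
    (0.40, 0.60, "BAYES_50"),
    (0.60, 0.80, "BAYES_60"),
    (0.80, 0.95, "BAYES_80"),
    (0.95, 0.99, "BAYES_95"),
    (0.99, 1.00, "BAYES_99"),
]


def bayes_rule(probability: float) -> str:
    """Map a Bayes spam probability in [0, 1] to a bucket rule name."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for low, high, name in BAYES_BUCKETS:
        if low <= probability < high:
            return name
    return "BAYES_99"  # probability == 1.0 falls into the top bucket


print(bayes_rule(0.0000))  # the "[score: 0.0000]" case from the report
```

A probability of 0.0000 lands in the lowest bucket, which carries the negative score seen in the original report.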
Re: BAYES_00 BODY. Negative score? [ In reply to ]
On 13.02.23 17:42, joe a wrote:
>Have some annoying SPAM that consistently shows a negative score on
>BAYES. Is the default scoring or influenced by BAYES in some way?
>
>*-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>* [score: 0.0000]

This indicates a mistrained database, which means you have trained too many
spams or spam-like messages (commercial messages) as ham.

Proper training of spam should help. Just keep your spam (and optionally
ham) corpora for retraining in case you ever drop the database.

I also recommend abstaining from training commercial mail (notices from
e-shops, companies you have done business with, etc.) as ham, unless it
gets a BAYES_999 score and you want it lower. I often train it as spam so
that it gives an uncertain BAYES_50 result.

Those mails resemble spam too much to be used for training.

--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I don't have lysdexia. The Dog wouldn't allow that.
Re: BAYES_00 BODY. Negative score? [ In reply to ]
All,

The term "proper training" has always seemed a bit problematic to me.
That aside, I am getting an error when attempting:

sa-learn -D --spam /var/mail/spamd/Cabinet.saved-spam

The last line shows:

***************
Learned tokens from 0 message(s) (1 message(s) examined)
ERROR: the Bayes learn function returned an error, please re-run with -D
for more information at /usr/bin/sa-learn line 500.
***************

Which may be permissions related. However, there seem to be some
errors/warning at the beginning, starting with:

***************
Feb 14 17:26:14.956 [2855] dbg: plugin: loading Mail::SpamAssassin::Plugin::Razor2 from @INC
Feb 14 17:26:14.959 [2855] dbg: razor2: razor2 is not available
Feb 14 17:26:14.959 [2855] dbg: plugin: loading Mail::SpamAssassin::Plugin::SpamCop from @INC
plugin: failed to parse plugin (from @INC): Can't locate Mail/SpamAssassin/Plugin/SpamCop.pm: lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval 44) line 1.
***************

While this also suggests a permissions issue, the only place I find
SpamCop.pm (even as root) is at:
"/usr/lib/perl5/vendor_perl/5.26.1/Mail/SpamAssassin/Plugin/SpamCop.pm",
which is not in the path sa-learn concocted when invoked.

Sorry if the formatting is weird or if this is useless information.
Re: BAYES_00 BODY. Negative score? [ In reply to ]
Please let this sit for a while. I've discovered a fundamental issue
with my scheme of feeding messages to BAYES. Unfortunately I was
remiss, apparently, in setting up logging for some bits, so I have no idea
how long this has been failing.

Sorry for the clutter.

joe a.

Re: BAYES_00 BODY. Negative score? [ In reply to ]
Hi,

>> *-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>> * [score: 0.0000]
>
> This indicates a mistrained database, which means you have trained too
> many spams or spam-like messages (commercial messages) as ham.
>
> Proper training of spams should help. Just keep your spam (and optionally
> ham) corpora for retraining in case you would drop the database.
>
> I also recommend to abstain from training commercial mail (notices from
> e-shops, companies you done business with etc) as ham, unless they
> generate BAYES_999 score and you want it lower. I often train them as
> spam so those give uncertain BAYES_50 result.
>

Is there any ability to distinguish a legitimate newsletter from a spam
newsletter?

In other words, if I train emails from Forbes or Washington Post as ham,
then train similar newsletter emails from other providers that are more
suspect as spam, will bayes still be able to distinguish Forbes and WP as ham?

The problem is that if I avoid training newsletters or bulk email
altogether, then I'm also left with spam newsletters still only hitting
bayes50.

I'm actually in a situation now where Forbes and WP newsletters are being
marked as spam, so considering retraining, but wondering what approach/best
practices I should be following.

# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 97002 0 non-token data: nspam
0.000 0 90173 0 non-token data: nham
0.000 0 11581565 0 non-token data: ntokens
0.000 0 1054224948 0 non-token data: oldest atime
0.000 0 1676433889 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 1648164856 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count
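
For anyone who wants to track these counters over time, the dump can be picked apart with a few lines of Python. This is a minimal sketch that assumes the layout shown above (four numeric columns, then "non-token data:" and a label, with the value in the third column); `parse_dump_magic` is a made-up helper name, not part of SpamAssassin.

```python
def parse_dump_magic(text: str) -> dict:
    """Extract the 'non-token data' counters from `sa-learn --dump magic` output.

    Assumes each counter line looks like:
        0.000 0 97002 0 non-token data: nspam
    and takes the third numeric column as the value.
    """
    counters = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        fields, label = line.split("non-token data:", 1)
        parts = fields.split()
        if len(parts) < 3:
            continue  # not a counter line in the expected layout
        counters[label.strip()] = int(parts[2])
    return counters


# Values copied from the dump posted above.
sample = """0.000 0 3 0 non-token data: bayes db version
0.000 0 97002 0 non-token data: nspam
0.000 0 90173 0 non-token data: nham
0.000 0 11581565 0 non-token data: ntokens"""

stats = parse_dump_magic(sample)
spam_share = stats["nspam"] / (stats["nspam"] + stats["nham"])
print(f"nspam={stats['nspam']} nham={stats['nham']} spam share={spam_share:.3f}")
```

The counts only say how many messages were learned on each side, not whether they were learned correctly, so a balanced ratio does not rule out mistraining.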
Re: BAYES_00 BODY. Negative score? [ In reply to ]
>>On 13.02.23 17:42, joe a wrote:
>>>Have some annoying SPAM that consistently shows a negative score
>>>on BAYES. Is the default scoring or influenced by BAYES in some
>>>way?
>>>
>>>*-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>>>* [score: 0.0000]

>On 2/14/2023 2:56 AM, Matus UHLAR - fantomas wrote:
>>This indicates a mistrained database, which means you have trained
>>too many spams or spam-like messages (commercial messages) as ham.
>>
>>Proper training of spams should help. Just keep your spam (and
>>optionally ham) corpora for retraining in case you would drop the
>>database.
>>
>>I also recommend to abstain from training commercial mail (notices
>>from e-shops, companies you done business with etc) as ham, unless
>>they generate BAYES_999 score and you want it lower. I often train
>>them as spam so those give uncertain BAYES_50 result.
>>
>>Those mails resemble spam too much to be used for training.

On 14.02.23 17:37, joe a wrote:
>The term "proper training" has always seemed a bit problematic to me.
>That aside, experiencing an error trying attempting:
>
>sa-learn -D --spam /var/mail/spamd/Cabinet.saved-spam

Just FYI, there are multiple ways to train:

spamassassin -r < mail
- will train a single message as spam.

spamc -L spam < mail
- will tell spamd to train the message as spam. spamd must run with the -l
(--allow-tell) option to do that.

sa-learn --spam mail
- will train a single message as spam.

sa-learn --mbox --spam mbox
- will train multiple messages stored in a single file in mbox format.

spamd must run as root with the -H option in order to train your own
database, unless you use sql/redis for bayes storage.

When using amavis, spamd is not used and the database is stored under the
amavis user's home directory (unless you changed the DB to sql/redis).

You can still use spamassassin or sa-learn, but either run it under su/sudo
or point it at the database path:

su amavisd -c "spamassassin -r" < message

sa-learn --dbpath /var/lib/amavis/.spamassassin/ --mbox --spam mbox


If you scan messages larger than the standard 500K, you must raise the size
limit for trained messages too.
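
For training from scripts, it can help to build the sa-learn command line in one place so the same options are used every time. The sketch below only assembles and prints the argument list; it does not run anything. The helper name `sa_learn_argv` is hypothetical, and the options it emits (`--spam`, `--ham`, `--mbox`, `--dbpath`) are the ones already shown in this message.

```python
import shlex
from typing import List, Optional


def sa_learn_argv(kind: str, path: str, mbox: bool = False,
                  dbpath: Optional[str] = None) -> List[str]:
    """Build an sa-learn argument list for training `path` as spam or ham."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    argv = ["sa-learn"]
    if dbpath:
        argv += ["--dbpath", dbpath]  # e.g. the amavis database directory
    if mbox:
        argv.append("--mbox")         # input file holds many messages
    argv += ["--" + kind, path]
    return argv


argv = sa_learn_argv("spam", "mbox", mbox=True,
                     dbpath="/var/lib/amavis/.spamassassin/")
print(shlex.join(argv))
```

The resulting list could be handed to `subprocess.run(argv, check=True)` under the right user, which also makes a failed training run visible instead of silently ignored.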


>The last line shows:
>
>***************
>Learned tokens from 0 message(s) (1 message(s) examined)
>ERROR: the Bayes learn function returned an error, please re-run with
>-D for more information at /usr/bin/sa-learn line 500.
>***************
>
>Which may be permissions related. However, there seem to be some
>errors/warning at the beginning, starting with:
>
>***************
>Feb 14 17:26:14.956 [2855] dbg: plugin: loading
>Mail::SpamAssassin::Plugin::Razor2 from @INC
>Feb 14 17:26:14.959 [2855] dbg: razor2: razor2 is not available
>Feb 14 17:26:14.959 [2855] dbg: plugin: loading
>Mail::SpamAssassin::Plugin::SpamCop from @INC
>plugin: failed to parse plugin (from @INC): Can't locate
>Mail/SpamAssassin/Plugin/SpamCop.pm:
>lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval
>44) line 1.

These have nothing to do with training, although SpamCop.pm can be used to
report mail to SpamCop.

--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Despite the cost of living, have you noticed how popular it remains?
Re: BAYES_00 BODY. Negative score? [ In reply to ]
>>> *-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>>> * [score: 0.0000]
>>
>> This indicates a mistrained database, which means you have trained too
>> many spams or spam-like messages (commercial messages) as ham.
>>
>> Proper training of spams should help. Just keep your spam (and optionally
>> ham) corpora for retraining in case you would drop the database.
>>
>> I also recommend to abstain from training commercial mail (notices from
>> e-shops, companies you done business with etc) as ham, unless they
>> generate BAYES_999 score and you want it lower. I often train them as
>> spam so those give uncertain BAYES_50 result.

On 14.02.23 23:05, Alex wrote:
>Is there any ability to distinguish a legitimate newsletter from a spam
>newsletter?

Very hard.

That's why I recommend not training newsletters unless you know you or your
users want them and they produce a BAYES_99 result.


>In other words, if I train emails from Forbes or Washington Post as ham,
>then train similar newsletter emails from other other providers that are
>more suspect, will bayes still be able to distinguish Forbes and WP as ham?

>The problem is that if I avoid training newsletters or bulk email
>altogether, then I'm also left with spam newsletters still only hitting
>bayes50.

If you only do this for Forbes or Washington Post, bayes will likely be able
to distinguish other newsletters, if you train those as spam.

>I'm actually in a situation now where Forbes and WP newsletters are being
>marked as spam, so considering retraining, but wondering what approach/best
>practices I should be following.

This should be safe. There are many types of newsletters; the only problem
would be if you started training them as ham without really knowing they are
welcome.

--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
WinError #99999: Out of error messages.
Re: BAYES_00 BODY. Negative score? [ In reply to ]
If you run spamassassin with -D bayes -t xxx 2>debug.log

then in debug.log you will see all the "tokens" the bayes system extracts
from the headers, and you will probably find a lot of them related to
mailing lists.

If you teach SA that those tokens are spam and they are present in both
WP and Forbes, their emails will be flagged. That's normal.

If you want you can use bayes_ignore_header to ignore some headers.
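
A quick way to survey such a debug log is to pull out the tokens with the lowest probabilities, since those are what drag a message towards ham. The Python sketch below assumes token lines of the form `bayes: token '<token>' => <probability>`; that line format is an assumption on my part and may differ between versions, and both the sample lines and the helper name `hammiest_tokens` are made up for illustration.

```python
import re

# Assumed debug line shape: ... dbg: bayes: token 'H*r:example' => 0.02
TOKEN_LINE = re.compile(r"bayes: token '(.*)' => (\S+)")


def hammiest_tokens(log_text: str, limit: int = 5):
    """Return up to `limit` (token, probability) pairs, lowest probability first."""
    pairs = []
    for line in log_text.splitlines():
        match = TOKEN_LINE.search(line)
        if not match:
            continue
        try:
            pairs.append((match.group(1), float(match.group(2))))
        except ValueError:
            continue  # value was not a number; skip the line
    return sorted(pairs, key=lambda pair: pair[1])[:limit]


# Hypothetical sample lines, not taken from a real log.
sample_log = """dbg: bayes: token 'H*r:example' => 0.02
dbg: bayes: token 'unsubscribe' => 0.91
dbg: rules: some unrelated line"""

for token, prob in hammiest_tokens(sample_log):
    print(f"{prob:.2f}  {token}")
```

Reading the real file would just be `hammiest_tokens(open("debug.log").read())`; tokens that look spammy to a human but sit near 0 are a hint that the database was fed spam as ham.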



Re: BAYES_00 BODY. Negative score? [ In reply to ]
On 15.02.23 14:53, hg user wrote:
>If you run spamassasin with -D bayes -t xxx 2>debug.log
>
>in debug.log you will see all the "tokens" the bayes system extracts
>from the headers and you will probably find a lot of them related to
>mailing lists.
>
>If you teach SA that those tokens are spam and they are present both
>in WP or Forbes, their emails will be flagged. It's normal.

Don't expect anyone to manually compare tokens, unless they are deeply
debugging bayes functionality.

Simply put, bayes DOES gather all possible tokens and compares their
occurrence quite effectively: if you train Forbes and WP newsletters as ham,
and other newsletters as spam, bayes should be able to distinguish them
quite nicely.

However, many of the tokens in even Forbes and WP newsletters may occur in
different spammy newsletters, so be careful when training even these.

If you get the score down enough not to be classified as spam, you've won
and should not continue (unless you are willing to check all BAYES_00 mail
for suspicious newsletters and train those as spam, seeing how much it
affects the mentioned Forbes and WP newsletters).

Bayes training is great, but one should be careful with it.


>If you want you can use bayes_ignore_header to ignore some headers.

this rarely helps.



--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Save the whales. Collect the whole set.
Re: BAYES_00 BODY. Negative score? [ In reply to ]
Hi,

>
> However, many of tokens in even Forbes and WP newsletters may occure in
> different spamy newsletters, so be careful when traning even these.
>

This is exactly what I was thinking. When going through the quarantine,
it's very difficult not only to identify which newsletters may have been
miscategorized or trained incorrectly, but also to correct an improperly
trained newsletter (or email in general) afterwards.


> If you get the score down enough not to be classified as spam, you've won
> and should not contine (unless you are willing to check all BAYES_0 mail
> for
> suspicious newsletters and train those as spam, seeing how much it affects
> mentioned Forbes and WP newsletters.
>

Too bad it wasn't possible to build a shared list of trusted
newsletters/senders to compensate for these mistakes.

On a related note, how about emails with only an image attachment? People
use email to send pictures, screenshots and other emails with nothing in
the body and sometimes even no subject, but aren't spam. The ones I see in
the quarantine are almost always ham, and despite training them as ham
(even with --max-size 0), they continue to be tagged as spam.

I've always also had difficulty with marking them so DCC ignores them.
Re: BAYES_00 BODY. Negative score? [ In reply to ]
>> However, many of tokens in even Forbes and WP newsletters may occure in
>> different spamy newsletters, so be careful when traning even these.

On 15.02.23 09:51, Alex wrote:
>This is exactly what I was thinking. When going through the quarantine,
>it's also very difficult to always not only identify which newsletters may
>have been miscategorized or trained incorrectly, but also ever being able
>to correct an improperly trained newsletter (or email in general).

This is why I recommend not doing any training on newsletters, or at least
no HAM training unless they are known.

>> If you get the score down enough not to be classified as spam, you've won
>> and should not contine (unless you are willing to check all BAYES_0 mail
>> for
>> suspicious newsletters and train those as spam, seeing how much it affects
>> mentioned Forbes and WP newsletters.

>Too bad it wasn't possible to build a shared list of trusted
>newsletters/senders to compensate for these mistakes.

I wouldn't trust such a list; too many organizations send their newsletters
to anyone they (n)ever communicated with...

>On a related note, how about emails with only an image attachment? People
>use email to send pictures, screenshots and other emails with nothing in
>the body and sometimes even no subject, but aren't spam. The ones I see in
>the quarantine are almost always ham, and despite training them as ham
>(even with --max-size 0), they continue to be tagged as spam.

There are a few rules supposed to catch short/empty messages with image
attachment.

There is also the ExtractText plugin, which supports OCR scanning with
tesseract and should be able to extract any text in those images. But note
that OCR takes time.

>I've always also had difficulty with marking them so DCC ignores them.

yes, from DCC's point of view they are empty messages and it's hard to score
anything besides EMPTY_MESSAGE and rules I mentioned above.

--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Microsoft dick is soft to do no harm
Re: BAYES_00 BODY. Negative score? [ In reply to ]
He need not compare all the tokens, but a quick survey of the tokens
derived from the headers can tell him how the bayes result was formed.

A couple of weeks ago some phishing reached our inboxes. Our custom rule
gave the message 5 points, but I was surprised that the message was
categorized BAYES_00, -1.9.

I ran the bayes debug and found that clearly spammy words were not
recognized as spammy. Then I discovered that one admin had enabled
auto-learning by mistake and the database was full of garbage...

I cleared the db, reloaded it with our hand-selected corpus, and the message
was now BAYES_50.



On Wed, Feb 15, 2023 at 3:27 PM Matus UHLAR - fantomas <uhlar@fantomas.sk>
wrote:

> On 15.02.23 14:53, hg user wrote:
> >If you run spamassasin with -D bayes -t xxx 2>debug.log
> >
> >in debug.log you will see all the "tokens" the bayes system extracts
> >from the headers and you will probably find a lot of them related to
> >mailing lists.
> >
> >If you teach SA that those tokens are spam and they are present both
> >in WP or Forbes, their emails will be flagged. It's normal.
>
> Don't expect anyone to manually compare tokens, unless they are deeply
> debugging bayes functionality.
>
> Simply said, bayes DOES gather all possible tokens and compare their
> occurence with interesting effectivity - if you train Forbes and WP
> newsletters as ham, and other newsletters as spam, bayes should be able to
> distinguish them quite nicely.
>
> However, many of tokens in even Forbes and WP newsletters may occure in
> different spamy newsletters, so be careful when traning even these.
>
> If you get the score down enough not to be classified as spam, you've won
> and should not contine (unless you are willing to check all BAYES_0 mail
> for
> suspicious newsletters and train those as spam, seeing how much it affects
> mentioned Forbes and WP newsletters.
>
> Bayes training is great, but one should be careful about that.
>
>
> >If you want you can use bayes_ignore_header to ignore some headers.
>
> this rarely helps.
>
>
> >On 2/15/23, Matus UHLAR - fantomas <uhlar@fantomas.sk> wrote:
> >>>>*-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> >>>> >* [score: 0.0000]
> >>>>
> >>>> This indicates a mistrained database, which means you have trained too
> >>>> many
> >>>> spams or spam-like messages (commercial messages) as ham.
> >>>>
> >>>> Proper training of spams should help. Just keep your spam (and
> >>>> optionally
> >>>> ham) corpora for retraining in case you would drop the database.
> >>>>
> >>>> I also recommend to abstain from training commercial mail (notices
> from
> >>>> e-shops, companies you done business with etc) as ham, unless they
> >>>> generate
> >>>> BAYES_999 score and you want it lower. I often train them as spam so
> >>>> those
> >>>> give uncertain BAYES_50 result.
> >>
> >> On 14.02.23 23:05, Alex wrote:
> >>>Is there any ability to distinguish a legitimate newsletter from a spam
> >>>newsletter?
> >>
> >> Very hard.
> >>
> >> That's why I recommend not to train newsletters unless you know
> you/users
> >> want them and they produce BAYES_99 result.
> >>
> >>
> >>>In other words, if I train emails from Forbes or Washington Post as ham,
> >>>then train similar newsletter emails from other providers that are
> >>>more suspect, will bayes still be able to distinguish Forbes and WP as
> >>>ham?
> >>
> >>>The problem is that if I avoid training newsletters or bulk email
> >>>altogether, then I'm also left with spam newsletters still only hitting
> >>>bayes50.
> >>
> >> If you only do this for Forbes or Washington Post, bayes will likely be
> >> able to distinguish other newsletters, if you train those as spam.
> >>
> >>>I'm actually in a situation now where Forbes and WP newsletters are being
> >>>marked as spam, so considering retraining, but wondering what
> >>>approach/best practices I should be following.
> >>
> >> This should be safe. There are many types of newsletters, the problem
> >> would only be if you started training them as ham unless you really know
> >> they are welcome.
> >>
> >> --
> >> Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
> >> Warning: I wish NOT to receive e-mail advertising to this address.
> >> Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
> >> WinError #99999: Out of error messages.
> >>
>
> --
> Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
> Warning: I wish NOT to receive e-mail advertising to this address.
> Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
> Save the whales. Collect the whole set.
>
Re: BAYES_00 BODY. Negative score? [ In reply to ]
On 2/14/2023 6:09 PM, joe a wrote:
> Please let this sit for a while, I've discovered a fundamental issue
> with my scheme of feeding messages to BAYES.  Unfortunately I was
> remiss, apparently, in setting up logging for some bits, so have no idea
> how long this has been failing.
>
> Sorry for the clutter.
>
> joe a.
>

Re-energized having recently heroically wrestled an elusive issue (to
me) into surrender . . . we now turn to another issue.

Probably I need to retrain BAYES "from scratch". I have a mess (years?)
of stored sample emails that can be relearned.

I understand that sa-learn should be run as the same user as spamd.
However, I find it has always been run as root, and running it as the
spamassassin user fails. For example:

~su -c "sa-learn --spam /var/mail/spamd/Cabinet.Missed-SPAM" spamfilter

results in errors, starting with:

plugin: failed to parse plugin (from @INC): Can't locate
Mail/SpamAssassin/Plugin/SpamCop.pm:
lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval 44)
line 1.

plugin: failed to parse plugin (from @INC): Can't locate
Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm:
lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm: Permission denied at
(eval 45) line 1.

One might presume this to be a permissions issue (where would I get THAT
idea?) but permissions to what? I cannot seem to find the items
mentioned, even as root.

Running with the -D option does produce more output, after that list of
permission denied items:

Feb 16 15:55:30.884 [10384] dbg: config: warning: no description set for
STOX_REPLY_TYPE_WITHOUT_QUOTES
Feb 16 15:55:30.884 [10384] dbg: config: warning: no description set for
MSOE_MID_WRONG_CASE
Feb 16 15:55:30.884 [10384] dbg: config: warning: no description set for
HELO_FRIEND
Feb 16 15:55:30.884 [10384] dbg: config: warning: no description set for
STOX_AND_PRICE
Feb 16 15:55:30.884 [10384] dbg: config: warning: no description set for
L_SPAM_TOOL_13
Feb 16 15:55:30.885 [10384] dbg: config: warning: no description set for
FSL_FAKE_HOTMAIL_RVCD

Means something to someone I guess.
Re: BAYES_00 BODY. Negative score? [ In reply to ]
On 2/16/2023 4:30 PM, Reindl Harald wrote:
>
>
> Am 16.02.23 um 21:57 schrieb joe a:
>> I understand that sa-learn should be run as the same user as spamd,
>> however I find it has always been run as root and when running as the
>> spamassassin user results in errors, such as:
>>
>> ~su -c "sa-learn --spam /var/mail/spamd/Cabinet.Missed-SPAM" spamfilter
>>
>> results in errors, starting with:
>>
>> plugin: failed to parse plugin (from @INC): Can't locate
>> Mail/SpamAssassin/Plugin/SpamCop.pm:
>> lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval
>> 44) line 1.
>>
>> plugin: failed to parse plugin (from @INC): Can't locate
>> Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm:
>> lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm: Permission denied
>> at (eval 45) line 1.
>>
>> One might presume this to be a permissions issue (where would I get
>> THAT idea?) but permissions to what?  As I cannot seem to find the
>> items mentioned even as root.
>
> when you don't use proper packages and can't even update your mlocate
> database so that "locate SpamAssassin/Plugin/AutoLearnThreshold" works,
> that's hardly an SA topic
>
> [root@mail-gw:~]$ rpm -q --file
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm
> spamassassin-3.4.6-5.fc36.x86_64
>
> [root@mail-gw:~]$ rpm -q --file
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/SpamCop.pm
> spamassassin-3.4.6-5.fc36.x86_64

I have no idea what you refer to when you state "don't use proper
packages". "Proper" in what sense? A rhetorical question.

Mlocate is (was) not installed on this particular system but promises to
be useful in the future, regardless of your intent. "find" has always
been my go-to tool. Such as it is.

Still it remains to be determined why root user can run sa-learn without
error while another whose permissions are more constrained, cannot.

And that, regardless of root (!) cause, would seem to be an SA topic.
Re: BAYES_00 BODY. Negative score? [ In reply to ]
On Thu, Feb 16, 2023 at 9:57 PM joe a <joea-lists@j4computers.com> wrote:

>
> plugin: failed to parse plugin (from @INC): Can't locate
> Mail/SpamAssassin/Plugin/SpamCop.pm:
> lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval 44)
> line 1.
>

root can do anything. A restricted user can't: it is only allowed to do
what others have allowed it.

It also runs with a different environment, so it may be missing PATH
entries or @INC directories.

You should locate the SpamCop.pm file and list the owner and ACL.

As user spamfilter, run spamassassin with -D and check whether the first
lines show similar errors.

Also check permission of /var/mail/spamd/Cabinet.Missed-SPAM. I had
permission problems trying to sa-learn files owned by root.
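
A minimal sketch of that owner/permission check, assuming a POSIX shell. The name walk_perms is hypothetical and not part of SpamAssassin. It lists owner, group and mode for a file and for every directory above it, because a missing x (search) bit on any parent directory yields the same "Permission denied" as a bad mode on the file itself.

```shell
# walk_perms is a hypothetical helper, not part of SpamAssassin.
# It prints owner, group and mode for an absolute path and for each parent
# directory up to /, so a directory lacking the x (search) bit stands out.
walk_perms() {
    p=$1
    case $p in
        /*) ;;
        *) echo "walk_perms: need an absolute path" >&2; return 2 ;;
    esac
    while :; do
        ls -ld "$p" || echo "cannot stat $p" >&2
        [ "$p" = "/" ] && break
        p=$(dirname "$p")
    done
}

# Example call (placeholder path, substitute whatever locate/find reports):
# walk_perms /path/to/Mail/SpamAssassin/Plugin/SpamCop.pm
```

With GNU ls, a "+" after the mode string indicates that an ACL is set on that entry; getfacl on that path shows the details.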



> Running with the -D option does produce more, after that list of
> permission denied items
>
> Feb 16 15:55:30.884 [10384] dbg: config: warning: no description set for
> STOX_REPLY_TYPE_WITHOUT_QUOTES
>

These are not permission errors but warnings about the rules having no text
descriptions. It's ok.


>
>
Re: BAYES_00 BODY. Negative score? [ In reply to ]
. . .
>>
>> I have no idea what you refer to when you state "don't use proper
>> packages".  "Proper" in what sense? A rhetorical question.
>
> i have no idea how you installed SA but rpm packages or debs usually
> have correct permissions

Oh, of course. I installed as root initially, being foolish perhaps,
but did create a specific user "later" and adjusted permissions as
needed. Or, so I thought.

>> Mlocate is (was) not installed in this particular system but promises
>> to be useful in the future, regardless of your intent.  "find" has
>> always been my go to tool.  Such as it is.
>>
>> Still it remains to be determined why root user can run sa-learn
>> without error while another whose permissions are more constrained,
>> cannot.
>>
>> And that, regardless of root (!) cause, would seem to be an SA topic
>
> because the file permissions are obviously wrong which isn't a SA topic
> - SA can't do anything when you mess your local permissions
>

Permissions are (almost) certainly the issue. Now having the impressive
locate/mlocate creature at my command, I might actually make progress.

Thanks for the help.
Re: BAYES_00 BODY. Negative score? [ In reply to ]
On 2/16/2023 5:32 PM, hg user wrote:
>
>
> On Thu, Feb 16, 2023 at 9:57 PM joe a <joea-lists@j4computers.com> wrote:
>
>
> plugin: failed to parse plugin (from @INC): Can't locate
> Mail/SpamAssassin/Plugin/SpamCop.pm:
> lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval 44)
> line 1.
>
>
> root can do anything. a restricted user can't: it's only allowed to do
> what others allowed it.
>
> it also runs with another environment, so it may miss PATHes or @INC
> directories.

That throws me a curve. What is an @INC directory? SA specific?
I do not find any with the locate command, but if they are actual
directories I may need to escape the @ sign somehow. \ does not seem to
do it.

> You should locate the SpamCop.pm file and list the owner and ACL.

This I have done, with no change, even to the point of using the -R
option starting at /usr/lib/perl5/vendor_perl/5.26.1/Mail


> As user spamfilter run spamassassin with -D and see in the first lines
> if you have similar errors.

Done that. It is impressively more verbose, but I did not detect any
more errors.

> Also check permission of /var/mail/spamd/Cabinet.Missed-SPAM. I had
> permission problems trying to sa-learn files owned by root.
>

That I found and fixed some time back.

>
> Running with the -D option does produce more, after that list of
> permission denied items
>
> Feb 16 15:55:30.884 [10384] dbg: config: warning: no description set
> for
> STOX_REPLY_TYPE_WITHOUT_QUOTES
>
>
> These are not permission errors but warnings about the rules having no
> text descriptions. It's ok.
>
>
>
Re: BAYES_00 BODY. Negative score? [ In reply to ]
. . .
>> it also runs with another environment, so it may miss PATHes or @INC
>> directories.
>
> That throws me a curve.  What is an @INC directory?  SA specific?
> I do not find any with the locate command, but if the are an actual
> directory may need to escape the @ sign somehow.  \ does not seem to do it.
>

I begin to see. It is a perl thing. I knew I should not have left that
camel at the oasis.
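
For anyone else who left the camel behind: @INC is Perl's in-memory list of directories searched for modules, not a file or directory on disk, which is why locate finds nothing under that name. A small sketch for inspecting it, assuming perl is on PATH; show_inc and unreadable_inc are hypothetical helper names.

```shell
# show_inc is a hypothetical helper: print the module search path (@INC)
# of whichever perl is first on PATH, one directory per line.
show_inc() {
    perl -e 'print "$_\n" for @INC'
}

# unreadable_inc (also hypothetical) lists @INC entries that exist but
# that the current user cannot enter.
unreadable_inc() {
    show_inc | while IFS= read -r dir; do
        # Entries that do not exist are normal and are skipped.
        [ -e "$dir" ] || continue
        [ -d "$dir" ] && [ -x "$dir" ] || echo "$dir"
    done
}
```

Running the same perl one-liner once as root and once as the restricted user, then comparing the two lists, shows whether the environments differ. This is only a baseline: a script may add its own entries to @INC at startup.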
Re: BAYES_00 BODY. Negative score? [ In reply to ]
On Thu, Feb 16, 2023 at 05:34:37PM -0500, joe a wrote:
> Oh, of course. I installed as root initially, being foolish perhaps, but
> did create a specific user "later" and adjusted permissions as needed. Or,
> so I thought.

Well, installing manually as root (especially with a restrictive umask),
e.g. with "make install" or "cpan" rather than "yum/rpm/dpkg", can often
cause problems, even if you later switch to packages. You need to look
not only at the final file permissions, but also at the directories
leading up to the file.

namei -l /path/to/file.pm is often helpful to quickly check ALL
permissions needed to access a file (+x on directories is a must)

> Permissions are (almost) certainly the issue. Now having the impressive
> locate/mlocate creature at my command, I might actually make progress.

I usually troubleshoot those (if log is insufficient) with:

strace -efile -o /tmp/sa.log spamassassin foobar

then look at /tmp/sa.log to see which open/stat/access returned -1 EPERM
or EACCES error. Then check all path components for that file using
"namei -l" (or multiple "ls -ld"). Then try to su to that user and
"cat" that file manually.

If it is not regular DAC (chmod/chown) permissions, it might also be
SELinux restrictions or, more rarely, ACLs (getfacl(1)).
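
To pull the failing paths out of such a log, something like the following may help. It is a sketch that assumes strace's usual one-call-per-line output, in which the first quoted string on the line is the path; denied_paths is a hypothetical helper name.

```shell
# denied_paths is a hypothetical helper: from an strace log, print the
# unique paths whose syscall failed with EACCES or EPERM.
denied_paths() {
    grep -E '= -1 (EACCES|EPERM)' "$1" |
        sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' |
        LC_ALL=C sort -u
}

# Example, after running the strace command as the restricted user:
# denied_paths /tmp/sa.log
```

Each path printed can then be handed to namei -l. A path in the output that does not start with / is resolved against the working directory of the traced process, so that directory's permissions matter as well.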

--
Opinions above are GNU-copylefted.
Re: BAYES_00 BODY. Negative score? [ In reply to ]
On 2/16/2023 8:28 PM, Matija Nalis wrote:
>
> On Thu, Feb 16, 2023 at 05:34:37PM -0500, joe a wrote:
>> Oh, of course. I installed as root initially, being foolish perhaps, but
>> did create a specific user "later" and adjusted permissions as needed. Or,
>> so I thought.
>
> well, installing as root (especially with restrictive umask) manually
> (e.g. "make install" or "cpan" vs. "yum/rpm/dpkg") may often make
> problems, even if you later switch to packages (you need to look not
> only at final file permissions, but at directories leading up to it
> too).
>
> namei -l /path/to/file.pm is often helpful to quickly check ALL
> permissions needed to access file (+x on directories is a must)
>
>> Permissions are (almost) certainly the issue. Now having the impressive
>> locate/mlocate creature at my command, I might actually make progress.
>
> I usually troubleshoot those (if log is insufficient) with:
>
> strace -efile -o /tmp/sa.log spamassassin foobar
>
> then look at /tmp/sa.log to see which open/stat/access returned -1 EPERM
> or EACCES error. Then check all path components for that file using
> "namei -l" (or multiple "ls -ld"). Then try to su to that user and
> "cat" that file manually.
>
> If not regular DAC (chmod/chown) permissions, it might also be SELINUX
> restrictions or more rarely ACL (getfacl(1)).
>

Well, I am in unfamiliar waters.

Picking one error message as typical:

plugin: failed to parse plugin (from @INC): Can't locate
Mail/SpamAssassin/Plugin/iXhash2.pm:
lib/Mail/SpamAssassin/Plugin/iXhash2.pm: Permission denied at (eval
1746) line 1.

The file locations shown do not exist exactly as written. What I
find using "locate iXhash2.pm" is:

/usr/lib/perl5/vendor_perl/5.26.1/Mail/SpamAssassin/Plugin/iXhash2.pm
which the SA user can access, or at least see via ll. The others I've
checked are also visible, and the directories are x (executable).

The sense I am getting is there is a perl file that contains these paths
that is referred to as @INC.

I don't have the knowledge at this point to see if, somehow, root sees
the files as shown in the error or if the path is somehow altered for
the SA user.

Thanks for any guidance.
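
One thing that may be worth ruling out, offered as an assumption rather than a diagnosis: the paths in the error are relative (lib/Mail/...), and a relative path is resolved against the current working directory. su without "-" keeps the caller's working directory, so if the command was started from a directory the SA user cannot enter (root's home, for instance), a relative lookup could fail with "Permission denied" even though the installed files are fine. A sketch for testing that idea; run_in_dir is a hypothetical helper name.

```shell
# run_in_dir is a hypothetical helper: run a command from inside a given
# directory, in a subshell so the caller's working directory is unchanged.
run_in_dir() {
    dir=$1
    shift
    ( cd "$dir" && "$@" )
}

# Example: repeat the failing command from a directory the SA user can enter.
# run_in_dir /tmp su -c "sa-learn --spam /var/mail/spamd/Cabinet.Missed-SPAM" spamfilter
```

If the errors disappear, the working directory was the cause; if they persist, the strace approach suggested earlier in the thread is the next step.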
