Mailing List Archive

Bayes always reject.
Hello all,
I'm facing a strange problem.
I've fed the Bayes DB for a while and now I would like to put it to use,
but all messages get BAYES_99 and a very high spam score.
I would like to understand why and troubleshoot this problem, but I can't
find a way.
The SpamAssassin version is:
root@puma:~# spamassassin --version
SpamAssassin version 3.4.6
running on Perl version 5.22.2
This is the output of sa-learn --dump magic:
root@puma:~# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 130610 0 non-token data: nspam
0.000 0 316040 0 non-token data: nham
0.000 0 136493 0 non-token data: ntokens
0.000 0 1695915149 0 non-token data: oldest atime
0.000 0 1702447561 0 non-token data: newest atime
0.000 0 1702449197 0 non-token data: last journal sync atime
0.000 0 1701476495 0 non-token data: last expiry atime
0.000 0 5529600 0 non-token data: last expire atime delta
0.000 0 34998 0 non-token data: last expire reduction count
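
For readers who want to sanity-check a dump like this one, here is a minimal Python sketch that parses the "non-token data" lines and reports the spam/ham balance and the atime range. The function name parse_dump_magic is hypothetical, the sample text is copied from the output above, and the sketch assumes the value of interest is always the third column. The counts alone cannot prove or disprove mis-training; they only show what was learned.

```python
from datetime import datetime, timezone

# Sample copied from the sa-learn --dump magic output above.
DUMP = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 130610 0 non-token data: nspam
0.000 0 316040 0 non-token data: nham
0.000 0 136493 0 non-token data: ntokens
0.000 0 1695915149 0 non-token data: oldest atime
0.000 0 1702447561 0 non-token data: newest atime
"""


def parse_dump_magic(text):
    """Map each 'non-token data' label to its integer value.

    Assumption: the value is the third whitespace-separated column.
    """
    values = {}
    for line in text.splitlines():
        head, sep, label = line.partition(" non-token data: ")
        if not sep:
            continue  # not a magic line (for example, a token line)
        fields = head.split()
        if len(fields) < 3:
            continue  # unexpected layout, skip rather than guess
        values[label.strip()] = int(fields[2])
    return values


magic = parse_dump_magic(DUMP)
spam_share = magic["nspam"] / (magic["nspam"] + magic["nham"])
oldest = datetime.fromtimestamp(magic["oldest atime"], tz=timezone.utc)
newest = datetime.fromtimestamp(magic["newest atime"], tz=timezone.utc)

print(f"learned messages: {magic['nspam']} spam, {magic['nham']} ham")
print(f"spam share of learned messages: {spam_share:.1%}")
print(f"token atimes span {oldest:%Y-%m-%d} to {newest:%Y-%m-%d} (UTC)")
```

In practice the text would come from running sa-learn --dump magic as the user that owns the Bayes DB, rather than from a pasted string.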
And this is the output of spamassassin --lint -D:
root@puma:~# spamassassin -D --lint 2>&1 | grep -i bay
Dec 13 07:39:07.885 [26545] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
Dec 13 07:39:08.005 [26545] dbg: config: fixed relative path: /var/lib/spamassassin/3.004006/updates_spamassassin_org/23_bayes.cf
Dec 13 07:39:08.005 [26545] dbg: config: using "/var/lib/spamassassin/3.004006/updates_spamassassin_org/23_bayes.cf" for included file
Dec 13 07:39:08.005 [26545] dbg: config: read file /var/lib/spamassassin/3.004006/updates_spamassassin_org/23_bayes.cf
Dec 13 07:39:08.047 [26545] dbg: config: fixed relative path: /var/lib/spamassassin/3.004006/updates_spamassassin_org/60_bayes_stopwords.cf
Dec 13 07:39:08.047 [26545] dbg: config: using "/var/lib/spamassassin/3.004006/updates_spamassassin_org/60_bayes_stopwords.cf" for included file
Dec 13 07:39:08.047 [26545] dbg: config: read file /var/lib/spamassassin/3.004006/updates_spamassassin_org/60_bayes_stopwords.cf
Dec 13 07:39:08.292 [26545] dbg: shortcircuit: adding BAYES_99 using abbreviation spam
Dec 13 07:39:08.292 [26545] dbg: shortcircuit: adding BAYES_00 using abbreviation ham
Dec 13 07:39:08.586 [26545] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x5cca570) implements 'learner_new', priority 0
Dec 13 07:39:08.586 [26545] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x5cca570), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Dec 13 07:39:08.594 [26545] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x6a51bb0)
Dec 13 07:39:08.594 [26545] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x5cca570) implements 'learner_is_scan_available', priority 0
Dec 13 07:39:08.595 [26545] dbg: bayes: tie-ing to DB file R/O /var/spamassasin/bayes_toks
Dec 13 07:39:08.595 [26545] dbg: bayes: tie-ing to DB file R/O /var/spamassasin/bayes_seen
Dec 13 07:39:08.595 [26545] dbg: bayes: found bayes db version 3
Dec 13 07:39:08.595 [26545] dbg: bayes: DB journal sync: last sync: 1702449197
Dec 13 07:39:08.621 [26545] dbg: bayes: DB journal sync: last sync: 1702449197
Dec 13 07:39:08.621 [26545] dbg: bayes: corpus size: nspam = 130610, nham = 316040
Dec 13 07:39:08.622 [26545] dbg: bayes: tokenized body: 120 tokens
Dec 13 07:39:08.622 [26545] dbg: bayes: tokenized uri: 0 tokens
Dec 13 07:39:08.622 [26545] dbg: bayes: tokenized invisible: 0 tokens
Dec 13 07:39:08.623 [26545] dbg: bayes: tokenized header: 14 tokens
Dec 13 07:39:08.623 [26545] dbg: bayes: score = 0.976034467829266
Dec 13 07:39:08.624 [26545] dbg: bayes: DB expiry: tokens in DB: 136493, Expiry max size: 150000, Oldest atime: 1695915149, Newest atime: 1702447561, Last expire: 1701476495, Current time: 1702449548
Dec 13 07:39:08.624 [26545] dbg: bayes: DB journal sync: last sync: 1702449197
Dec 13 07:39:08.624 [26545] dbg: bayes: untie-ing
Dec 13 07:39:08.624 [26545] dbg: check: tagrun - tag BAYESTCHAMMY is now ready, value: 0
Dec 13 07:39:08.624 [26545] dbg: check: tagrun - tag BAYESTCSPAMMY is now ready, value: 2
Dec 13 07:39:08.624 [26545] dbg: check: tagrun - tag BAYESTCLEARNED is now ready, value: 4
Dec 13 07:39:08.624 [26545] dbg: check: tagrun - tag BAYESTC is now ready, value: 20
Dec 13 07:39:08.628 [26545] dbg: rules: ran eval rule BAYES_95 ======> got hit (1)
Dec 13 07:39:08.863 [26545] dbg: check: tests=BAYES_95,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS,T_SCC_BODY_TEXT_LINE
Dec 13 07:39:08.864 [26545] dbg: timing: total 1004 ms - init: 738 (73.5%), parse: 0.85 (0.1%), extract_message_metadata: 1.10 (0.1%), get_uri_detail_list: 3.9 (0.4%), tests_pri_-2000: 4.3 (0.4%), compile_gen: 85 (8.5%), compile_eval: 13 (1.3%), tests_pri_-1000: 3.6 (0.4%), tests_pri_-950: 2.8 (0.3%), tests_pri_-900: 4.2 (0.4%), tests_pri_-100: 7 (0.7%), check_bayes: 3.9 (0.4%), b_tokenize: 2.1 (0.2%), b_tok_get_all: 0.22 (0.0%), b_comp_prob: 0.18 (0.0%), b_tok_touch_all: 0.02 (0.0%), b_finish: 0.77 (0.1%), tests_pri_-90: 3.4 (0.3%), tests_pri_0: 169 (16.9%), tests_pri_20: 2.5 (0.2%), tests_pri_30: 2.4 (0.2%), tests_pri_500: 59 (5.9%)

The strangest thing I've seen is that in the "--lint" output I cannot see all
the BAYES_xx rules, as I can on another mail server I have around:
[root@vps676475 ~]# spamassassin -D --lint 2>&1 | grep -i bay
Dec 13 07:45:10.044 [12497] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
Dec 13 07:45:10.178 [12497] dbg: config: added tld list - xn--mgbai9azgqp6j xn--mgbayh7gpa xn--mgbb9fbpob xn--mgbbh1a xn--mgbbh1a71e
Dec 13 07:45:10.180 [12497] dbg: config: added tld list - barefoot bargains baseball basketball bauhaus bayern bb bbc bbt bbva bcg bcn bd
Dec 13 07:45:10.332 [12497] dbg: config: fixed relative path: /var/lib/spamassassin/3.004002/updates_spamassassin_org/23_bayes.cf
Dec 13 07:45:10.332 [12497] dbg: config: using "/var/lib/spamassassin/3.004002/updates_spamassassin_org/23_bayes.cf" for included file
Dec 13 07:45:10.333 [12497] dbg: config: read file /var/lib/spamassassin/3.004002/updates_spamassassin_org/23_bayes.cf
Dec 13 07:45:10.334 [12497] dbg: config: body eval rule name is BAYES_00 function is check_bayes('0.00', '0.01')
Dec 13 07:45:10.334 [12497] dbg: config: body eval rule name is BAYES_05 function is check_bayes('0.01', '0.05')
Dec 13 07:45:10.334 [12497] dbg: config: body eval rule name is BAYES_20 function is check_bayes('0.05', '0.20')
Dec 13 07:45:10.334 [12497] dbg: config: body eval rule name is BAYES_40 function is check_bayes('0.20', '0.40')
Dec 13 07:45:10.334 [12497] dbg: config: body eval rule name is BAYES_50 function is check_bayes('0.40', '0.60')
Dec 13 07:45:10.334 [12497] dbg: config: body eval rule name is BAYES_60 function is check_bayes('0.60', '0.80')
Dec 13 07:45:10.335 [12497] dbg: config: body eval rule name is BAYES_80 function is check_bayes('0.80', '0.95')
Dec 13 07:45:10.335 [12497] dbg: config: body eval rule name is BAYES_95 function is check_bayes('0.95', '0.99')
Dec 13 07:45:10.335 [12497] dbg: config: body eval rule name is BAYES_99 function is check_bayes('0.99', '1.00')
Dec 13 07:45:10.335 [12497] dbg: config: body eval rule name is BAYES_999 function is check_bayes('0.999', '1.00')
Dec 13 07:45:10.410 [12497] dbg: config: fixed relative path: /var/lib/spamassassin/3.004002/updates_spamassassin_org/60_bayes_stopwords.cf
(This is from the other server.)
However, the rules are in the 23_bayes.cf file on both servers.
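
To make the relationship between a Bayes score and the BAYES_xx rules concrete, the following Python sketch maps a score to the rule names using the ranges printed in the check_bayes(...) lines of the second log. The function name matching_rules is hypothetical, and the boundary handling (lower bound exclusive unless it is 0, upper bound inclusive) is an assumption, not something the log states.

```python
# Ranges copied from the check_bayes(...) debug lines in the log above.
BAYES_RULES = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
    ("BAYES_999", 0.999, 1.00),
]


def matching_rules(score):
    """Return the names of all rules whose range contains the score.

    Assumption: a rule matches when low < score <= high, and a lower
    bound of 0 also matches a score of exactly 0.
    """
    hits = []
    for name, low, high in BAYES_RULES:
        above_low = low == 0 or score > low
        if above_low and score <= high:
            hits.append(name)
    return hits


# The score reported in the first --lint run falls in the BAYES_95 range.
print(matching_rules(0.976034467829266))
# A score above 0.999 falls in two overlapping ranges.
print(matching_rules(0.9995))
```

Under these assumptions the score 0.976 from the first server's lint run lands in BAYES_95, which agrees with the "ran eval rule BAYES_95" line in that log.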

As said, all messages on this server are rejected with this message:
2023-12-11 18:46:57.804833500 simscan:[5345]:SPAM REJECT (103.50/4.40/9.50):0.1085s:*****SPAM***** test:351.266.123.112:xxx.yyy@zzz.com:zzz@kkk.it
2023-12-11 18:49:33.264702500 simscan:[5391]:SPAM REJECT (103.50/4.40/9.50):0.1233s:*****SPAM***** test:351.266.123.112:xxx.yyy@zzz.com:zzz@kkk.it

Where can I start looking?

Thanks in advance
Pierluigi
Re: Bayes always reject.
On 2023-12-13 at 01:49:24 UTC-0500 (Wed, 13 Dec 2023 07:49:24 +0100)
Pierluigi Frullani <pierluigi.frullani@gmail.com>
is rumored to have said:

> Hello all,
> I'm facing a strange problem.

Not really. MANY people run into this issue...

> I've feed the bayes db for a while and now I would like to put it in
> use
> but all messages get a BAYES_99 and very high spam point.
> I would like to understand why, and troubleshoot this problem but I
> can't
> find a way.

The only reasons that can happen are:

1. All of your mail is in fact spam.
2. Your Bayes DB is mis-trained.

The fix (assuming #2) is to recreate the Bayes DB with proper training.

*IN THEORY* one could fix a corrupted DB by 'unlearning' the messages which
were learned incorrectly, but as a practical matter that's usually a fantasy.

Most of the scanning and DB details that you included are not useful.
You cannot fix the bad DB, you need to rebuild it.
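
A rebuild along these lines could be scripted roughly as follows. This is a sketch, not a tested procedure: the directory and file names are hypothetical placeholders, the helper names rebuild_plan and run_plan are made up for illustration, and it assumes the standard sa-learn options --backup, --clear, --spam, --ham and --dump magic, run as the user that owns the Bayes DB. By default it only prints the commands.

```python
import subprocess


def rebuild_plan(spam_dir, ham_dir, backup_file):
    """Return (argv, stdout_path) pairs for a backup, wipe and retrain.

    All paths are placeholders supplied by the caller.
    """
    return [
        (["sa-learn", "--backup"], backup_file),   # keep a copy of the old DB
        (["sa-learn", "--clear"], None),           # wipe the existing DB
        (["sa-learn", "--spam", spam_dir], None),  # hand-verified spam only
        (["sa-learn", "--ham", ham_dir], None),    # hand-verified ham only
        (["sa-learn", "--dump", "magic"], None),   # check the new counts
    ]


def run_plan(plan, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for argv, stdout_path in plan:
        if dry_run:
            suffix = f" > {stdout_path}" if stdout_path else ""
            print(" ".join(argv) + suffix)
        elif stdout_path:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)


plan = rebuild_plan(
    "/path/to/verified-spam",  # hypothetical corpus location
    "/path/to/verified-ham",   # hypothetical corpus location
    "bayes-backup.txt",        # hypothetical backup file
)
run_plan(plan)  # dry run: prints the commands without touching the DB
```

The important part is not the script but the corpora: the spam and ham directories should contain only messages that a person has checked, otherwise the new DB inherits the same problem.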



--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: Bayes always reject.
> From: Pierluigi Frullani <pierluigi.frullani@gmail.com>
> Date: Wed, 13 Dec 2023 07:49:24 +0100
>
> Hello all,
> I'm facing a strange problem.

...
> tests=BAYES_95,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS,T_SCC_BODY_TEXT_LINE

How did you feed this message into SpamAssassin?
Did you do something to strip off all of the email headers?
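
One way to check whether a message still has its headers before it is fed to SpamAssassin is a small Python sketch like the one below. The header list is an illustrative guess at what hits such as MISSING_DATE and NO_RECEIVED look for, not a copy of SpamAssassin's rule definitions, and the sample messages are invented. It may also be worth keeping in mind that --lint is understood to scan a minimal built-in test message rather than real mail, which would produce hits like these by itself; treat that as an assumption to verify.

```python
from email import message_from_string

# Headers a normally delivered message would be expected to carry.
# Assumption: this list is illustrative only.
EXPECTED_HEADERS = ("Received", "Date", "From", "To")


def missing_headers(raw_message):
    """Return the expected headers that are absent from the raw message."""
    msg = message_from_string(raw_message)
    return [name for name in EXPECTED_HEADERS if msg.get(name) is None]


# Invented sample: a message with its headers intact.
full = (
    "Received: from mail.example.org by mx.example.net; "
    "Wed, 13 Dec 2023 07:49:24 +0100\n"
    "Date: Wed, 13 Dec 2023 07:49:24 +0100\n"
    "From: sender@example.org\n"
    "To: rcpt@example.net\n"
    "Subject: test\n"
    "\n"
    "This is a test body.\n"
)

# Invented sample: only a body, as if the headers had been stripped.
stripped = "\nThis is a test body.\n"

print("full message is missing:", missing_headers(full))
print("stripped message is missing:", missing_headers(stripped))
```

If real messages reach the scanner looking like the second sample, the problem lies in how they are piped to SpamAssassin rather than in the Bayes DB.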

As for the BAYES_99, as already mentioned, you probably need to retrain
Bayes, making sure to correct any incorrectly trained email messages.

-jeff