Mailing List Archive

Why does sa-compile access the bayes db?
Dear Spamassassin users and developers,

Recently, we've been setting up Bayesian learning on our existing Amavis
with Spamassassin setup on Ubuntu 18.04 (Spamassassin
3.4.2-0ubuntu0.18.04.3 and Amavis 1:2.11.0-1ubuntu1). We decided to
use a global db seeded with an aggregation of the spam and ham
we've received, and then enabled autolearn to train it further. As
Spamassassin runs inside Amavis, the Bayes database files are owned by
the amavis user. This setup works fine, and results for Bayes are great
and growing in accuracy by autolearning.

What was somewhat confusing is that we noticed our daily cronjob running
sa-update and sa-compile was giving us an error concerning permissions:
May 25 00:31:25.488 [8381] warn: bayes: cannot write to
/var/lib/spamassassin/bayes_db/bayes_journal, bayes db update ignored:
Permission denied
bayes: cannot write to /var/lib/spamassassin/bayes_db/bayes_journal,
bayes db update ignored: Permission denied

While this makes a lot of sense, considering that the files are owned by
the amavis user, we were quite surprised this cronjob would need to
access these files in the first place. Looking further into the issue,
we found that sa-compile specifically was responsible, and that the message
probably originated from
/usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm. While I have some
programming experience, I was unfortunately unable to follow this Perl file
well enough to understand why this code was accessing bayes_journal
and what it was planning to do there.
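To reproduce the "Permission denied" outside of cron, one can check whether the user running sa-compile could write the Bayes files at all. The sketch below is a minimal, hypothetical Python helper (check_bayes_access is not part of SpamAssassin); it assumes the DBM store's usual _toks, _seen and _journal suffixes on bayes_path, as seen in the ls listing later in this thread. Run it as the same user as the cron job; note that root will generally see everything as writable.

```python
import os
import sys

# Assumption: SpamAssassin's DBM Bayes store appends these suffixes to
# bayes_path (e.g. .../bayes -> .../bayes_toks, .../bayes_journal).
BAYES_SUFFIXES = ("_toks", "_seen", "_journal")


def check_bayes_access(prefix):
    """Hypothetical helper: map each Bayes file to True/False depending on
    whether the *current* user could write it. If a file does not exist yet,
    its parent directory is checked instead (the file would be created there)."""
    report = {}
    for suffix in BAYES_SUFFIXES:
        path = prefix + suffix
        target = path if os.path.exists(path) else (os.path.dirname(path) or ".")
        report[path] = os.access(target, os.W_OK)
    return report


if __name__ == "__main__":
    # Default prefix derived from the path in the log message above.
    prefix = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/spamassassin/bayes_db/bayes"
    for path, writable in check_bayes_access(prefix).items():
        print(("writable     " if writable else "NOT writable ") + path)
```

If bayes_journal shows up as "NOT writable" for the cron user but the files are owned by amavis, that matches the warning quoted above.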

My question therefore specifically is: what exactly does sa-compile do
to the bayes database files?

I've asked this same question on IRC but was unable to get an answer.
While the obvious fix would be to change permissions and user/group
ownership, we'd first like to understand what sa-compile is up to.

Kind regards,
Bert Van de Poel
ULYSSIS
Re: Why does sa-compile access the bayes db?
On Mon, 25 May 2020 23:34:27 +0200
Bert Van de Poel wrote:


> My question therefore specifically is: what exactly does sa-compile
> do to the bayes database files?

I don't know for sure, but it's probably just a side-effect of
initializing plugins. Possibly it's trying to perform an opportunistic
sync on the journal file.

sa-compile doesn't need to access Bayes, so you could just treat it as
a cosmetic error. I wouldn't change ownership or permissions just for
this.
Re: Why does sa-compile access the bayes db?
Plugin initialization plus a journal sync would make a lot of sense.

What would be the cleanest solution in that case? It's quite annoying to
receive the same error mail every day. Should we use --cnf to disable
the bayes plugin, or is there a more elegant solution? Should we file a
bug about this?


On 26/05/2020 00:45, RW wrote:
> I don't know for sure, but it's probably just a side-effect of
> initializing plugins. Possibly it's trying to perform an opportunistic
> sync on the journal file.
>
> sa-compile doesn't need to access Bayes, so you could just treat it as
> a cosmetic error. I wouldn't change ownership or permissions just for
> this.
Re: Why does sa-compile access the bayes db?
On 25.05.20 23:34, Bert Van de Poel wrote:
>Recently, we've been setting up Bayesian learning on our existing
>Amavis with Spamassassin setup on Ubuntu 18.04 (Spamassassin
>3.4.2-0ubuntu0.18.04.3 and Amavis 1:2.11.0-1ubuntu1). We decided to
>use a global db seeded with an aggregation of the spam and ham
>we've received, and then enabled autolearn to train it further. As
>Spamassassin runs inside Amavis, the Bayes database files are owned by
>the amavis user. This setup works fine, and results for Bayes are
>great and growing in accuracy by autolearning.
>
>What was somewhat confusing is that we noticed our daily cronjob
>running sa-update and sa-compile was giving us an error concerning
>permissions:
>May 25 00:31:25.488 [8381] warn: bayes: cannot write to
>/var/lib/spamassassin/bayes_db/bayes_journal, bayes db update ignored:
>Permission denied
>bayes: cannot write to /var/lib/spamassassin/bayes_db/bayes_journal,
>bayes db update ignored: Permission denied

I wonder where these files came from.
Did you set bayes_path in /etc/spamassassin/?


--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
99 percent of lawyers give the rest a bad name.
Re: Why does sa-compile access the bayes db?
We're using a global bayes_path defined in local.cf:

use_bayes 1
use_bayes_rules 1
bayes_auto_learn 1
bayes_expiry_max_db_size 1500000
bayes_path /var/lib/spamassassin/bayes_db/bayes
bayes_file_mode 0775
bayes_ignore_to spam-analysis@ulyssis.org
bayes_ignore_from spam-analysis@ulyssis.org
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 10.0

score BAYES_00          -0.001 -0.001 -0.001 -0.001
score BAYES_05          -0.001 -0.001 -0.001 -0.001
score BAYES_20          -0.001 -0.001 -0.001 -0.001
score BAYES_40          -0.001 -0.001 -0.001 -0.001
score BAYES_50          0.001 0.001 0.001 0.001
score BAYES_60          0.001 0.001 0.001 0.001
score BAYES_80          0.001 0.001 0.001 0.001
score BAYES_95          0.001 0.001 0.001 0.001
score BAYES_99          0.001 0.001 0.001 0.001
score BAYES_999         0.001 0.001 0.001 0.001
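Matus's question comes down to which bayes_path SpamAssassin ends up using. As a rough, hypothetical illustration (effective_bayes_files is not a SpamAssassin function, and its parsing only approximates SpamAssassin's own for a single .cf file), the sketch below resolves the Bayes files from the text of local.cf. An explicit bayes_path applies to every process that loads local.cf, including the cron job; without it, the documented default __userstate__/bayes resolves to ~/.spamassassin/bayes for whichever user runs SpamAssassin.

```python
import os
import re

# Rough approximation of SpamAssassin's .cf parsing: an unescaped "#"
# starts a comment, and a later bayes_path line overrides an earlier one.
COMMENT_RE = re.compile(r"(?<!\\)#.*$")
BAYES_SUFFIXES = ("_toks", "_seen", "_journal")


def effective_bayes_files(cf_text, home):
    """Hypothetical helper: return the Bayes DBM files SpamAssassin would use
    for a user whose home directory is `home`, given the text of local.cf."""
    prefix = None
    for raw in cf_text.splitlines():
        words = COMMENT_RE.sub("", raw).split(None, 1)
        if len(words) == 2 and words[0] == "bayes_path":
            prefix = words[1].strip()
    if prefix is None:
        # Documented default: __userstate__/bayes, i.e. ~/.spamassassin/bayes
        prefix = os.path.join(home, ".spamassassin", "bayes")
    elif prefix.startswith("~/"):
        # Expand "~/" against the given home directory.
        prefix = os.path.join(home, prefix[2:])
    return [prefix + suffix for suffix in BAYES_SUFFIXES]
```

With the configuration above, every caller resolves to /var/lib/spamassassin/bayes_db/bayes_*. With bayes_path removed and an assumed amavis home of /var/lib/amavis, amavis would use /var/lib/amavis/.spamassassin/bayes_* instead.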

Currently we're still evaluating the number of false positives (and
contacting users who seem to have broken cronjobs that confuse bayes)
before removing the artificial scores. In the meantime, we wanted to
clear up the sa-compile cronjob error.


On 28/05/2020 10:18, Matus UHLAR - fantomas wrote:
> I wonder where these files came from.
> Did you set bayes_path in /etc/spamassassin/?
Re: Why does sa-compile access the bayes db?
On 28.05.20 13:38, Bert Van de Poel wrote:
>We're using a global bayes_path defined in local.cf:

This is your problem imho.

If you use amavis, you don't need a global bayes database; use the amavis
user's own, which I guess is in /var/lib/amavis/.spamassassin/


--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
I intend to live forever - so far so good.
Re: Why does sa-compile access the bayes db?
Almost all of the email we process is forwarded. It doesn't really
make sense for us to use a non-global bayes db. The large majority of
email we process is also for a uniform group: student organizations at
our local university.

On 28/05/2020 15:22, Matus UHLAR - fantomas wrote:
> On 28.05.20 13:38, Bert Van de Poel wrote:
>> We're using a global bayes_path defined in local.cf:
>
> This is your problem imho.
>
> If you use amavis, you don't need a global bayes database; use the amavis
> user's own, which I guess is in /var/lib/amavis/.spamassassin/
Re: Why does sa-compile access the bayes db?
On 28.05.20 15:32, Bert Van de Poel wrote:
>Almost all of the email we process are forwarders. It doesn't really
>make sense for us to do a non-global bayes db. The large majority of
>email we process is also for a uniform group: student organizations at
>our local university.

You have apparently missed what I said before, so I'll repeat it:

You said you use amavis. The amavis daemon (usually) runs as the amavis user.
Therefore, all mail processed by amavis uses the amavis user's bayes
database, stored in the amavis home directory.

Move the database to amavis' home (and chown it to the amavis user):

# ls -la ~amavis/.spamassassin/
total 41368
drwx------ 2 amavis amavis     4096 May 28 16:59 .
drwxr-x--- 7 amavis amavis     4096 May 28 06:50 ..
-rw------- 1 amavis amavis    89136 May 28 17:01 bayes_journal
-rw------- 1 amavis amavis 21065728 May 28 16:59 bayes_seen
-rw------- 1 amavis amavis 40144896 May 28 16:59 bayes_toks
-rw-r--r-- 1 amavis amavis     2304 May  5 12:41 user_prefs

Then remove the global bayes database setting from /etc/spamassassin/local.cf,
and your problem will most probably go away.
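The move-and-chown step can be sketched in Python as well. This is a minimal, hypothetical helper (migrate_bayes is not part of SpamAssassin or amavis); it assumes the DBM file names from the listing above, defaults to a dry run, sets the 0600 mode shown in that listing, and leaves any lock files behind. amavis must be stopped before running it for real.

```python
import os
import shutil

BAYES_SUFFIXES = ("_toks", "_seen", "_journal")


def migrate_bayes(old_prefix, new_dir, owner=None, dry_run=True):
    """Hypothetical helper: move the Bayes DBM files found at old_prefix*
    into new_dir (e.g. ~amavis/.spamassassin) as bayes_toks/_seen/_journal.
    Returns the list of (source, destination) pairs; with dry_run=True
    nothing on disk is touched."""
    plan = [
        (old_prefix + suffix, os.path.join(new_dir, "bayes" + suffix))
        for suffix in BAYES_SUFFIXES
        if os.path.exists(old_prefix + suffix)
    ]
    if dry_run:
        return plan
    os.makedirs(new_dir, mode=0o700, exist_ok=True)
    if owner is not None:
        shutil.chown(new_dir, user=owner, group=owner)
    for src, dst in plan:
        shutil.move(src, dst)
        os.chmod(dst, 0o600)  # matches the -rw------- modes in the listing
        if owner is not None:
            shutil.chown(dst, user=owner, group=owner)
    return plan
```

For example, as root and with amavis stopped: migrate_bayes("/var/lib/spamassassin/bayes_db/bayes", "/var/lib/amavis/.spamassassin", owner="amavis", dry_run=False), then drop bayes_path from local.cf and start amavis again. The amavis home directory is an assumption here; `getent passwd amavis` shows the real one.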

--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
My mind is like a steel trap - rusty and illegal in 37 states.
Re: Why does sa-compile access the bayes db?
On 2020-05-28 10:18, Matus UHLAR - fantomas wrote:

> I wonder where these files came from.
> Did you set bayes_path in /etc/spamassassin/?

Set up a user_prefs file for amavisd, and in that file make sure the bayes
data is kept under the amavisd user, not the spamassassin user, which has
no write access.

I hope I have documented this now.

It's not a SpamAssassin bug, by the way; it's how amavisd handles the data
path in SpamAssassin. That is not solved unless you use a database such as
MySQL, PostgreSQL or Redis; SQLite has the same problem as Berkeley DB and
friends.
Re: Why does sa-compile access the bayes db?
On 2020-05-28 15:22, Matus UHLAR - fantomas wrote:
> On 28.05.20 13:38, Bert Van de Poel wrote:
>> We're using a global bayes_path defined in local.cf:
>
> This is your problem imho.
>
> If you use amavis, you don't need a global bayes database; use the amavis
> user's own, which I guess is in /var/lib/amavis/.spamassassin/

If amavisd is running as the amavisd user, then amavisd needs write access
to the bayes path in SpamAssassin; pure and simple.

Check that the user_prefs file in .spamassassin is correct.

When amavisd is restarted, this user_prefs file is used.
Re: Why does sa-compile access the bayes db?
On 2020-05-28 15:32, Bert Van de Poel wrote:
> Almost all of the email we process are forwarders. It doesn't really
> make sense for us to do a non-global bayes db. The large majority of
> email we process is also for a uniform group: student organizations at
> our local university.

That does not matter if the bayes database has no write access.
Re: Why does sa-compile access the bayes db? [ In reply to ]
Oh, I had misunderstood you, Matus. My bad! I thought you meant we
should use a separate bayes db for every mailbox user, but now I
understand you were referring to the amavis user which indeed runs
everything.

I just moved the existing bayes db (after stopping amavis of course) to
the amavis user's .spamassassin folder and removed the path from
local.cf and it seems to work just fine and indeed solves our issue with
sa-compile. Thank you very much for the suggestion. This is a much
cleaner solution than what I had initially in mind!


On 28/05/2020 17:03, Matus UHLAR - fantomas wrote:
> You have apparently missed what I said before, so I'll repeat it:
>
> You said you use amavis. The amavis daemon (usually) runs as the amavis
> user. Therefore, all mail processed by amavis uses the amavis user's bayes
> database, stored in the amavis home directory.
>
> Move the database to amavis' home (and chown it to the amavis user):
>
> # ls -la ~amavis/.spamassassin/
> total 41368
> drwx------ 2 amavis amavis     4096 May 28 16:59 .
> drwxr-x--- 7 amavis amavis     4096 May 28 06:50 ..
> -rw------- 1 amavis amavis    89136 May 28 17:01 bayes_journal
> -rw------- 1 amavis amavis 21065728 May 28 16:59 bayes_seen
> -rw------- 1 amavis amavis 40144896 May 28 16:59 bayes_toks
> -rw-r--r-- 1 amavis amavis     2304 May  5 12:41 user_prefs
>
> Then remove the global bayes database setting from
> /etc/spamassassin/local.cf,
> and your problem will most probably go away.
Re: Why does sa-compile access the bayes db?
>On 2020-05-28 15:32, Bert Van de Poel wrote:
>>Almost all of the email we process are forwarders. It doesn't really
>>make sense for us to do a non-global bayes db. The large majority of
>>email we process is also for a uniform group: student organizations at
>>our local university.

On 28.05.20 21:05, Benny Pedersen wrote:
>That does not matter if the bayes database has no write access.

Missing write access can lead to warning or error messages too.
Luckily, this problem is now solved, as reported by the OP.

--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
WinError #99999: Out of error messages.