Mailing List Archive

Question about user specific bayes
Hi,

I'm trying to implement user-specific Bayes. My current global Bayes setup is as follows. I'm also using amavis:

bayes_path /opt/sa-bayes/bayes
bayes_file_mode 0777
use_bayes 1
use_bayes_rules 1
bayes_auto_learn 0
bayes_auto_learn_threshold_spam 15
bayes_auto_learn_threshold_nonspam -5



Based on what I've read online, I've set the following in /etc/default/spamassassin in an attempt to set up user-specific Bayes:


OPTIONS="--create-prefs --max-children 5 --helper-home-dir=/opt/sa-bayes-users/%u -x -u amavis"

I've also created subdirectories named after each user under /opt/sa-bayes-users. For example:

/opt/sa-bayes-users/bob@domain.tld
/opt/sa-bayes-users/larry@domain.tld

Etc...

I've set the owner of /opt/sa-bayes-users/ to amavis and the permissions to 700.
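The per-user directories described above could also be created by a short script rather than by hand. This is a minimal sketch, not part of the original setup: the function name `ensure_user_dir` is hypothetical, the demo writes to a throwaway temporary directory instead of /opt/sa-bayes-users, and changing the owner to amavis is left out because it needs root.

```python
import os
import stat
import tempfile


def ensure_user_dir(base: str, user: str) -> str:
    """Create base/user with mode 0700 and return its path."""
    path = os.path.join(base, user)
    os.makedirs(path, exist_ok=True)
    # chmod explicitly: a mode passed to makedirs is filtered by the umask.
    os.chmod(path, 0o700)
    return path


if __name__ == "__main__":
    # Demo against a temporary base directory (an assumption for illustration).
    with tempfile.TemporaryDirectory() as base:
        for user in ("bob@domain.tld", "larry@domain.tld"):
            path = ensure_user_dir(base, user)
            mode = stat.S_IMODE(os.stat(path).st_mode)
            print(path, oct(mode))
```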

I ran a test sa-learn as follows, where /mnt/data/amavis/clean/n/nTutbwTMVWzK is the actual e-mail file I used to train SA:

sa-learn --spam --dbpath /opt/sa-bayes-users/bob@domain.tld /mnt/data/amavis/clean/n/nTutbwTMVWzK

and it did seem to create bayes_toks and bayes_seen files under the /opt/sa-bayes-users/bob@domain.tld directory as expected.

Is this all that's required to get this working?

What happens to the global bayes file in local.cf? Is that no longer used?

How do the following settings from local.cf apply to the user-specific Bayes databases?

use_bayes 1
use_bayes_rules 1
bayes_auto_learn 0
bayes_auto_learn_threshold_spam 15
bayes_auto_learn_threshold_nonspam -5


Do the user-specific Bayes databases have the same requirement to be trained with at least 200 messages before they start working?

Thanks in advance
Re: Question about user specific bayes
On 2022-01-18 at 11:12:01 UTC-0500 (Tue, 18 Jan 2022 16:12:01 +0000)
Dino Edwards <dino.edwards@mydirectmail.net>
is rumored to have said:

> Hi,
>
> Trying to implement user specific bayes. My current setup is setup as
> follows in regards to global bayes. I'm also using amavis:
>
> bayes_path /opt/sa-bayes/bayes
> bayes_file_mode 0777

Don't do that anywhere. It's not safe.
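The concern with a mode like 0777 is that any local account can then modify or replace the Bayes files. A minimal sketch for spotting world-writable files follows; `world_writable` is a hypothetical helper, and the demo uses a temporary file rather than a real database.

```python
import os
import stat
import tempfile


def world_writable(path: str) -> bool:
    """Return True if 'other' users have write permission on path."""
    return bool(os.stat(path).st_mode & stat.S_IWOTH)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        toks = os.path.join(tmp, "bayes_toks")  # stand-in file, not a real DB
        open(toks, "w").close()
        for mode in (0o666, 0o600):
            os.chmod(toks, mode)
            print(oct(mode), "world-writable:", world_writable(toks))
```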

> use_bayes 1
> use_bayes_rules 1
> bayes_auto_learn 0
> bayes_auto_learn_threshold_spam 15
> bayes_auto_learn_threshold_nonspam -5
[...]
>
> and it did seem to create bayes_toks and bayes_seen files under the
> /opt/sa-bayes-users/bob@domain.tld directory as expected.

So, it is working.

> Is this all that's required to get this working?

Yes

> What happens to the global bayes file in local.cf? Is that no longer
> used?

I believe that it would be used if for some reason SA couldn't figure
out which user to pick for a scan at runtime. Maybe if spamd was
launched as a user that was later deleted?

But generally, a working per-user Bayes setup makes the global file
pointless and unused.

>
> How do the following settings from the local.cf figure in the user
> specific bayes files?
>
> use_bayes 1
> use_bayes_rules 1
> bayes_auto_learn 0
> bayes_auto_learn_threshold_spam 15
> bayes_auto_learn_threshold_nonspam -5

The local.cf file is loaded before user_prefs, which is the last config
file loaded, so any setting that is allowed in user_prefs (all of
those, I believe) and is set there will override the local.cf value.

Note that in this case you're choosing to disable auto-learn, so the
threshold values are never used.
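The "last file loaded wins" behaviour described above can be illustrated with a small sketch. This is not SpamAssassin's parser: it ignores admin-only restrictions, plugins, and every other config detail, and the names `parse_directives` and `effective_settings` as well as the user_prefs content are hypothetical.

```python
def parse_directives(text: str) -> dict:
    """Parse 'name value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def effective_settings(local_cf: str, user_prefs: str) -> dict:
    """Later file wins: values from user_prefs override those from local.cf."""
    merged = parse_directives(local_cf)
    merged.update(parse_directives(user_prefs))
    return merged


# Settings quoted in the thread.
LOCAL_CF = """\
use_bayes 1
use_bayes_rules 1
bayes_auto_learn 0
bayes_auto_learn_threshold_spam 15
bayes_auto_learn_threshold_nonspam -5
"""

# Hypothetical per-user override, not taken from the thread.
USER_PREFS = "bayes_auto_learn 1\n"

if __name__ == "__main__":
    for name, value in effective_settings(LOCAL_CF, USER_PREFS).items():
        print(name, value)
```

With this hypothetical override, the thresholds from local.cf would start to matter for that one user, because auto-learn would no longer be disabled for them.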

> Do the user specific bayes have the same requirements to train them
> with at least 200 messages?

Yes. Each Bayes DB must be seeded before it can be used. You should also
plan a way to regularly feed known spam and ham to those databases,
since you aren't auto-learning.

> before they start working?

Before SA will determine a Bayes score on incoming messages, yes.
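One way to check whether a given per-user database has been seeded is to read the nspam and nham counters printed by `sa-learn --dump magic`. The sketch below assumes the usual "non-token data" line layout of that output and the default minimums of 200 spam and 200 ham messages (the `bayes_min_spam_num` and `bayes_min_ham_num` settings). The sample text is made up for illustration, and `bayes_counts` and `is_seeded` are hypothetical names.

```python
import re

# Made-up sample in the layout printed by `sa-learn --dump magic`.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        250          0  non-token data: nspam
0.000          0        120          0  non-token data: nham
0.000          0      48213          0  non-token data: ntokens
"""

# Third column is the counter value; the label follows "non-token data:".
MAGIC_LINE = re.compile(r"^\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (\w+)\s*$")


def bayes_counts(dump: str) -> dict:
    """Extract the nspam and nham counters from dump output."""
    counts = {}
    for line in dump.splitlines():
        match = MAGIC_LINE.match(line.strip())
        if match and match.group(2) in ("nspam", "nham"):
            counts[match.group(2)] = int(match.group(1))
    return counts


def is_seeded(dump: str, min_spam: int = 200, min_ham: int = 200) -> bool:
    """True only if both counters reach their minimums."""
    counts = bayes_counts(dump)
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham


if __name__ == "__main__":
    print(bayes_counts(SAMPLE), "seeded:", is_seeded(SAMPLE))
```

In practice the text would come from something like `sa-learn --dbpath /opt/sa-bayes-users/bob@domain.tld --dump magic`, run as the user that owns the files.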




--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
RE: Question about user specific bayes
Hi, thanks for the quick reply. So when amavis calls SA for an incoming message, it passes the recipient (e-mail address) in the %u variable, and SA then looks in the /opt/sa-bayes-users/%u directory for a Bayes database. If it finds one, it will use it, provided it's properly seeded. If not, it will fall back to the global Bayes. Is that correct?

Thanks



Re: Question about user specific bayes
On 2022-01-18 at 13:40:29 UTC-0500 (Tue, 18 Jan 2022 18:40:29 +0000)
Dino Edwards <dino.edwards@mydirectmail.net>
is rumored to have said:

> Hi, thanks for the quick reply. So when amavis calls on SA for an
> incoming message, it will pass the recipient (e-mail address) in the
> %u variable and then SA will take that variable and look in the
> /opt/sa-bayes-users/%u directory for the existence of bayes database
> and if it finds one, it will use it provided it's properly seeded. If
> not, it will fall back to the global bayes. Is that correct?

Well, maybe? I don't currently have a system using per-user Bayes and
it's been a bit since I set one up so hopefully someone who has a
working rig will speak up...

Note that SA will try to create an empty DB if none exists. I'm not sure
that I can think up a circumstance (other than a disappearing user)
where fallback to global Bayes would happen. SA will not fall back to a
global Bayes DB just because an otherwise perfectly good per-user DB
isn't properly seeded.




--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
RE: Question about user specific bayes
> Note that SA will try to create an empty DB if none exists. I'm not sure
> that I can think up a circumstance (other than a disappearing user)
> where fallback to global Bayes would happen. SA will not fall back to a
> global Bayes DB just because an otherwise perfectly good per-user DB
> isn't properly seeded.

It doesn't seem to be creating an empty database at all. I'm not sure why.

Re: Question about user specific bayes
On 2022-01-18 22:34, Bill Cole wrote:

> Well, maybe? I don't currently have a system using per-user Bayes and
> it's been a bit since I set one up so hopefully someone who has a
> working rig will speak up...

fuglu has per-user Bayes by default, and it recently fixed an issue
where the local part could be mixed case, so a sender could create
another Bayes user. Oops. I had hoped this was solved in SpamAssassin
core, but maybe it will be in SA 4.0.0.
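The mixed-case problem mentioned above comes down to "Bob@domain.tld" and "bob@domain.tld" mapping to two different per-user directories. The sketch below shows one way to avoid that by normalizing the address before building the path. It does not show how fuglu or SpamAssassin handle this: lowercasing the whole address is a policy assumption, and `bayes_dir_for` is a hypothetical name.

```python
import os


def bayes_dir_for(recipient: str, base: str = "/opt/sa-bayes-users") -> str:
    """Map a recipient address to one directory regardless of letter case."""
    user = recipient.strip().lower()
    # Refuse anything that could escape the base directory.
    if not user or "/" in user or user in (".", ".."):
        raise ValueError(f"unsafe user name: {recipient!r}")
    return os.path.join(base, user)


if __name__ == "__main__":
    for address in ("bob@domain.tld", "Bob@Domain.TLD"):
        print(address, "->", bayes_dir_for(address))
```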

> Note that SA will try to create an empty DB if none exists.

And if spamd/spamc uses virtual SQL users, or has static DB files for
all users with read/write permissions, it could be very simple, ideally
with SQLite3 user prefs configured.

> I'm not
> sure that I can think up a circumstance (other than a disappearing
> user) where fallback to global Bayes would happen.

Is this even supported?

> SA will not fall
> back to a global Bayes DB just because an otherwise perfectly good
> per-user DB isn't properly seeded.

Good.