Mailing List Archive

Share bayes database between servers
Hi there,

I am running two mail servers: the first serves two domains, the other
serves one domain.

Both serve as backup mx for each other. Both know about users and
aliases of the other domain(s).

On both systems, SpamAssassin is configured to read/store userprefs and
bayes data (per user) in a local MySQL database.

Both systems reject email if the score exceeds a certain limit. To
avoid backscatter (or the need to accept any spam not rejected by the
backup mx), both servers should do their spam filtering based on
exactly the same information, including bayes data.

Now, the question is, what is the best way to share bayes data between
two (or more) servers?

I already share userprefs by setting up master-master replication
between the two mysql databases on both servers. This is unproblematic,
since users (or admins) only update userprefs for the local virtual
users on each system, which means the backup mx will never touch the
primary mx's userprefs.

But bayes data may be updated by either the primary mx or the backup
mx, since email may arrive at either server. 

I've set up a testing environment that also uses master-master
replication of the mysql bayes database, with priority in dns set to
equal for both mx to get incoming mail distributed evenly to both
systems. So far, this seems to work, but this is a low load
environment.

Any suggestions?

Regards,

Robert


--
Robert Senger
Re: Share bayes database between servers
On Sunday, 09.07.2023 at 19:21 +0200, Reindl Harald wrote:
>
>
> On 09.07.23 at 19:06, Robert Senger wrote:
> > But bayes data may be updated by either the primary mx or the
> > backup
> > mx, since email may arrive at either server.
>
> in a smart setup your bayes database is read-only, like ours has been
> since 2014: autolearning is disabled and the database is trained
> strictly by hand from a stored corpus, which gives you the opportunity
> to remove and add messages in the training folders and rebuild from
> scratch
>
> we share our bayes-db even with a different company since 2014

Well, that's the boring solution... ;) Nevertheless, this is what I
will likely do if I encounter any problems with the mysql master-master
replication as I have it running now.
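
For reference, a read-only database that is trained in one place and
copied to the other servers could be distributed with sa-learn's
--backup and --restore options. The following is only a minimal sketch,
assuming sa-learn is on the PATH of both hosts; the function names and
the dump path are hypothetical, and with per-user SQL storage the -u
option would have to be passed once per virtual user.

```python
import subprocess
from typing import List, Optional


def _sa_learn(username: Optional[str]) -> List[str]:
    """Base sa-learn command, optionally for one virtual user (-u)."""
    cmd = ["sa-learn"]
    if username:
        cmd += ["-u", username]
    return cmd


def backup_command(username: Optional[str] = None) -> List[str]:
    """Command that dumps the bayes database to stdout."""
    return _sa_learn(username) + ["--backup"]


def restore_command(dump_file: str, username: Optional[str] = None) -> List[str]:
    """Command that replaces the local bayes database with a dump."""
    return _sa_learn(username) + ["--restore", dump_file]


def export_bayes(dump_file: str, username: Optional[str] = None) -> None:
    """Run on the training host: write the dump to dump_file."""
    with open(dump_file, "wb") as out:
        subprocess.run(backup_command(username), stdout=out, check=True)


def import_bayes(dump_file: str, username: Optional[str] = None) -> None:
    """Run on each receiving host after the dump has been copied over."""
    subprocess.run(restore_command(dump_file, username), check=True)
```

How the dump file travels between the hosts (rsync, scp, a shared
mount) is left open here.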

Robert

--
Robert Senger
Re: Share bayes database between servers
On Sun, Jul 09, 2023 at 07:06:10PM +0200, Robert Senger wrote:
> I've set up a testing environment that also uses master-master
> replication of the mysql bayes database, with priority in dns set to
> equal for both mx to get incoming mail distributed evenly to both
> systems. So far, this seems to work, but this is a low load
> environment.

It boils down to how much you trust the stability and performance of
MySQL master-master replication, which depends heavily on your own
experience and the exact versions used (are we talking about Oracle
MySQL, or the MariaDB or Percona forks? Which versions? What replication
setup? etc.)

I've had problems under high concurrent load (not performance, but
replication setup breaking) in the past, so I prefer to avoid
master-master replication if possible, especially if I anticipate
high concurrent load.

But if you are confident in it, sure, go ahead.
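
If you do go that way, it is worth watching for the kind of breakage
mentioned above. Below is a rough sketch of a health check, assuming you
already have one row of SHOW SLAVE STATUS as a dict (for example from a
DB driver's dictionary cursor). The function name and the lag threshold
are made up for illustration, and newer MySQL releases expose the same
information as SHOW REPLICA STATUS with Replica_* field names, so adjust
the keys for your version.

```python
from typing import Any, Dict, List


def replication_problems(status: Dict[str, Any], max_lag_seconds: int = 60) -> List[str]:
    """Return a list of human-readable problems; empty means healthy."""
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("I/O thread not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread not running")
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # The server reports NULL here when replication is not running.
        problems.append("replication lag unknown")
    elif int(lag) > max_lag_seconds:
        problems.append("replication lag is %d seconds" % int(lag))
    if status.get("Last_Error"):
        problems.append("last error: %s" % status["Last_Error"])
    return problems


if __name__ == "__main__":
    # Hand-written example row, not real server output.
    example = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "No",
        "Seconds_Behind_Master": None,
        "Last_Error": "Duplicate entry",
    }
    for problem in replication_problems(example):
        print(problem)
```

With master-master replication the check has to run on both servers,
since each one is a replica of the other.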

> Any suggestions?

Well, how are you training your bayes DB? If it is via cron and
manually curated ham/spam corpora (the recommended way), I'd rather
suggest keeping the databases separate and simply running the training
on both servers (you can duplicate or share the ham/spam corpora as you
wish, with anything from rsync to SMB/NFS).
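
A cron-driven training run along those lines could look roughly like
this. It is only a sketch, assuming sa-learn is on the PATH and the
corpus directories are already present on each server; the directory
paths and function names are hypothetical, and the optional -u username
only matters for per-user SQL setups.

```python
import subprocess
from typing import List, Optional


def training_commands(ham_dir: str, spam_dir: str,
                      username: Optional[str] = None,
                      rebuild: bool = False) -> List[List[str]]:
    """Build the sa-learn calls for one training run, in order."""
    base = ["sa-learn"]
    if username:
        base += ["-u", username]
    commands = []
    if rebuild:
        # Wipe the existing database first to rebuild from scratch.
        commands.append(base + ["--clear"])
    commands.append(base + ["--ham", ham_dir])
    commands.append(base + ["--spam", spam_dir])
    return commands


def train(ham_dir: str, spam_dir: str,
          username: Optional[str] = None, rebuild: bool = False) -> None:
    """Run the training; stops at the first failing command."""
    for cmd in training_commands(ham_dir, spam_dir, username, rebuild):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in training_commands("/srv/corpus/ham", "/srv/corpus/spam"):
        print(" ".join(cmd))
```

Running the same script from cron on both servers against the same
corpus should leave both databases trained on the same messages without
any database replication.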

If you are using auto-learn (which was not recommended last time I
looked), well, you'd probably be better off NOT syncing bayes at all,
IMHO. It is preferable to confine the risk of bayes poisoning to one
server instead of replicating it. There is also not much benefit:
auto-learn will quickly learn on each server separately anyway, and if
one set of domains is not getting some type of spam, it is not
beneficial to learn it there anyway.

--
Opinions above are GNU-copylefted.