Mailing List Archive

Spamassassin with Galera as SQL-Backend?
Hi,

I have a setup where the spamassassin-servers have actually no access to the data of the mail-servers. Now I was looking into having per user bayes-databases and saw that I can do that with a SQL-database. I have already a small galera-cluster and I wonder if spamassassin will work with it because of the limitations galera has.
The limitations are:
- only innodb
- unsupported explicit locking
- a primary key on all tables is necessary
- no XA transactions
- no reliance on auto-increment

Does anyone have experience with such a setup?

Best,

Niels
Re: Spamassassin with Galera as SQL-Backend? [ In reply to ]
On Fri, May 06, 2022 at 11:08:21AM +0200, Niels Kobsch?tzki wrote:
> Hi,
>
> I have a setup where the spamassassin-servers have actually no access to the
> data of the mail-servers. Now I was looking into having per user
> bayes-databases and saw that I can do that with a SQL-database. I have already
> a small galera-cluster and I wonder if spamassassin will work with it because
> of the limitations galera has.
> The limitations are:
>
> * only innodb
> * unsupported explicit locking
> * a primary key on all tables is necessary
> * no XA transactions
> * no reliance on auto-increment
>
> Does anyone have experience with such a setup?

I see no reason why it wouldn't work, none of the limitations should apply
to SpamAssassin.
Re: Spamassassin with Galera as SQL-Backend? [ In reply to ]
On 2022-05-06 11:25, Henrik K wrote:
> On Fri, May 06, 2022 at 11:08:21AM +0200, Niels Kobschätzki wrote:
>> Hi,
>>
>> I have a setup where the spamassassin-servers have actually no access
>> to the
>> data of the mail-servers. Now I was looking into having per user
>> bayes-databases and saw that I can do that with a SQL-database. I have
>> already
>> a small galera-cluster and I wonder if spamassassin will work with it
>> because
>> of the limitations galera has.
>> The limitations are:
>>
>> * only innodb
>> * unsupported explicit locking
>> * a primary key on all tables is necessary
>> * no XA transactions
>> * no reliance on auto-increment
>>
>> Does anyone have experience with such a setup?
>
> I see no reason why it wouldn't work, none of the limitations should
> apply
> to SpamAssassin.

fair, its just that redis is more prefered to bayes imho, and postgresql
is high performance without being memory hungry
Re: Spamassassin with Galera as SQL-Backend? [ In reply to ]
On 6 May 2022, at 11:31, Benny Pedersen wrote:

> On 2022-05-06 11:25, Henrik K wrote:
>> On Fri, May 06, 2022 at 11:08:21AM +0200, Niels Kobschätzki wrote:
>>> Hi,
>>>
>>> I have a setup where the spamassassin-servers have actually no access to the
>>> data of the mail-servers. Now I was looking into having per user
>>> bayes-databases and saw that I can do that with a SQL-database. I have already
>>> a small galera-cluster and I wonder if spamassassin will work with it because
>>> of the limitations galera has.
>>> The limitations are:
>>>
>>> * only innodb
>>> * unsupported explicit locking
>>> * a primary key on all tables is necessary
>>> * no XA transactions
>>> * no reliance on auto-increment
>>>
>>> Does anyone have experience with such a setup?
>>
>> I see no reason why it wouldn't work, none of the limitations should apply
>> to SpamAssassin.

Great :)
I’d rather be safe than sorry and like to ask.

> fair, its just that redis is more prefered to bayes imho, and postgresql is high performance without being memory hungry

But I read that redis doesn’t have per-user databases? And I probably would need new machines with lots of RAM for it, because I have no idea how much RAM is needed per user.
And I already have a galera-cluster running and don’t want to set up yet another database-cluster (psql).

Niels
Re: Spamassassin with Galera as SQL-Backend? [ In reply to ]
On 5/6/22 11:08, Niels Kobschätzki wrote:
> Hi,
>
> I have a setup where the spamassassin-servers have actually no access to the data of the mail-servers. Now I was looking into having per user bayes-databases and saw that I can do that with a SQL-database. I have already a small galera-cluster and I wonder if spamassassin will work with it because of the limitations galera has.
> The limitations are:
>
> * only innodb
> * unsupported explicit locking
> * a primary key on all tables is necessary
> * no XA transactions
> * no reliance on auto-increment
>
> Does anyone have experience with such a setup?
>
Few things to consider:
bayes_expire has no primary key.
bayes_vars MySQL table has the id defined as "id int(11) NOT NULL AUTO_INCREMENT".

Actually I have no idea if this could be a blocker for you, there should be no problem if you do not use Bayes anyway.

Giovanni
Re: Spamassassin with Galera as SQL-Backend? [ In reply to ]
On Fri, May 06, 2022 at 12:31:47PM +0200, giovanni@paclan.it wrote:
> On 5/6/22 11:08, Niels Kobsch?tzki wrote:
> > Hi,
> >
> > I have a setup where the spamassassin-servers have actually no access to the data of the mail-servers. Now I was looking into having per user bayes-databases and saw that I can do that with a SQL-database. I have already a small galera-cluster and I wonder if spamassassin will work with it because of the limitations galera has.
> > The limitations are:
> >
> > * only innodb
> > * unsupported explicit locking
> > * a primary key on all tables is necessary
> > * no XA transactions
> > * no reliance on auto-increment
> >
> > Does anyone have experience with such a setup?
> >
> Few things to consider:
> bayes_expire has no primary key.

From what I see, there's no reason why it shouldn't be.

CREATE TABLE bayes_expire (
id int(11) NOT NULL default '0',
runtime int(11) NOT NULL default '0',
KEY bayes_expire_idx1 (id)
) ENGINE=InnoDB;

BayesStore/MySQL.pm has kind of a dumb insert which might insert things multiple times

my $sql = "INSERT INTO bayes_expire (id,runtime) VALUES (?,?)";

It should just be converted to UPSERT.

Of course this won't help until 4.0.0 is released..

> bayes_vars MySQL table has the id defined as "id int(11) NOT NULL AUTO_INCREMENT".

Google implies Galera supports auto_increment just fine, it just does
something funny like incrementing them in 3 multiples or something.
Re: Spamassassin with Galera as SQL-Backend? [ In reply to ]
On 2022-05-06 12:21, Niels Kobschätzki wrote:

> But I read that redis doesn’t have per-user databases?

nope, pr user bayes needs one database in redis, not multiple pr user

> And I probably
> would need new machines with lots of RAM for it, because I have no
> idea how much RAM is needed per user.

only redis works better if have Gb range of ram, while postgresql keep
it at Mb og ram

> And I already have a galera-cluster running and don’t want to set up
> yet another database-cluster (psql).

redis can be clustered aswell :=)

even sqlite3 can be clustered

i only have fokus on ram usage, not what tool used

>
> Niels
Re: Spamassassin with Galera as SQL-Backend? [ In reply to ]
I’ve had it running against a mariadb galera setup for a few years now. Have not experienced any issues here. Did not have to take any special actions on SpamAssassins side, just pointed it at a dns round robin entry for the backend servers and setup appropriate access perms for them.

-Darrell

> On May 6, 2022, at 4:08 AM, Niels Kobschätzki <niels@kobschaetzki.net> wrote:
>
> Hi,
>
> I have a setup where the spamassassin-servers have actually no access to the data of the mail-servers. Now I was looking into having per user bayes-databases and saw that I can do that with a SQL-database. I have already a small galera-cluster and I wonder if spamassassin will work with it because of the limitations galera has.
> The limitations are:
>
> only innodb
> unsupported explicit locking
> a primary key on all tables is necessary
> no XA transactions
> no reliance on auto-increment
> Does anyone have experience with such a setup?
>
> Best,
>
> Niels
>
Re: Spamassassin with Galera as SQL-Backend? [ In reply to ]
On 2022-05-06 6:56 am, Henrik K wrote:

> On Fri, May 06, 2022 at 12:31:47PM +0200, giovanni@paclan.itwrote:
> On 5/6/22 11:08, Niels Kobschätzki wrote: Hi, I have a setup where the spamassassin-servers have actually no access to the data of the mail-servers. Now I was looking into having per user bayes-databases and saw that I can do that with a SQL-database. I have already a small galera-cluster and I wonder if spamassassin will work with it because of the limitations galera has. The limitations are: * only innodb * unsupported explicit locking * a primary key on all tables is necessary * no XA transactions * no reliance on auto-increment Does anyone have experience with such a setup? Few things to consider: bayes_expire has no primary key.

From what I see, there's no reason why it shouldn't be.

CREATE TABLE bayes_expire (
id int(11) NOT NULL default '0',
runtime int(11) NOT NULL default '0',
KEY bayes_expire_idx1 (id)
) ENGINE=InnoDB;

BayesStore/MySQL.pm has kind of a dumb insert which might insert things
multiple times

my $sql = "INSERT INTO bayes_expire (id,runtime) VALUES (?,?)";

It should just be converted to UPSERT.

Of course this won't help until 4.0.0 is released..

> bayes_vars MySQL table has the id defined as "id int(11) NOT NULL AUTO_INCREMENT".

Google implies Galera supports auto_increment just fine, it just does
something funny like incrementing them in 3 multiples or something.

It works fine with Galera - been running that for years. This is from my
ansible spamassassin role

> 267 # http://blog.secaserver.com/2013/10/converting-data-work-galera-cluster/
> 268 - name: Set bayes_expire key to PRIMARY so galera replication works
> 269 lineinfile:
> 270 path: /usr/share/doc/spamassassin/sql/bayes_mysql.sql
> 271 regexp: 'KEY (bayes_expire_idx1.*)'
> 272 line: 'PRIMARY KEY 1'
> 273 backrefs: yes
> 274 state: present
> 275
> 276 # NOTE: As of 3.4.3 see UPGRADE file - says to add last_hit field to awl table
> 277 # exactly as we're doing here
> 278 # lastupdate timestamp default CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
> 279 # http://www200.pair.com/mecham/spam/debian-spamassassin-sql.html
> 280 - name: Add last_hit field to bayes_seen and awl tables
> 281 lineinfile:
> 282 path: "/usr/share/doc/spamassassin/sql/{{ item.file }}"
> 283 insertbefore: '.*({{ item.before }}).*'
> 284 line: ' last_hit timestamp NOT NULL default CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,'
> 285 state: present
> 286 with_items:
> 287 - { before: 'id,msgid', file: bayes_mysql.sql }
> 288 - { before: 'username,email,signedby,ip', file: awl_mysql.sql }
> 289
> 290 # Only run the DB creation tasks on a single host
> 291 # This depends on mysql/galera repliction being active to propagate
> 292 # the DB across to the other nodes
> 293 - when: inventory_hostname == groups.testmail[0] or play_hosts | length == 1
> 294 block:
> 295 #
> 296 - name: Create Spamassassin database
> 297 mysql_db:
> 298 name: spamassassin
> 299 state: present
> 300 register: spamassassin_database

Just setting the bayes_expire key to PRIMARY was all that was needed.