A few days ago I reported a problem I was having with my bayes
database along with the observation that I was pretty sure it was a
bug in the bayes expiry software.
The problem was that under heavy load "bayes_toks.expire[pid]" files
(e.g. bayes_toks.expire16781) were piling up in the bayes database area
one after another as successive expiries failed to complete.
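A quick way to check for this symptom is to count the leftover files.
This is only a rough sketch: the function name is made up for
illustration, and the default directory (~/.spamassassin) is an
assumption, so pass whatever directory your bayes_path setting points at.

```shell
# Hypothetical helper: count leftover expiry temp files in a bayes directory.
# The default directory is an assumption; pass your own as the first argument.
count_stale_expire_files() {
    dir="${1:-$HOME/.spamassassin}"
    # Match bayes_toks.expire followed by a pid, e.g. bayes_toks.expire16781
    find "$dir" -maxdepth 1 -type f -name 'bayes_toks.expire[0-9]*' | wc -l | tr -d ' '
}
```

If the count keeps growing between checks, expiries are probably being
started and then killed before they finish.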
I run spamassassin under MIMEDefang and I was allowing expires to take
place automatically, when needed, rather than forcing an expiry on a
regular basis using sa-learn.
A day or so after reporting the problem I got a message from David Lee
who had run into the same problem, and who thought it might be due to
the controlling agent, in his case a program called MailScanner,
timing out the expiry process before it could complete.
It turns out that is exactly what was happening. A bayes expiry can
often take 3 or 4 minutes to complete, and if the system load happens
to be really high when a MIMEDefang/spamassassin process decides it's
time to do an expiry, the process can easily take much longer. If it
takes longer than 5 minutes you're in trouble, because 5 minutes
happens to be the time sendmail gives each task to complete if you are
running MIMEDefang with the default setup.
I'm also pretty sure this must be the case because I ran an expire via
sa-learn and it finished successfully in about 8 minutes, so it wasn't
a matter of a corrupted database causing the problem.
The point is that allowing bayes expiry to take place opportunistically
on a high-volume server is a recipe for disaster, and what you need to
do is set "bayes_auto_expire 0" in your local.cf file and use sa-learn
to force an expire on a regular basis via cron.
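For anyone who wants to copy the setup, it might look roughly like the
fragment below. The local.cf path and the cron schedule are assumptions,
not something from my own config, so adjust them for your site. The cron
job should run as whichever user owns the bayes database.

```
# In local.cf (often /etc/mail/spamassassin/local.cf; path is an assumption)
bayes_auto_expire 0

# Crontab entry for the user that owns the bayes database.
# The 03:30 daily schedule is only an example; pick a quiet period.
30 3 * * * sa-learn --force-expire
```

Running the expiry from cron means nothing is waiting on it, so it can
take as long as it needs without hitting the 5 minute limit.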
Thanks to David Lee for pointing me in the right direction.
- rick mallett