>>>On Wed, 17 Mar 2021 10:42:14 -0400 Kris Deugau wrote:
>>>>My own experience has been that accumulating blobs of ham/spam and
>>>>just repeatedly running sa-learn over those works just fine. It also
>>>>reduces the incidence of tokens from somewhat rarer mail
>>>>automatically expiring out of Bayes, leading to FPs and FNs.
>>
>>On 17.03.21 22:01, RW wrote:
>>>It won't do that by default. You would need to have something removing
>>>the signature hashes from the database.
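To make RW's point concrete, here is a minimal Python model. It assumes a hypothetical in-memory store, not SpamAssassin's API. sa-learn records each learned message in a "seen" list and skips a message it has already learned as the same class. So re-running it over an unchanged corpus only changes token counts for messages whose seen entry has been removed. The class, its methods, and the use of a plain SHA-1 of the text as the message key are all illustrative assumptions.

```python
import hashlib
from collections import Counter


class ToyBayes:
    """Toy Bayes store with a 'seen' list (hypothetical, not SpamAssassin's API)."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of learned spam messages containing it
        self.ham_counts = Counter()   # token -> number of learned ham messages containing it
        self.seen = {}                # message key -> "s" (spam) or "h" (ham)

    @staticmethod
    def msg_hash(text):
        # Assumption: the message key is just a SHA-1 of the raw text.
        return hashlib.sha1(text.encode("utf-8")).hexdigest()

    def learn(self, text, is_spam):
        """Return True if the token counts changed, False if the message was skipped."""
        key = self.msg_hash(text)
        label = "s" if is_spam else "h"
        if self.seen.get(key) == label:
            return False  # already learned as this class: nothing happens
        tokens = set(text.split())
        if key in self.seen:
            # Previously learned as the other class: undo that ("forget") first.
            old = self.spam_counts if self.seen[key] == "s" else self.ham_counts
            old.subtract(tokens)
        target = self.spam_counts if is_spam else self.ham_counts
        target.update(tokens)
        self.seen[key] = label
        return True
```

In this model, a second `learn()` call for the same message and class returns `False`. Only after its entry is deleted from `seen` does relearning bump the counts again, which is the "something removing the signature hashes" RW refers to.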
>Matus UHLAR - fantomas wrote:
>>oh, yes, it does:
>>
>>      bayes_auto_expire             (default: 1)
>>          If enabled, the Bayes system will try to automatically expire old
>>          tokens from the database.  Auto-expiry occurs when the number of
>>          tokens in the database surpasses the bayes_expiry_max_db_size
>>          value. If a bayes datastore backend does not implement individual
>>          key/value expirations, the setting is silently ignored.
>>
>>Note that multiple people have reported long delivery times when expiration
>>occurred, and it's often recommended to turn this off and run
>>expiration from e.g. a cron job.
>>
>>A Bayes database stored in Redis does not have this issue.
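The auto-expiry behaviour in the quoted documentation can be modelled roughly as below. This is a simplified Python sketch. It assumes each token carries a last-access time (atime) and that expiry drops the least recently accessed tokens until a target size remains. The defaults are my recollection of the bayes_expiry_max_db_size documentation and should be checked against your version: a 150000-token maximum, with expiry keeping the larger of 75% of the maximum or 100000 tokens. SpamAssassin itself works out an atime cutoff rather than sorting every token, so this models the effect, not the implementation. The function names are hypothetical.

```python
def expire_target(max_db_size, keep_fraction=0.75, floor=100_000):
    """Tokens to keep after expiry (assumed rule: larger of 75% of max or 100k)."""
    return max(int(max_db_size * keep_fraction), floor)


def maybe_auto_expire(atimes, max_db_size=150_000, auto_expire=True,
                      keep_fraction=0.75, floor=100_000):
    """Expire least recently used tokens once the DB grows past max_db_size.

    atimes: dict mapping token -> last access time (any comparable number).
    Modifies atimes in place and returns the set of expired tokens
    (empty if auto-expiry is disabled or the size threshold is not exceeded).
    """
    if not auto_expire or len(atimes) <= max_db_size:
        return set()
    keep = expire_target(max_db_size, keep_fraction, floor)
    n_expire = max(len(atimes) - keep, 0)  # never negative, even if keep > size
    # Sort oldest first; the first n_expire tokens are the ones that go.
    by_age = sorted(atimes, key=atimes.get)
    expired = set(by_age[:n_expire])
    for tok in expired:
        del atimes[tok]
    return expired
```

The point the thread keeps returning to shows up directly here. Whichever tokens were least recently touched are the ones removed, so a rarely seen but useful token disappears at the next expiry run unless something refreshes its atime.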
On 18.03.21 11:02, Kris Deugau wrote:
>That option only controls when Bayes expiry is run, not what gets
>expired when it does happen.
It says that the expiry is run automatically when certain conditions are met.
Of course, it doesn't affect how expiry works; other options control that.
>Thinking more, I may have conflated several actions that any long-lived
>Bayes DB will have to experience.
>
>- Token expiry will happen automatically out of the box as above, or
>manually as scheduled (for BDB or SQL backends - IIRC the Redis
>backend uses a Redis feature to expire tokens automatically).
>Historically, autoexpire worked well for file-based per-user or (very)
>small sitewide DBs, but very poorly for larger ones (even a few
>tens to hundreds of users) due to the strict locking and extra time
>taken while processing a message. I'm not sure this is so much of a
>problem with an SQL back end.
That's what I wrote above about long delivery times. Maybe I should have
explained it in more detail (there were more problems, but I don't remember
the details).
Manual expiration does not work with Redis, so with Redis bayes_auto_expire
should be set to 1 (the default; simply don't turn it off).
>- The list of "seen" messages (bayes_seen file or DB table - not sure
>what the Redis backend uses) may grow without limit unless manually
>trimmed. On a long-lived Bayes DB this can get very large indeed if
>not.
>
>Between these two items, rarely matched/learned tokens will tend to
>expire out (by design - I have no problem with this), but even with a
>(much) larger bayes_expiry_max_db_size there will be a few more FNs or
>FPs if these tokens are more or less permanently expired. Keeping
>them in circulation by re-learning the same mail over and over helps
>nudge the overall accuracy just a little closer to that impossible
>"perfect" filter that catches all the spam and none of the ham.
So, in fact, you want to keep tokens fresh.
Another solution would be to increase their TTL and the database size.
I wonder whether the TTL isn't updated when tokens are used (so that only
unused tokens get expired).
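The two ways of keeping tokens alive discussed here can be compared in a toy Python simulation. It assumes a TTL-style store in which each token expires a fixed time after it was last refreshed. With `refresh_on_use=True`, a token that keeps matching scanned mail never expires. With `False`, only (re-)learning refreshes it, which is the effect Kris gets by re-learning the same corpus. Whether a given SpamAssassin backend refreshes a token's atime or TTL when it is merely matched during scanning is exactly the open question above. For that reason both behaviours are parameters of this hypothetical function rather than claims about any backend.

```python
def surviving_tokens(events, ttl, end_time, refresh_on_use):
    """Return the set of tokens still alive at end_time.

    events: list of (time, token, kind) with kind "learn" or "use".
    A token expires at last_refresh + ttl. "learn" always (re)creates and
    refreshes it; "use" (token matched while scanning) refreshes an existing
    token only if refresh_on_use is True and cannot revive an expired one.
    """
    expires_at = {}
    for time, token, kind in sorted(events):
        # Drop the token first if it had already expired before this event.
        if token in expires_at and expires_at[token] <= time:
            del expires_at[token]
        if kind == "learn" or (refresh_on_use and token in expires_at):
            expires_at[token] = time + ttl
    return {tok for tok, exp in expires_at.items() if exp > end_time}
```

For example, with `ttl=10`, suppose a "common" token is matched at times 5, 12 and 18 and a "rare" token is learned only once at time 0. At time 20, refresh-on-use keeps only "common". Without it, both are gone, unless "rare" is re-learned at time 15, in which case it alone survives.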
>This is just from my own experience, although some things may have
>been refined and changed since Bayes was first introduced in 2.x (2.4?
>2.6? don't recall any more).
--
Matus UHLAR - fantomas, uhlar@fantomas.sk ;
http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address.
A day without sunshine is like, night.