Mailing List Archive

[Bug 3488] Bayes SQL and Bayes DBM differ on newest_token_age for same corpus.
http://bugzilla.spamassassin.org/show_bug.cgi?id=3488

parkerm@pobox.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Bayes SQL setting           |Bayes SQL and Bayes DBM
                   |newest_token_age incorrectly|differ on newest_token_age
                   |                            |for same corpus.





------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.





------- Additional Comments From felicity@kluge.net 2004-06-08 21:07 -------
really? do you have that message? can you attach it here? no one has been able to show me a
message which is found to be in the future.

what exactly is the issue for DBM?








------- Additional Comments From parkerm@pobox.com 2004-06-08 21:18 -------
Subject: Re: Bayes SQL and Bayes DBM differ on newest_token_age for same corpus.

On Tue, Jun 08, 2004 at 09:07:30PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> ------- Additional Comments From felicity@kluge.net 2004-06-08 21:07 -------
> really? do you have that message? can you attach it here? no one has been able to show me a
> message which is found to be in the future.

Well, if my guess is right, everyone has it. See compile_now.

> what exactly is the issue for DBM?
>

Red Herring I think. Not totally sure why DBM doesn't show the
problem.

The test msg from compile_now is causing several tokens to get
added/touched in the SQL database. They don't appear to get touched
in the DBM database. Not sure why.

The corpus I'm using is several months old, so with this really recent
set of tokens in the DB it makes calculate_expire_delta give back
wonky results.

Michael
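
The effect described above can be illustrated with a minimal sketch. It is not SpamAssassin's calculate_expire_delta; the function names, the 30-day cutoff, and the token data are hypothetical. It only assumes that token age is measured relative to the newest atime in the database, and shows how touching a couple of tokens with "now" makes the rest of an old corpus look stale.

```python
DAY = 86400  # seconds per day

def newest_atime(tokens):
    """Return the most recent access time among all tokens."""
    return max(tokens.values())

def tokens_older_than(tokens, max_age):
    """Tokens whose atime lags the newest atime by more than max_age seconds."""
    newest = newest_atime(tokens)
    return sorted(t for t, atime in tokens.items() if newest - atime > max_age)

# Fixed "now" for repeatability (an arbitrary timestamp, assumed for illustration).
now = 1_086_750_000

# Hypothetical corpus: ten tokens last accessed 120 to 129 days ago.
corpus = {f"tok{i}": now - (120 + i) * DAY for i in range(10)}

# Relative to the newest token, nothing is more than 30 days old.
before = tokens_older_than(corpus, 30 * DAY)

# Simulate a startup scan that touches two existing tokens with atime = now.
corpus["tok0"] = now
corpus["tok1"] = now

# Now every untouched token appears more than 30 days old.
after = tokens_older_than(corpus, 30 * DAY)

print(before)
print(after)
```

With the two fresh atimes in place, the eight untouched tokens all fall outside the 30-day window, which matches the kind of skew reported for an old corpus.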










------- Additional Comments From felicity@kluge.net 2004-06-08 21:27 -------
Subject: Re: Bayes SQL and Bayes DBM differ on newest_token_age for same corpus.

On Tue, Jun 08, 2004 at 09:18:24PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> The test msg from compile_now is causing several tokens to get
> added/touched in the SQL database. They don't appear to get touched
> in the DBM database. Not sure why.

the config for DBM aims at the temp directory at startup, but I think
the SQL config causes startup to go to the actual db.

perhaps the backup bit ought to disable auto learning then restore it
at the end?










------- Additional Comments From jm@jmason.org 2004-06-08 21:52 -------
re: disabling autolearning -- I think that sounds like a good idea.








------- Additional Comments From parkerm@pobox.com 2004-06-08 22:04 -------
Subject: Re: Bayes SQL and Bayes DBM differ on newest_token_age for same corpus.

Actually auto-learning is disabled during the init stage. It's the
tok_touch that is firing.

Someone remind me, I can't seem to find it in the code, if auto
learning is disabled, we don't write to the journal right? If so,
then that's it.

Hmmmm...so what should the behavior be for the SQL code?

Michael










------- Additional Comments From felicity@kluge.net 2004-06-09 08:33 -------
Subject: Re: Bayes SQL and Bayes DBM differ on newest_token_age for same corpus.

On Tue, Jun 08, 2004 at 10:04:25PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Actually auto-learning is disabled during the init stage. It's the
> tok_touch that is firing.
>
> Someone remind me, I can't seem to find it in the code, if auto
> learning is disabled, we don't write to the journal right? If so,
> then that's it.

Ah... Yes, journal writes happen on scan. If the tokens already exist
in the DB, their atimes get updated.
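
As a rough sketch of the behavior described here, assuming a plain dictionary stands in for the token store (the function name and data are hypothetical, and this is not the SpamAssassin storage API): a scan updates the atime of tokens that already exist and leaves unknown tokens alone.

```python
def scan_touch(db, message_tokens, now):
    """Update atime for tokens already present in db; do not add new ones.

    db maps token -> atime. Returns the list of tokens that were touched.
    """
    touched = []
    for tok in message_tokens:
        if tok in db:
            db[tok] = now
            touched.append(tok)
    return touched

# Hypothetical store with two known tokens and old atimes.
db = {"viagra": 1000, "meeting": 2000}
touched = scan_touch(db, ["viagra", "unknown"], now=5000)
print(touched, db)
```

Only the existing token gets a new atime, so even a scan with learning disabled can move the newest atime forward.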






parkerm@pobox.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.0.0                       |3.1.0



------- Additional Comments From parkerm@pobox.com 2004-06-09 09:00 -------
Moving to 3.1.0. It's an interesting problem, but not one that I think will hit
a lot of people. a) You've got to be scanning a corpus that is old enough that
a few tokens with an atime of now would radically throw off the
calculate_expire_delta calculation, and b) you've got to be running Bayes SQL in
such a way that the spamd startup (i.e. compile_now) will be forced to the same
user everyone else is using (i.e. global bayes).

I'll look into some ways that we can mimic the DBM behavior in the SQL code.








------- Additional Comments From parkerm@pobox.com 2004-06-16 20:56 -------
Created an attachment (id=2046)
--> (http://bugzilla.spamassassin.org/attachment.cgi?id=2046&action=view)
Patch File

This patch takes the simple approach: it turns off the bayes rules before
checking the sample message, then sets the option back to its previous value.
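
The save, disable, restore pattern described above might look like the following sketch. It is not the attached patch; the configuration is modeled as a plain dictionary, and the key name and helper are placeholders chosen for illustration.

```python
from contextlib import contextmanager

@contextmanager
def bayes_rules_disabled(conf):
    """Temporarily turn off the (hypothetical) bayes-rules flag in conf."""
    previous = conf["use_bayes_rules"]   # remember the caller's setting
    conf["use_bayes_rules"] = False      # disable for the duration of the block
    try:
        yield conf
    finally:
        conf["use_bayes_rules"] = previous  # restore even if the check fails

# Hypothetical usage around a startup self-check.
conf = {"use_bayes_rules": True}
seen_during_check = None
with bayes_rules_disabled(conf):
    # A sample-message check would run here without touching Bayes tokens.
    seen_during_check = conf["use_bayes_rules"]

print(seen_during_check, conf["use_bayes_rules"])
```

Restoring in a finally clause means the previous value comes back even if the sample check raises, so a failed startup check cannot leave the rules switched off.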


