Mailing List Archive

[Bug 3217] SpamAssassin.pm: "bayes_scanner" may be undefined
http://bugzilla.spamassassin.org/show_bug.cgi?id=3217

felicity@kluge.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WORKSFORME



------- Additional Comments From felicity@kluge.net 2004-03-26 09:30 -------
Don't take this the wrong way, but if you're calling from some third-party app, why do you think the bug
is in our code?

The way it's supposed to work is that M::SA::init() gets called, which will set up the bayes_scanner
appropriately. You can't just call init_learner(), because you'll want init() to do the configs and such first.

So you were right about that init() call, but the solution is for MailScanner to call it...



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

------- Additional Comments From t.d.lee@durham.ac.uk 2004-03-26 09:44 -------
Thanks. Do you mean that the calling application should change from:

$X = new Mail::SpamAssassin(...)
$X->init_learner(...)
$X->rebuild_learner_caches(...)
$X->finish_learner(...)

to:

$X = new Mail::SpamAssassin(...)
$X->init(1)
$X->init_learner(...)
$X->rebuild_learner_caches(...)
$X->finish_learner(...)

But around line 1150 in SpamAssassin.pm is the declaration of
"init", immediately preceded by a comment that looks significant:
"non-public methods", which I took to mean that we (the "public"
caller) should _not_ try this approach.

I'm quite willing to believe that I've misunderstood something
somewhere here. Enlightenment would be welcome!





------- Additional Comments From felicity@kluge.net 2004-03-26 12:28 -------
Subject: Re: SpamAssassin.pm: "bayes_scanner" may be undefined

On Fri, Mar 26, 2004 at 09:44:45AM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Thanks. Do you mean that the calling application should change from:
>
> $X = new Mail::SpamAssassin(...)
> $X->init_learner(...)
> $X->rebuild_learner_caches(...)
> $X->finish_learner(...)
>
> to:
>
> $X = new Mail::SpamAssassin(...)
> $X->init(1)
> $X->init_learner(...)
> $X->rebuild_learner_caches(...)
> $X->finish_learner(...)

Yeah, or you could, technically, do init_learner then init if you're not
going to deal with learn_to_journal. I think init() then init_learner()
is more accurate.

> But around line 1150 in SpamAssassin.pm is the declaration of
> "init", immediately preceded by a comment that looks significant:
> "non-public methods", which I took to mean that we (the "public"
> caller) should _not_ try this approach.
>
> I'm quite willing to believe that I've misunderstood something
> somewhere here. Enlightenment would be welcome!

:)

Well, hrm. I was going to say that rebuild_learner_caches() is a
non-public function, but the docs don't quite bear me out on that one...
Basically what the docs say is you're expected to do:

init_learner()
learn()
rebuild_learner_caches()
finish_learner()

and learn() does init() (as do check() and a bunch of other "public
API" functions).

So I guess the question is -- should daemons be expecting to be able to
do a force sync/expire without doing a learn/check/etc first?






------- Additional Comments From t.d.lee@durham.ac.uk 2004-03-29 01:31 -------
Thanks for your reply. Your suggestion:

> init_learner()
> learn()
> rebuild_learner_caches()
> finish_learner()

would require that a cache rebuild requires some message to be
learned as a prelude to rebuilding the caches (which already
contain many learned messages). This seems to be an unnecessary
complication to a simple "rebuild" request.

I see that the docs say rebuild "should be called after the
learning process". But presumably the intention of this is that
it be called on a non-empty cache, not (I presume) that a
particular "init"/"rebuild"/"finish" sequence must necessarily
include the learning of some arbitrary message.

The desired logic to rebuild an existing cache is simply
something like:

> init_learner(...)
> rebuild_learner_caches(...)
> finish_learner(...)

So two possibilities would seem to be:

1. Let "init()" formally become "public" and be used as
   a preamble to the above sequence. Might be as simple
   as adjusting the comments in the source, and docs etc.

2. Let "init_learner()" itself do the relevant "init()"-like
   things. Possibly using within it something like either:
       $self->init(1);
   or
       $self->{bayes_scanner} = new Mail::SpamAssassin::Bayes($self);

Does that seem reasonable? I hope so, but let me know if and
how that is not the case.

Failing that, if we (from MailScanner: www.mailscanner.info) have
to use a "learn", is there some way of giving "learn" some sort
of null email, neither spam nor ham?

Thanks again.




------- Additional Comments From felicity@kluge.net 2004-03-29 08:15 -------
Subject: Re: SpamAssassin.pm: "bayes_scanner" may be undefined

On Mon, Mar 29, 2004 at 01:31:58AM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> would require that a cache rebuild requires some message to be
> learned as a prelude to rebuilding the caches (which already
> contain many learned messages). This seems to be an unnecessary
> complication to a simple "rebuild" request.

In theory -- but when will a daemon issue a rebuild request? Conceptually
it could be advanced enough to do it when it decides it's a good time,
when nothing else is going on, but then it'll also be advanced enough
to call init() IMHO.

All of the daemons I know of only do rebuilds when using the DB in
question, which is going to be after a learn()/check() run.

> 2. Let "init_learner()" itself do the relevant "init()"-like
>    things. Possibly using within it something like either:
>        $self->init(1);
>    or
>        $self->{bayes_scanner} = new Mail::SpamAssassin::Bayes($self);

It really needs to do the whole init() run if it hasn't already.
Otherwise the Bayes paths and such will all be unconfigured.

> Does that seem reasonable? I hope so, but let me know if and
> how that is not the case.
>
> Failing that, if we (from MailScanner: www.mailscanner.info) have
> to use a "learn", is there some way of giving "learn" some sort
> of null email, neither spam nor ham?

If you send in an undef message, learn() just aborts. I believe
everything else is set to handle the undef situation.

But if MailScanner is going to actually get into a situation where it
wants to rebuild on its own, we should be able to work something out
which isn't as hackish as the undef learn thing, but I'd like to know
what the general plan/thinking is for rebuilding w/out check/learn()...?

I'll look at the code and see what looks good. I'm not sure
init_learner() is a good place for an init() call, but it may be better
than the other available choice.




