Mailing List Archive: Asymmetry in learning spam and ham

Feb 25, 2004, 12:58 PM

Post #1 of 3

Howdy,

Just a question. For learning a message as spam (and reporting it), it
seems like this is adequate:

$f->report_as_spam( $mail )

However, for ham, it seems like you need to do this:

$f->init_learner;
$status = $f->learn( $mail );
$f->rebuild_learner_caches;
$f->finish_learner

report_as_spam() implies that it is learnt by Bayes, but I didn't see if
the learn() stuff is happening under the hood. It also appears the
learn() returns a status object, where report returns nothing terribly
useful.

Is the code in 2.63 doing the right thing? This is for a persistent
process, so I need to make sure I get it right.

Thanks,

David.

Re: Asymmetry in learning spam and ham

mkettler at evi-inc

Feb 25, 2004, 4:18 PM

Post #2 of 3

At 02:58 PM 2/25/2004, David Birnbaum wrote:
>Howdy,
>
>Just a question. For learning a message as spam (and reporting it), it
>seems like this is adequate:
>
> $f->report_as_spam( $mail )
>
>However, for ham, it seems like you need to do this:
>
> $f->init_learner;
> $status = $f->learn( $mail );
> $f->rebuild_learner_caches;
> $f->finish_learner
>
>report_as_spam() implies that it is learnt by Bayes, but I didn't see if
>the learn() stuff is happening under the hood. It also appears the
>learn() returns a status object, where report returns nothing terribly
>useful.

It's a bit convoluted if you ask me, and it's contrary to the claims of
what you must do before calling various functions, but it appears to be
functional.

I also can't find a finish_learner call anywhere in the chain, but that
doesn't mean it's not there, somewhere in some long chain of sub calls.

Here's my short track through the 2.63 code:

report_as_spam in SpamAssassin.pm definitely does do learning. You can see
it call $self->learn like this:

sub report_as_spam {

    <snip - bunch of code>
    # learn as spam if enabled
    if ( $self->{conf}->{bayes_learn_during_report} ) {
        $self->learn ($mail, undef, 1, 0);
    }

The learn subroutine in turn calls $self->init(1), which calls
init_learner. (Note that the comments around learn() claim you need to
init the learner first, but apparently this isn't 100% true.)

Now, the SpamAssassin.pm learn routine calls msg->learn_spam.

PerMsgLearner.pm's learn_spam calls:

$self->{bayes_scanner}->learn (1, $self->{msg}, $id);

Re: Asymmetry in learning spam and ham

davidb at pins

Feb 25, 2004, 7:59 PM

Post #3 of 3

On Wed, 25 Feb 2004, Matt Kettler wrote:

> >Just a question. For learning a message as spam (and reporting it), it
> >seems like this is adequate:
> >
> > $f->report_as_spam( $mail )
> >
> >However, for ham, it seems like you need to do this:
> >
> > $f->init_learner;
> > $status = $f->learn( $mail );
> > $f->rebuild_learner_caches;
> > $f->finish_learner
>
> It's a bit convoluted if you ask me, and it's contrary to the claims of
> what you must do before calling various functions, but it appears to be
> functional.

Yes... labyrinthine, even. I'd really like to not worry about that stuff,
but for now I'll just do it this way. Perhaps the next time somebody
staggers through this code we can make a learn_as_(sp|h)am() routine that
just encapsulates the whole thing instead of adding the other stuff.
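
A minimal sketch of what such a wrapper might look like. It assumes only
the 2.63 method names quoted in this thread (init_learner, learn,
rebuild_learner_caches, finish_learner) and the four-argument
learn($mail, $id, $isspam, $forget) call shown in the report_as_spam
excerpt. The names learn_as_spam, learn_as_ham and _learn_as are
hypothetical and are not part of Mail::SpamAssassin:

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Mail::SpamAssassin.
# Assumptions: $f is a Mail::SpamAssassin object (or anything offering the
# same four methods), $mail is the message object the caller already has,
# and $isspam is 1 for spam or 0 for ham.
sub _learn_as {
    my ($f, $mail, $isspam) = @_;

    $f->init_learner;                                  # set up the learner
    my $status = $f->learn($mail, undef, $isspam, 0);  # id = undef, forget = 0
    $f->rebuild_learner_caches;                        # rebuild caches after learning
    $f->finish_learner;                                # finish the learning run

    return $status;                                    # whatever learn() returned
}

# Thin, symmetric entry points for the two cases.
sub learn_as_spam { my ($f, $mail) = @_; return _learn_as($f, $mail, 1); }
sub learn_as_ham  { my ($f, $mail) = @_; return _learn_as($f, $mail, 0); }
```

With that in place the ham case becomes $status = learn_as_ham($f, $mail),
mirroring the spam case. Note that this only learns; unlike
report_as_spam() it does no reporting.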

Does it seem to break anything if you call the init() functions multiple
times?

David.