We seeded the bayes db with manual training at the
beginning (a few hundred messages each of ham and spam,
I think) and have been letting it auto-learn since
then. Is that a bad paradigm?
I also saw a score config for HABEAS_VIOLATOR, but
it wasn't triggered by our spam with Habeas headers.
-glenn
Matt Kettler wrote:
> At 06:05 PM 3/10/2004, little@cs.ucsd.edu wrote:
>
>> We have auto-learn, and many spams don't make
>> a high enough score to be auto-learned as spam. In addition,
>> some spams actually score low enough (see the habeas problem
>> I mentioned earlier) to be auto-learned as ham :-(
>
>
> Autolearn is a good thing, but how much manual training are you doing?
>
> Autolearning as your sole source of bayes training is a very bad
> idea, and prone to disaster.
>
> I might also suggest the following to help mitigate some of the habeas
> damage:
>
> bayes_ignore_header X-Habeas-SWE-1
> bayes_ignore_header X-Habeas-SWE-2
> bayes_ignore_header X-Habeas-SWE-3
> bayes_ignore_header X-Habeas-SWE-4
> bayes_ignore_header X-Habeas-SWE-5
> bayes_ignore_header X-Habeas-SWE-6
> bayes_ignore_header X-Habeas-SWE-7
> bayes_ignore_header X-Habeas-SWE-8
> bayes_ignore_header X-Habeas-SWE-9
>
> This will keep the bayes database from ever giving ham or spam points
> because an email has these headers. Since there's already a rule for
> them, there's no reason to give "double credit" and give them bayes
> consideration as well.
>
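For reference, here is a rough sketch of how the auto-learn thresholds could be tightened in local.cf alongside the bayes_ignore_header lines quoted above. It assumes a SpamAssassin version that supports the bayes_auto_learn_threshold_* options. The numeric values are illustrative assumptions, not settings recommended anywhere in this thread.

```
# Assumption: illustrative thresholds only; tune them against your own mail.
bayes_auto_learn 1

# Only auto-learn as ham when the message scores well below zero, so a
# spam that merely scores low is less likely to be learned as ham.
bayes_auto_learn_threshold_nonspam -1.0

# Only auto-learn as spam when the message scores very high.
bayes_auto_learn_threshold_spam 12.0
```

Stricter thresholds mean fewer messages get auto-learned at all, so this would presumably need to be paired with ongoing manual training (sa-learn --spam / sa-learn --ham) on the messages that fall in between.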