Mailing List Archive

[Bug 3230] New: Bayes tries to learn zero-length tokens from decomposition.
http://bugzilla.spamassassin.org/show_bug.cgi?id=3230

Summary: Bayes tries to learn zero-length tokens from decomposition.
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: other
Status: NEW
Severity: normal
Priority: P5
Component: Learner
AssignedTo: spamassassin-dev@incubator.apache.org
ReportedBy: koppel@ece.lsu.edu


Bayes.pm::tokenize_line creates new tokens by decomposing words (for lack
of a better word) found in the message, that is, by removing most non-word
characters from them. The new token is used even if it is zero-length.
One symptom is warnings like
"Bayes journal: gibberish entry found: t 1080642675 "
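This is not the patch from this report. It is a hypothetical Python illustration of the failure mode and the kind of guard that avoids it. The character class `[^\w:*]` and the helper names `decompose` and `tokenize_word` are assumptions for illustration, not the actual Perl code in Bayes.pm. A word made up entirely of non-word characters, such as `---`, decomposes to the empty string. Without a length check, that empty string is learned as a token, which matches the empty token at the end of the journal entry quoted above.

```python
import re

# Assumed approximation of the "most non-word characters" that decomposition
# strips; the real character class in Bayes.pm may differ.
NON_WORD = re.compile(r"[^\w:*]")


def decompose(word):
    """Remove non-word characters from a word (hypothetical helper)."""
    return NON_WORD.sub("", word)


def tokenize_word(word):
    """Return the tokens learned for one word (hypothetical helper).

    The word itself is always kept. If it contains non-word characters,
    its decomposed form is added as well, but only when non-empty.
    """
    tokens = [word]
    if NON_WORD.search(word):
        decomposed = decompose(word)
        # Guard against zero-length tokens: "---" decomposes to "".
        if decomposed:
            tokens.append(decomposed)
    return tokens
```

For example, `tokenize_word("hello!!")` yields both `hello!!` and `hello`, while `tokenize_word("---")` yields only `---` and no empty token.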

The following simple patch fixes the problem.


