
DO NOT REPLY [Bug 7412] New: - GermanStemFilter setting wrong values for startoffset/endoffset of stemmed tokens

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=7412

Summary: GermanStemFilter setting wrong values for
startoffset/endoffset of stemmed tokens
Product: Lucene
Version: CVS Nightly - Specify date in submission
Platform: PC
OS/Version: Linux
Status: NEW
Severity: Normal
Priority: Other
Component: Analysis
AssignedTo: lucene-dev@jakarta.apache.org
ReportedBy: reyes@charabia.net


The GermanStemFilter assigns wrong start and end offsets to the new Token
object it creates when the stemmer succeeds in stemming the termText() string.
The bug was found in 1.2-RC5-dev.

-----------------
Example: processing the string "this is a simple test" produces these tokens:
token : thi (0,3)
token : is (5,7)
token : a (8,9)
token : simpl (0,5)
token : test (17,21)

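Both stemmed tokens have wrong start/end offsets: "thi" should keep the
offsets of "this" (0,4), and "simpl" should keep the offsets of "simple"
(10,16). The tokens that were not stemmed have correct offsets.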
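
The output above is consistent with the stemmed token being rebuilt with a start offset of 0 and an end offset equal to the length of the stemmed text. That is an inference from the numbers, not something the report confirms. The sketch below reproduces that pattern and shows one possible fix: copy the offsets from the incoming token. `Tok`, `StemOffsets`, and `toyStem` are hypothetical stand-ins written for this illustration; they are not Lucene classes, and `toyStem` is not the German stemmer.

```java
// Hypothetical stand-in for a token: term text plus offsets into the source string.
final class Tok {
    final String text;
    final int start; // offset of the first character in the original input
    final int end;   // offset one past the last character in the original input

    Tok(String text, int start, int end) {
        this.text = text;
        this.start = start;
        this.end = end;
    }
}

final class StemOffsets {
    // Toy stemmer for illustration only: drops one trailing "s" or "e"
    // from terms longer than two characters.
    static String toyStem(String term) {
        if (term.length() > 2 && (term.endsWith("s") || term.endsWith("e"))) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Assumed buggy pattern: offsets are derived from the stemmed text,
    // so they no longer point into the original input.
    static Tok stemBuggy(Tok in) {
        String stemmed = toyStem(in.text);
        if (stemmed.equals(in.text)) {
            return in; // unchanged tokens pass through with correct offsets
        }
        return new Tok(stemmed, 0, stemmed.length());
    }

    // Possible fix: only the text changes; offsets are copied from the input token.
    static Tok stemFixed(Tok in) {
        String stemmed = toyStem(in.text);
        if (stemmed.equals(in.text)) {
            return in;
        }
        return new Tok(stemmed, in.start, in.end);
    }
}
```

With this sketch, passing `new Tok("simple", 10, 16)` to `stemBuggy` gives "simpl" with offsets (0,5), matching the reported output, while `stemFixed` gives "simpl" with offsets (10,16).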
