Mailing List Archive

Bug in german stemmer ?
Hi all,

I am using the Lucene German Stemmer/Analyzer. There seems to be a bug
within the GermanStemmer class. As far as I understand the algorithm, the
counter variable "substCount" should be set to 0 before processing the
next token.
In the current implementation the counter is never reset, so the stemmed
result for the same term will differ after a while.
The easiest solution would be to reset that counter variable in the method
"private StringBuffer substitute( StringBuffer buffer )".

best regards
Bernhard


--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
Re: Bug in german stemmer ?
This email sounds right. The substCount variable always increases and
is never reset to zero. It seems that it should be reset before every
substitution, so that its value reflects the number of characters
substituted in each token.
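
To make the effect concrete, here is a minimal sketch of how a counter that
is never reset can change the result for the same term. It is a toy model,
not the GermanStemmer source: the class ToyStemmer, the "sch" -> "$"
substitution, and the "strip a trailing 'en' only for terms longer than 5
characters" rule are all assumptions made up for this illustration. It uses
StringBuilder where the method quoted above uses StringBuffer.

```java
// Toy model only: NOT the real Lucene GermanStemmer code.
class SubstCountSketch {

    static class ToyStemmer {
        private final boolean resetPerToken; // true = proposed fix applied
        private int substCount = 0;          // characters removed by substitute()

        ToyStemmer(boolean resetPerToken) {
            this.resetPerToken = resetPerToken;
        }

        String stem(String term) {
            StringBuilder buffer = new StringBuilder(term);
            substitute(buffer);
            // Hypothetical rule: strip a trailing "en" only if the original
            // term (buffer length plus substituted characters) exceeds 5.
            if (buffer.length() + substCount > 5
                    && buffer.toString().endsWith("en")) {
                buffer.setLength(buffer.length() - 2);
            }
            // Undo the substitution before returning the stem.
            return buffer.toString().replace("$", "sch");
        }

        private void substitute(StringBuilder buffer) {
            if (resetPerToken) {
                substCount = 0; // the fix: start counting from zero per token
            }
            int i;
            while ((i = buffer.indexOf("sch")) >= 0) {
                buffer.replace(i, i + 3, "$"); // 3 characters become 1
                substCount += 2;
            }
        }
    }

    public static void main(String[] args) {
        ToyStemmer leaky = new ToyStemmer(false);
        System.out.println(leaky.stem("oben"));    // oben
        System.out.println(leaky.stem("schulen")); // schul
        System.out.println(leaky.stem("oben"));    // ob (stale counter)

        ToyStemmer fixed = new ToyStemmer(true);
        System.out.println(fixed.stem("oben"));    // oben
        System.out.println(fixed.stem("schulen")); // schul
        System.out.println(fixed.stem("oben"));    // oben
    }
}
```

In this sketch the leaky variant returns a different stem for "oben" once
another token has pushed the counter up, while the variant that resets the
counter inside substitute() returns the same stem every time.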

I will commit the fix now.
Gerhard, please correct me if I'm wrong.

Thanks,
Otis





Re: Bug in german stemmer ?
Hi,

Otis Gospodnetic wrote:
> This email sounds right. substCount variable always increases and
> never gets reset to zero and it seems that it should be reset before
> every substitution, so that its value reflects the number of characters
> substituted in each token.
>
> I will commit the fix now.
> Gerhard, please correct me if I'm wrong.

The fix is correct. Sorry that I did not fix it myself; I wanted to commit
it together with other changes to the Stemmer and Filter.
Unfortunately, I have some serious problems (CeBIT is coming closer)
and therefore I do not have much spare time.


Ciao,
Gerhard


