Mailing List Archive

Snowball Java EnglishStemmer: Porter or Porter2?
Does the Java version of Snowball use the Porter or the Porter2 stemming
algorithm in its EnglishStemmer, available from the Lucene Sandbox? If it is
Porter2, the word "his" should be indexed as "his", not as "hi" as it is
at the moment.

Regards,
Steve Legrand

Re: Snowball Java EnglishStemmer: Porter or Porter2?
On May 22, 2005, at 1:53 PM, Steve Legrand wrote:

> Does the java-version of Snowball employ Porter or Porter2 stemming
> algorithm in its EnglishStemmer available from the Lucene Sandbox?
> If it is Porter2, I should get the word "his" indexed as "his" not
> as "hi" as it does at the moment.

I don't know the specifics of which algorithm, but there are three
different SnowballAnalyzer stemmers for English: "English", "Lovins",
and "Porter". I just ran each of them with the AnalyzerDemo and got
this output when analyzing the string "his hiss history":

SnowballAnalyzer: // English
[his] [hiss] [histori]

SnowballAnalyzer: // Lovins
[his] [his] [history]

SnowballAnalyzer: // Porter
[hi] [hiss] [histori]

Only the "Lovins" one does what seems to be the right thing with
"his", except that it does a bad job with words like "country" and
"countries".
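
For what it's worth, the difference on "his" comes down to step 1a of the two algorithms: in Snowball, "Porter" is the original algorithm and "English" is the revised one (Porter2). The following is only a rough sketch of that one step, not the Snowball source. The class and method names are made up for illustration, and the vowel test is simplified (Porter2's special treatment of "y" is ignored).

```java
// Sketch of step 1a only; the class and method names are hypothetical.
class Step1aSketch {

    // Simplification: treats a, e, i, o, u, y as vowels in every position.
    static boolean isVowel(char c) {
        return "aeiouy".indexOf(c) >= 0;
    }

    // Original Porter, step 1a: sses -> ss, ies -> i, ss -> ss, s -> (removed).
    static String porterStep1a(String w) {
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ies")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ss")) return w;
        if (w.endsWith("s")) return w.substring(0, w.length() - 1);
        return w;
    }

    // Porter2 ("English"), step 1a: a plain trailing "s" is removed only if the
    // part before it contains a vowel that is not immediately before the "s".
    static String porter2Step1a(String w) {
        if (w.length() <= 2) return w; // very short words are left alone
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ies") || w.endsWith("ied")) {
            // -> "i" if preceded by more than one letter, otherwise -> "ie"
            return w.length() > 4
                    ? w.substring(0, w.length() - 2)
                    : w.substring(0, w.length() - 1);
        }
        if (w.endsWith("us") || w.endsWith("ss")) return w;
        if (w.endsWith("s")) {
            // Scan everything except the letter right before the "s".
            for (int i = 0; i < w.length() - 2; i++) {
                if (isVowel(w.charAt(i))) return w.substring(0, w.length() - 1);
            }
        }
        return w;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"his", "hiss", "gas", "gaps"}) {
            System.out.println(w + ": porter=" + porterStep1a(w)
                    + " porter2=" + porter2Step1a(w));
        }
    }
}
```

Under these rules "his" becomes "hi" with the original step and stays "his" with the revised one, which matches the "Porter" and "English" output above.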

Erik
Re: Snowball Java EnglishStemmer: Porter or Porter2?
Thanks, Erik

I debugged my code and noticed that I had indexed one set of my files using
the older PorterAnalyzer but searched with the SnowballAnalyzer. Now I use
Snowball's Porter algorithm (net.sf.snowball) for both indexing and
searching across all the file sets, and everything works fine.
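
In case it helps anyone else, here is a small sketch of why mixing stemmers between indexing and searching loses hits. It is not Lucene code: the toy inverted index and both "stemmers" are hypothetical stand-ins, chosen only so that one of them turns "his" into "hi" and the other leaves it alone.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

// Toy model of an index; all names here are made up for illustration.
class AnalyzerMismatchSketch {

    // Stand-in stemmer: drops a trailing "s" unless the word ends in "ss".
    static String stripTrailingS(String w) {
        return (w.endsWith("s") && !w.endsWith("ss"))
                ? w.substring(0, w.length() - 1)
                : w;
    }

    // Builds term -> document ids, stemming every token at index time.
    static Map<String, Set<Integer>> buildIndex(List<String> docs,
                                                UnaryOperator<String> stemmer) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : docs.get(id).toLowerCase().split("\\s+")) {
                index.computeIfAbsent(stemmer.apply(token), k -> new TreeSet<>())
                     .add(id);
            }
        }
        return index;
    }

    // Looks up a single query term, stemming it at search time.
    static Set<Integer> search(Map<String, Set<Integer>> index, String term,
                               UnaryOperator<String> stemmer) {
        return index.getOrDefault(stemmer.apply(term.toLowerCase()),
                                  Collections.emptySet());
    }

    public static void main(String[] args) {
        List<String> docs = List.of("his history", "hiss");
        Map<String, Set<Integer>> index =
                buildIndex(docs, AnalyzerMismatchSketch::stripTrailingS);

        // Different stemmer at search time: "his" is looked up, but "hi" was indexed.
        System.out.println(search(index, "his", UnaryOperator.identity()));

        // Same stemmer on both sides: the query is reduced to "hi" and matches.
        System.out.println(search(index, "his", AnalyzerMismatchSketch::stripTrailingS));
    }
}
```

The point is only that the query term has to go through the same reduction as the indexed term, whichever stemmer is chosen.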

Cheerio, Steve

Steve Legrand
