Mailing List Archive

Support for russian morphology in Lucene
Hi all!

Does anybody have any ideas about implementing Russian morphology in Lucene?
Please let me know.

Thanks in advance.

Vadim Solonovich,
mailto:vsolon@park.ru
http://www.park.ru
http://garant.park.ru
Re: Support for russian morphology in Lucene [ In reply to ]
It's me :) I have no ideas about morphology, but I would very much like to use
Lucene with Russian. Nice to see you here. Maybe we should try to work on this together.

----- Original Message -----
From: "Vadim Solonovich" <vsolon@park.ru>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Cc: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Thursday, March 07, 2002 6:40 AM
Subject: Support for russian morphology in Lucene


> Hi all!
>
> Does anybody have any ideas about implementing Russian morphology in Lucene?
> Please let me know.
>
> Thanks in advance.
>
> Vadim Solonovich,
> mailto:vsolon@park.ru
> http://www.park.ru
> http://garant.park.ru


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Support for russian morphology in Lucene [ In reply to ]
Real morphology (finding the root for all the forms of a word) in
Russian might not be that easy, since Russian words are inflected with
both prefixes (aspect) and suffixes (case, number, conjugation).
But there are already efforts to write stemmers (suffix strippers) for
Russian following Porter's model. SNOWBALL (named in tribute to SNOBOL)
is a formal language whose main use is writing stemmers for different
languages. So far there are rule sets for Danish, Dutch, English,
French, German, Italian, Norwegian, Portuguese, Russian, Spanish and
Swedish.
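
To make the idea of a suffix stripper concrete, here is a minimal sketch
in Python. It is not the SNOWBALL Russian rule set: the ending list
(ENDINGS), the minimum stem length (MIN_STEM) and the function name
strip_suffix are all assumptions chosen for illustration. It removes the
longest listed ending and never touches prefixes, which is one reason a
stemmer of this kind falls short of real morphology.

```python
# Toy suffix stripper: illustrative only, NOT the SNOWBALL Russian rules.
# ENDINGS is a small hand-picked sample of noun case/number endings (assumption).
ENDINGS = sorted(
    ["ами", "ях", "ах", "ов", "ам", "ой", "а", "у", "е", "ы", "и"],
    key=len,
    reverse=True,  # try longer endings first so "ами" wins over "и"
)

# Never strip an ending if fewer than MIN_STEM characters would remain (assumption).
MIN_STEM = 2


def strip_suffix(word: str) -> str:
    """Return the word with the longest matching ending removed, if any."""
    word = word.lower()
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= MIN_STEM:
            return word[: -len(ending)]
    return word


if __name__ == "__main__":
    for form in ["книга", "книги", "книгами", "книгах"]:
        print(form, "->", strip_suffix(form))
```

With this sample list, the four forms of "книга" all reduce to the same
stem, while the aspect pair "делать" / "сделать" stays as two different
terms because the prefix is left alone.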

Some time ago, somebody posted a French stemmer built from SNOWBALL. It
seems straightforward to adapt all these stemmers for Lucene and maybe
include them in the package.

The site for SNOWBALL is snowball.sf.net. The latest version of their
compiler outputs Java code. I am attaching the Russian SNOWBALL file and
its corresponding Java output. This is just the stemmer though and does
not include the needed code for interfacing with Lucene.

Best,

Alex

-----Original Message-----
From: Philipp Chudinov [mailto:morpheus@basko.ru]
Sent: Thursday, March 07, 2002 1:21 AM
To: Lucene Users List
Subject: Re: Support for russian morphology in Lucene


It's me :) I have no ideas about morphology, but I would very much like
to use Lucene with Russian. Nice to see you here. Maybe we should try to
work on this together.

Re: Support for russian morphology in Lucene [ In reply to ]
Hi!

Sorry for the delay in answering.
Recently I found that a Russian and Ukrainian stemmer for Lucene can be implemented based on Andrew Kovalenko's (http://linguist.nm.ru) stem library, which is free. Though it is not a 100% pure Java solution, the library compiles on multiple platforms and will soon be available on SourceForge. Andrew Kovalenko's code is used in many Russian projects, such as the Aport search engine, Rambler and MediaLingua products. So I am thinking of using his code in my own project together with Lucene.

And now I have a question: does anybody know how to deal with multiple stems for a single word in Lucene?
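
To illustrate the question: one way an index can handle a word form with
several candidate stems is to emit every stem at the same token position,
so that a query on any of them matches and phrase positions stay aligned.
Lucene's Token API documents a position increment of zero for this case;
the sketch below only models the idea in Python and does not call Lucene.
The stems_for lookup and its single sample entry are hypothetical,
standing in for a real morphology library.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical lookup standing in for a morphology library (assumption).
# "стали" can be a form of the noun "сталь" (steel) or the verb "стать" (to become).
SAMPLE_STEMS: Dict[str, List[str]] = {"стали": ["сталь", "стать"]}


def stems_for(word: str) -> List[str]:
    """Return all candidate base forms; unknown words map to themselves."""
    return SAMPLE_STEMS.get(word, [word])


def stack_stems(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (term, position_increment) pairs.

    The first candidate for each word advances the position by 1;
    every further candidate gets increment 0, so it shares that position.
    """
    for word in words:
        for i, stem in enumerate(stems_for(word)):
            yield stem, (1 if i == 0 else 0)


if __name__ == "__main__":
    for term, increment in stack_stems(["они", "стали"]):
        print(term, increment)
```

The trade-off of this approach is a larger index and some false matches,
since both readings are indexed even when only one was intended.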

Thanks in advance.

Vadim Solonovich,
mailto:vsolon@park.ru
http://www.park.ru
http://garant.park.ru
