Real morphology (finding the root for all the forms of a word) in
Russian might not be that easy since in Russian you have both prefixes
(aspect) and suffixes (case, number, conjugation) that inflect a word.
But, there are already efforts to write stemmers (suffix strippers) for
Russian following Porter's model. SNOWBALL (for SNOBOL) is a formal
language which has found it's main use in writing stemmers for different
languages. Until now there are rule sets for Danish, Dutch, English,
French, German, Italian, Norwegian, Portuguese, Russian, Spanish and
Swedish.
Sometimes ago, somebody posted an French stemmer built from SNOWBALL. It
seems straightforward to convert all these stemmers to Lucene and maybe
include them in the package.
The site for SNOWBALL is snowball.sf.net. The latest version of their
compiler outputs Java code. I am attaching the Russian SNOWBALL file and
its corresponding Java output. This is just the stemmer though and does
not include the needed code for interfacing with Lucene.
Best,
Alex
-----Original Message-----
From: Philipp Chudinov [mailto:morpheus@basko.ru]
Sent: Thursday, March 07, 2002 1:21 AM
To: Lucene Users List
Subject: Re: Support for russian morphology in Lucene
its mei :) having no ideas about morphology and great wishes to use
lucene in russian. nice to see you here. maybe we should try to do
things together.
----- Original Message -----
From: "Vadim Solonovich" <vsolon@park.ru>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Cc: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Thursday, March 07, 2002 6:40 AM
Subject: Support for russian morphology in Lucene
> Hi All !
>
> Is there anybody who have any ideas about implementing russian
> morphology
in Lucene ?
> Please, let me know.
>
> Thanks in advance.
>
> Vadim Solonovich,
> mailto:vsolon@park.ru
> http://www.park.ru
> http://garant.park.ru
--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>