Mailing List Archive

Finnish Stemmer / Ananlyzer source code
Appologies for submitting my source code in a rushed manner - but I have been
very busy leading up to my maternity leave. I am about to take leave for
approximately 6 months, but I would like to submit some of the source code I
have developed recently to use Lucene for web sites written in Finnish.

This mail contains a zip file which contains the relevant java classes, some
design documentation and also some C code - since our solution involved using
some third party commercial software called Morfo by a company called Kielikone.

To make use of this Finnish Analyzer, we also had to make 2 relatively small
changes to the lucene core code. This was to allow more than one token to be
stored at the same postion within a document. These changes may be useful to
others and I would be grateful if someone could consider adding them to he
lucene core code.

The 2 lucene files that were modified are :
/analysis/TokenStream.java
/index/DocumetWriter.java

Our updated versions are also attached.

I have submitted this code at the last minute - so unfortunately, I shall not
be able to respond to any queries. But perhaps one day someone will search the
archive for a Finnish Stemmer and find these attachments useful ?

Kind regards

Joanne Sproston
Teamware Group
joanne.sproston@teamware.co.uk
phone: +44 (0)1782 794879 fax: +44 (0)1782 776667

intra / extra / Internet solutions at www.teamware.com