Mailing List Archive

cross-lingual IR
Dear All,

I would like to implement a cross-lingual IR system with support for Persian and English for an academic research task. How can I use Lucene for my task? How shall I proceed? What are the requirements?

Regards,
Farzad

Re: cross-lingual IR
Hi Farzad,

Hmmm, where to begin... This is a tough question and one that
warrants a fair amount of research. I would start by taking a look
at the TREC cross-language tracks and the CLEF conference.

I have used Lucene to index and search documents in English as well
as Arabic, French, Spanish, Dutch, and other languages. In general,
you need some way of transforming a source-language query into a
target-language query, OR you need some way of automatically
translating all your documents into the same language. How you do
this is really the matter of research, eh? The most basic approach to
the query transformation problem is to use a bilingual dictionary to
look up the source-language terms and get their target-language
equivalents.
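
To make the dictionary idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the tiny EN_FA lexicon and the translate_query helper are hypothetical names made up for this example, not part of Lucene, and a real system would load a proper bilingual lexicon and deal with ambiguity, phrases, and out-of-vocabulary terms far more carefully.

```python
from typing import Dict, List

# Hypothetical toy English -> Persian lexicon, for illustration only.
# A real system would load a bilingual dictionary from an external resource.
EN_FA: Dict[str, List[str]] = {
    "book": ["کتاب"],
    "library": ["کتابخانه"],
    "water": ["آب"],
    # An ambiguous term maps to several candidates (illumination / not heavy).
    "light": ["نور", "سبک"],
}


def translate_query(query: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Replace each source term with all of its target-language equivalents.

    Terms missing from the lexicon are passed through unchanged, which is a
    common fallback for proper names and untranslatable tokens.
    """
    translated: List[str] = []
    for term in query.lower().split():
        translated.extend(lexicon.get(term, [term]))
    return translated


if __name__ == "__main__":
    # The resulting terms could be OR-ed together into a target-language query.
    print(translate_query("light book", EN_FA))
```

Keeping every candidate translation for an ambiguous term is the simplest choice; it favors recall, and choosing among the candidates is where the research part comes in.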

As for Lucene, you will need an Analyzer that handles Persian (try
googling "Persian Lucene Analyzer"); you may very well have to write
your own. The actual indexing and search tasks are relatively
straightforward as Lucene tasks, and there are a number of good
tutorials and books on how to do that.
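
If you do end up writing your own analysis step, the core of it is normalization plus tokenization. The sketch below is plain Python, not Lucene's Analyzer API, and the analyze function and the particular character mappings are assumptions chosen for illustration. It folds two letter variants that commonly appear in Persian text typed on Arabic keyboards, strips short-vowel diacritics, and splits on non-word characters.

```python
import re
from typing import List

# Fold Arabic letter variants into the forms normally used in Persian, so
# that the same word typed on different keyboards yields the same term.
_CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})

# Arabic short-vowel and related marks (fathatan .. sukun), usually optional.
_DIACRITICS = re.compile("[\u064B-\u0652]")

# \w is Unicode-aware for str patterns, so it matches Persian letters too.
_TOKEN = re.compile(r"\w+")


def analyze(text: str) -> List[str]:
    """Normalize the text and split it into lowercase tokens."""
    text = text.translate(_CHAR_MAP)
    text = _DIACRITICS.sub("", text)
    return _TOKEN.findall(text.lower())


if __name__ == "__main__":
    # The first word is typed with an Arabic KAF; it is folded to KEHEH.
    print(analyze("\u0643\u062A\u0627\u0628 Lucene"))
```

Whatever normalization you pick, apply the same one at index time and at query time; otherwise translated query terms will silently fail to match the indexed terms.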

Good luck,
Grant

On Aug 13, 2007, at 6:30 AM, Farzad Mahdikhani wrote:

> Dear All,
>
> I would like to implement a cross-lingual IR system with support
> for Persian and English languages for an academic research task.
> How can I use Lucene for my task? How shall I proceed? What are the
> requirements?
>
> Regards,
> Farzad

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ