Mailing List Archive

lucene web-app & russian language
Hi! I was trying the Lucene web-app (lucene-1.2-rc5-dev.jar). I created
and indexed a simple HTML document with both English and Russian words. It
was ANSI encoded. If I check _3.fdt in the created index, I can see that my
document was indexed, with both the Russian and the English terms (it opens in UTF
encoding, I suppose). The problem starts when searching: if I search
for a Russian word, it returns nothing; if I search for an English word, it returns
a result, but all Russian words are displayed as ? signs. I've changed the .jsp
content types to return UTF-8 encoding, but the result is still the same.

So, finally, does Lucene do multilingual search or not? What am I doing
wrong? I have been trying to make it work with Russian docs since version 1.0, but
I still have no idea and no results :((((((
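
The "?" symptom described above is what Java produces when text is encoded into a charset that has no Cyrillic letters. The following is a minimal sketch using only the JDK, not Lucene or the web-app code. It assumes that "ANSI" means windows-1251 here, which the message does not state; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingDemo {

    /** Decodes raw bytes with an explicitly named charset instead of the platform default. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    /** Encodes text to bytes in the given charset and decodes those bytes again. */
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Russian "poisk" ("search"), written with escapes so that the
        // encoding of this source file does not matter.
        String word = "\u043f\u043e\u0438\u0441\u043a";

        // Assumption: the "ANSI" file from the message was windows-1251.
        Charset ansi = Charset.forName("windows-1251");
        byte[] fileBytes = word.getBytes(ansi);

        // Decoding with the charset the file was written in recovers the word.
        System.out.println(decode(fileBytes, ansi).equals(word));

        // Decoding the same bytes as ISO-8859-1 yields different characters.
        System.out.println(decode(fileBytes, StandardCharsets.ISO_8859_1).equals(word));

        // ISO-8859-1 cannot represent Cyrillic, so each letter becomes '?'.
        System.out.println(roundTrip(word, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent every character, so the word survives.
        System.out.println(roundTrip(word, StandardCharsets.UTF_8).equals(word));
    }
}
```

If the same kind of mismatch happens anywhere between reading the file, parsing the query string, and writing the response, a Russian query would no longer match the indexed terms and the output would show "?" signs, whichever analyzer is in use.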


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: lucene web-app & russian language [ In reply to ]
Philipp,

Did you read the FAQ on the use of the StandardAnalyzer during indexing
and query parsing? You might need to replace it with a RussianAnalyzer,
which you'll have to write yourself if no one has done so before
you. Have a look at the GermanAnalyzer for some inspiration.
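
One way to read this advice is that the text analysis applied to a query has to match the analysis applied when the documents were indexed. The sketch below illustrates that idea with a toy in-memory index written against the JDK only. It is not Lucene's Analyzer API, and ToyIndex, analyze, search and searchRaw are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ToyIndex {

    /** term -> ids of the documents that contain it */
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Splits text on non-letter characters and lower-cases each token. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {          // true for Cyrillic letters as well
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Indexes a document under the terms produced by analyze(). */
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    /** Runs the query through the same analyze() that add() uses. */
    Set<Integer> search(String query) {
        Set<Integer> hits = new HashSet<>();
        for (String term : analyze(query)) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    /** Looks the query string up without any analysis, to show the mismatch. */
    Set<Integer> searchRaw(String query) {
        return postings.getOrDefault(query, Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(1, "Lucene \u041f\u043e\u0438\u0441\u043a");   // "Lucene Poisk" with the Russian word in Cyrillic

        String upperCaseQuery = "\u041f\u041e\u0418\u0421\u041a"; // the same Russian word in capitals
        System.out.println(index.search(upperCaseQuery));    // analyzed like the document
        System.out.println(index.searchRaw(upperCaseQuery)); // not analyzed
    }
}
```

In this toy setup the analyzed query finds document 1, while the unanalyzed lookup finds nothing because the index only holds lower-cased terms.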

Good luck,
Ype

Re: lucene web-app & russian language [ In reply to ]
Hi,

Sorry, Lucene supports other languages, but the webapp was written for
English. Swap out the analyzer. If you can adapt the webapp to make the
analyzer configurable, I'd be happy to both update the "getting started
guide" and commit the changes.
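
One possible way to make the analyzer configurable is to read a class name from a setting and instantiate it by reflection. This is only a sketch under stated assumptions: the setting name "analyzer.class" and the AnalyzerLoader helper are hypothetical, and a JDK class stands in for the analyzer so that the example runs without Lucene on the classpath.

```java
final class AnalyzerLoader {

    /**
     * Instantiates className through its public no-argument constructor and
     * checks that the result is an instance of expectedType.
     */
    static <T> T instantiate(String className, Class<T> expectedType) {
        try {
            Object instance = Class.forName(className).getConstructor().newInstance();
            // cast() throws ClassCastException if the class is of the wrong type.
            return expectedType.cast(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical setting; java.lang.StringBuilder is a stand-in default
        // so that the sketch runs with the JDK alone.
        String configured = System.getProperty("analyzer.class", "java.lang.StringBuilder");
        CharSequence value = instantiate(configured, CharSequence.class);
        System.out.println(value.getClass().getName());
    }
}
```

In the webapp, the expected type would presumably be Lucene's Analyzer class, and the same configured class would have to be used by both the indexing code and the search page.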

Thanks,

Andy

On Fri, 2002-03-01 at 15:49, Ype Kingma wrote:
> Did you read the FAQ on the use of the StandardAnalyzer during indexing
> and query parsing? You might need to replace it with a RussianAnalyzer,
> which you'll have to write yourself if no one has done so before
> you. Have a look at the GermanAnalyzer for some inspiration.
>
> Good luck,
> Ype
--
http://www.superlinksoftware.com
http://jakarta.apache.org - port of Excel/Word/OLE 2 Compound Document
format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
- fix java generics!
The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh


Re: lucene web-app & russian language [ In reply to ]
OK, I'll try to write the Russian analyzer and report back to you in 2-3 days,
hopefully with success. But if I fail, I'll report anyway :)


----- Original Message -----
From: "Andrew C. Oliver" <acoliver@apache.org>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Saturday, March 02, 2002 9:28 PM
Subject: Re: lucene web-app & russian language


> Hi,
>
> Sorry, Lucene supports other languages, but the webapp was written for
> English. Swap out the analyzer. If you can adapt the webapp to make the
> analyzer configurable, I'd be happy to both update the "getting started
> guide" and commit the changes.
>
> Thanks,
>
> Andy