Mailing List Archive

Problem with Unicode !!!
Hi,

I am working on lucene to index unicode content. I am
facing the following problems .

1) I am creating a index where i am adding two fields
in the index without specifying any encoding. one
field is title and the other is body

e.g :- doc.add(Field.Text("body",(Reader)isr));
where isr is my InputStream Reader.

Now i am able to search for the words but the title
when displayed in the browser shows junk. inspite of
the correct encoding in the browser (UTF-8).

2) I also tried specifying enconding in the
InputStreamReader . In this case the title comes
properly but i am not able to search non english
words.

I am trying this on a win2000 machine.

I would really appriciate some help

TIA,

Regards
Harpreet.

__________________________________________________
Do You Yahoo!?
Yahoo! Greetings - send holiday greetings for Easter, Passover
http://greetings.yahoo.com/

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Problem with Unicode !!! [ In reply to ]
Hi,

We are currently testing with plain text unicode files made in windows 2000
notepad. The files are in English and devanagri (Hindi) .
Now , the english words when shown in the results come properly but the
devanagri words come as junk although they are searchable.

As far as encoding goes we have saved the files in utf-8 in windows . Also
we have not worked extensively on lucene so we would be greatful if you can
tell us how to check the encoding of the lucene index file.

Thanks And Regards
Harpreet


> Hi,
>
> Please give an example of what you have indexed and what you are searching
> on.
>
> Also, are you sure the text you are searching for has been encoded
properly?
>
> --Peter
>
>
> On 4/1/02 3:59 AM, "eeed wewefwf" <xalanlist@yahoo.com> wrote:
>
> > Hi,
> >
> > I am working on lucene to index unicode content. I am
> > facing the following problems .
> >
> > 1) I am creating a index where i am adding two fields
> > in the index without specifying any encoding. one
> > field is title and the other is body
> >
> > e.g :- doc.add(Field.Text("body",(Reader)isr));
> > where isr is my InputStream Reader.
> >
> > Now i am able to search for the words but the title
> > when displayed in the browser shows junk. inspite of
> > the correct encoding in the browser (UTF-8).
> >
> > 2) I also tried specifying enconding in the
> > InputStreamReader . In this case the title comes
> > properly but i am not able to search non english
> > words.
> >
> > I am trying this on a win2000 machine.
> >
> > I would really appriciate some help
> >
> > TIA,
> >
> > Regards
> > Harpreet.
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Yahoo! Greetings - send holiday greetings for Easter, Passover
> > http://greetings.yahoo.com/
> >
> > --
> > To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
> >
> >
>
>
> --
> To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Problem with Unicode !!! [ In reply to ]
Hi,

Please give an example of what you have indexed and what you are searching
on.

Also, are you sure the text you are searching for has been encoded properly?

--Peter


On 4/1/02 3:59 AM, "eeed wewefwf" <xalanlist@yahoo.com> wrote:

> Hi,
>
> I am working on lucene to index unicode content. I am
> facing the following problems .
>
> 1) I am creating a index where i am adding two fields
> in the index without specifying any encoding. one
> field is title and the other is body
>
> e.g :- doc.add(Field.Text("body",(Reader)isr));
> where isr is my InputStream Reader.
>
> Now i am able to search for the words but the title
> when displayed in the browser shows junk. inspite of
> the correct encoding in the browser (UTF-8).
>
> 2) I also tried specifying enconding in the
> InputStreamReader . In this case the title comes
> properly but i am not able to search non english
> words.
>
> I am trying this on a win2000 machine.
>
> I would really appriciate some help
>
> TIA,
>
> Regards
> Harpreet.
>
> __________________________________________________
> Do You Yahoo!?
> Yahoo! Greetings - send holiday greetings for Easter, Passover
> http://greetings.yahoo.com/
>
> --
> To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
>
>


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Problem with Unicode !!! [ In reply to ]
Please set the year on your computer to this year (2002), not last.


On Sunday, April 1, 2001, at 12:05 PM, Harpreet S Walia wrote:
...

~ David Smiley


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>