Mailing List Archive

RE: scandinavian characters. //from Lucene users maillist
they can be found here:
http://cvs.apache.org/viewcvs/jakarta-lucene/src/java/org/apache/lucene/queryParser/
and both files should be put in your
jakarta-lucene../src/java/org/apache/lucene/queryParser/ directory. then you
will have to recompile; i used "ant clean jar" and then you get a new jar
file in bin/

regards, karl oie



-----Original Message-----
From: Philipp Chudinov [mailto:morpheus@basko.ru]
Sent: 28. november 2001 12:40
To: karl@gan.no
Subject: Re: scandinavian characters. //from Lucene users maillist


Hi, Karl!
I've faced the same problem (trying to search Russian documents). Now
I am trying to repeat your steps. I've included the new version of
QueryParser.jj, but could you describe how to "include the
FastCharStream.java class to compile"? Where is it, and where should I
include it? Thanks.

Philipp



----- Original Message -----
From: "Karl Øie" <karl@gan.no>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Tuesday, November 27, 2001 8:34 PM
Subject: RE: scandinavian characters.


> found a fix to the problem;
>
> the "QueryParser.jj" in rc2 does not accept unicode, version 1.6 in cvs
> does, so i replaced the file with the newest one from cvs and also had to
> include the FastCharStream.java class to compile.
>
> then i just had to force-convert the querystring that came from the
> browser to utf-8 and it worked (guess the browser sent the string as
> ascii)!!! i'm so happy and thanks to you both jonas and david!!
>
>
>
> String query = this.request.getParameter( "query" );
> if( query!=null ) {
>     // getBytes() with no argument uses the platform default charset
>     query = new String( query.getBytes(), "UTF-8" );
> }
>
>
>
> regards, karl øie/gan media
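
The re-decoding snippet above depends on the platform default charset, because `getBytes()` is called without an argument. Below is a minimal sketch that names both charsets explicitly. It assumes the servlet container decoded the UTF-8 request bytes as ISO-8859-1, which the thread does not confirm; the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

class QueryRedecode {
    // Hypothetical helper: undo a wrong ISO-8859-1 decode of UTF-8 bytes.
    // Assumption: the container read the request parameter as ISO-8859-1.
    static String redecode(String raw) {
        if (raw == null) {
            return null;
        }
        // ISO-8859-1 maps every char 0..255 back to the same byte value,
        // so this recovers the bytes the browser originally sent.
        byte[] original = raw.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the mangled parameter: UTF-8 bytes of "fj(o-slash)s"
        // read one byte per character.
        String mangled = new String(
                "fj\u00f8s".getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(redecode(mangled).equals("fj\u00f8s")); // prints "true"
    }
}
```

If the container already decodes the parameter correctly, applying this step would corrupt the string instead, so it only makes sense when the wrong decode is known to have happened.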
>
>
>
>
>
> -----Original Message-----
> From: Jonas Bechlund [mailto:jonas.bechlund@framfab.dk]
> Sent: 27. november 2001 13:52
> To: 'Lucene Users List'
> Subject: RE: scandinavian characters.
>
>
> Hi Karl,
>
> It is a little bit tricky - but when you get the idea it is not that
> bad...
>
> I had the same problem with the Danish characters. I made changes to the
> TOKEN definition in the "Token Definitions" section of the file
> "QueryParser.jj" and that actually solved the problem. One minor detail is
> that you have to rebuild the jar file with ANT. (See build.txt for
> instructions.)
>
> I guess that solves your problem,
> Regards,
> / Jonas
>
> -----Original Message-----
> From: Karl Øie [mailto:karl@gan.no]
> Sent: 27 November 2001 13:01
> To: Lucene Users List
> Subject: RE: scandinavian characters.
>
>
> there must be something seriously broken with the queryparser code.
>
> if a query starts with ø/æ/å (&oslash;, &aelig;, &aring;) then an
> exception in the queryparser occurs.
>
> org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 1. Encountered: "\u00c3" (195), after : ""
>     at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(Unknown Source)
>     at org.apache.lucene.queryParser.QueryParser.jj_ntk(Unknown Source)
>     at org.apache.lucene.queryParser.QueryParser.Modifiers(Unknown Source)
>     at org.apache.lucene.queryParser.QueryParser.Query(Unknown Source)
>     at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
>     at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
>
> but if the query contains ø/æ/å (&oslash;, &aelig;, &aring;) then it is
> translated wrongly into the swedish/german &auml; regardless of what
> character it was.
>
> if someone could point me to where to start I could try to find the
> problem, because I guess it is erroneous unicode translation...
>
>
> regards, karl
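
The `"\u00c3" (195)` in the TokenMgrError above is consistent with UTF-8 bytes being read one byte per character: ø (U+00F8) is encoded in UTF-8 as the two bytes 0xC3 0xB8, and 0xC3 read as ISO-8859-1 is the character with code 195. The sketch below reproduces that, assuming ISO-8859-1 was the charset wrongly applied (the thread does not say which one it was). The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode s as UTF-8, then decode those bytes as ISO-8859-1,
    // which turns each byte into one char with the same numeric value.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8),
                          StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // o-slash, ae ligature, a-ring
        String[] letters = {"\u00f8", "\u00e6", "\u00e5"};
        for (String letter : letters) {
            String bad = misdecode(letter);
            // Print the code of each resulting char.
            System.out.println((int) bad.charAt(0) + " " + (int) bad.charAt(1));
        }
        // prints:
        // 195 184
        // 195 166
        // 195 165
    }
}
```

All three letters share the lead byte 0xC3, so a parser that rejects character 195 would fail on any of them at the start of a query. Whether this also accounts for the wrong character seen in the middle of a query is not established in the thread.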
>
>
>
> >no it's even stranger than that, i have decoded the querystring, the
> >problem is that it seems like something is changed on the way in. if i
> >search for "fjøs" (fj&oslash;s) i get the swedish "fjä" (fj&Auml;), where
> >&oslash; is changed to &Auml; and 's' is removed.
> >
> >is the querystring translated somewhere?
> >
> >regards, karl øie
> > -----Original Message-----
> > From: David Bonilla [mailto:david@bit-bang.com]
> > Sent: 27. november 2001 10:43
> > To: Lucene Users List; karl@gan.no
> > Subject: Re: scandinavian characters.
> >
> >
> > Hi Karl !!!
> >
> > Im spanish and I have a lot of problems programming with our not
english
> >characters. I use LUCENE with spanish accents and it works fine...
> >
> > Have you tried to use the java.net.URLEncoder and java.net.URLDecoder
> with
> >your fields to index ?
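
The suggestion above can be illustrated with a small round trip through `java.net.URLEncoder` and `java.net.URLDecoder`, using the two-argument overloads that take an explicit charset name. This is only a sketch of those two classes; how the encoded value would be passed to a Lucene field is not described in the thread and is left out. The class and helper names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UrlCodecDemo {
    // Percent-encode s using UTF-8 for the non-ASCII characters.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Reverse of encode(): turn %XX escapes back into characters.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("fj\u00f8s");
        System.out.println(encoded); // prints "fj%C3%B8s"
        System.out.println(decode(encoded).equals("fj\u00f8s")); // prints "true"
    }
}
```

The encoded form is plain ASCII, so it survives any step that mishandles 8-bit characters, as long as both sides agree on the charset name.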
> >
> > Best Regards from Spain !
> > __________________________
> > David Bonilla Fuertes
> > THE BIT BANG NETWORK
> > http://www.bit-bang.com
> > Profesor Waksman, 8, 6 B
> > 28036 Madrid
> > SPAIN
> > Tel.: (+34) 914 577 747
> > Móvil: 656 62 83 92
> > Fax: (+34) 914 586 176
> > __________________________
>
>
>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>