found a fix to the problem;
the "QueryParser.jj" in rc2 does not accept unicode, version 1.6 in cvs
does, so i replaced the file with the newest one from cvs and also had to
include the FastCharStream.java class to compile.
then i just had to force-convert the querystring that came from the browser
to utf-8 and it worked (guess the browser sent the string as ascii!!! i'm so
happy and thanks to you both jonas and david!!
String query = this.request.getParameter( "query" );
if( query!=null ) {
query = new String( query.getBytes(), "UTF-8" );
}
mvh karl øie/gan media
-----Original Message-----
From: Jonas Bechlund [mailto:jonas.bechlund@framfab.dk]
Sent: 27. november 2001 13:52
To: 'Lucene Users List'
Subject: RE: scandinavian characters.
Hi Karl,
It is a little bit tricky - but when you get the idea it is not that bad...
I had the same problem with the danish characters. I made changes TOKEN
definition in the "Token Definitions" section of the file "QueryParser.jj"
and that actually solved the problem. One minor detail is that you have to
rebuild the jar file with ANT. (See build.txt for instructions)
I guess that solves your problem,
Regards,
/ Jonas
-----Original Message-----
From: Karl Øie [mailto:karl@gan.no]
Sent: 27 November 2001 13:01
To: Lucene Users List
Subject: RE: scandinavian characters.
there must be something seriously broken with the queryparse code.
if a query starts with ø/æ/å (ø, &oaelig;, å) then an exception
in the queryparser occurs.
org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column
1. Encountered: "\u00c3" (195), after : ""
at
org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(Unknown
Source)
at org.apache.lucene.queryParser.QueryParser.jj_ntk(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.Modifiers(Unknown
Source)
at org.apache.lucene.queryParser.QueryParser.Query(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
but if the query contains ø/æ/å (ø, &oaelig;, å) then it is
translated wrongly into the swedish/german ä regardless of what
character it was.
if someone could point me to where to start I could try to find the problem
because I guess it is errorous unicode translation...
mvh karl
>no it's even stranger than that, i have decoded the querystring, the
problem
>is that it seems like something is changed on the way in. if i search for
>"fjøs" (fjøs) i get the swedish "fjä" (fjÄ). Where ø is
>changed to Ä and 's' is removed.
>
>is the querystring translated some where?
>
>mvh karl øie
> -----Original Message-----
> From: David Bonilla [mailto:david@bit-bang.com]
> Sent: 27. november 2001 10:43
> To: Lucene Users List; karl@gan.no
> Subject: Re: scandinavian characters.
>
>
> Hi Karl !!!
>
> I´m spanish and I have a lot of problems programming with our not english
>characters. I use LUCENE with spanish accents and it works fine...
>
> Have you tried to use the java.net.URLEncoder and java.net.URLDecoder
with
>your fields to index ?
>
> Best Regards from Spain !
> __________________________
> David Bonilla Fuertes
> THE BIT BANG NETWORK
> http://www.bit-bang.com
> Profesor Waksman, 8, 6º B
> 28036 Madrid
> SPAIN
> Tel.: (+34) 914 577 747
> Móvil: 656 62 83 92
> Fax: (+34) 914 586 176
> __________________________
--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>