On Tue, May 21, 2002 at 10:38:18AM -0700, Brion L. VIBBER wrote:
> On mar, 2002-05-21 at 06:28, Kurt Jansson wrote:
> > > I have set up the test site at http://test-de.wikipedia.com/,
> >
> > Thanks Jason! I informed the German Wikipedians and hope they'll start
> > playing around with it soon.
> >
> > I wrote down some bugs I found on
> > http://test-de.wikipedia.com/wiki/wikipedia:Beobachtete+Fehler
> >
> > But that's in German, and I don't know anybody else but Magnus will
> > understand it. So shall I post a (bad) translated report to wikitech-l
> > from time to time?
> >
> > * The search for words with umlauts doesn't seem to work. The script
> > doesn't find them,
>
> Hmm, looks like the search needs another drubbing. I believe it's
> chopping up words at the boundaries of non-ASCII characters; if your
> word is big enough on either end (übersetzung, terroranschläge) this
> does a good enough job anyway, but it's hardly the way it should work!
It doesn't (unless you have changed this in my code; I didn't check). The
parsing function uses the Perl regular expression \w to decide whether a
character is legal in a search word. If it thinks a character is illegal it
gives an error, so you should have noticed that. If the result is empty and
you didn't get a syntax error, there are two possible problems:
- MySQL doesn't index that character. I wasn't able to find out exactly which
characters are indexed by the full-text index and which are not. I know
that characters with umlauts get indexed (search for "G"odel" for example)
but I don't know about the ringel-S (sp?), for example. The easiest way to
find out is probably simply to try it.
- The special characters in the articles were written using entities. The
search doesn't know that the entity &ouml; and the character ö are the same.
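
To make the two cases above easier to test, here is a minimal sketch in
Python (not the actual search code, which I am not quoting here). It assumes
the word check boils down to "every character matches \w", and that entities
could be decoded before comparing. The names is_legal_search_word and
normalize_entities are hypothetical and exist only for this illustration.

```python
import html
import re

# ASCII-only \w: roughly how a regex engine behaves without locale or
# Unicode support. Only [A-Za-z0-9_] count as word characters.
ASCII_WORD = re.compile(r"\w+", re.ASCII)

# Unicode-aware \w (the default for str patterns in Python 3).
UNICODE_WORD = re.compile(r"\w+")


def is_legal_search_word(word, pattern):
    # Hypothetical check: the word is legal only if the pattern
    # matches it from the first character to the last.
    return pattern.fullmatch(word) is not None


def normalize_entities(text):
    # Decode HTML entities so that "G&ouml;del" and "Gödel" compare equal.
    return html.unescape(text)


if __name__ == "__main__":
    for word in ["übersetzung", "terroranschläge", "straße"]:
        print(word,
              is_legal_search_word(word, ASCII_WORD),
              is_legal_search_word(word, UNICODE_WORD))
    print(normalize_entities("G&ouml;del") == "Gödel")
```

If the ASCII-only variant rejects a word that the Unicode-aware one accepts,
the regular expression is the likely culprit. If both accept it and the
search still returns nothing, the index or the entity encoding is the more
likely cause.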
Btw, if you have questions about the parse function for the search, just
ask. I'm very busy at the moment so I don't have time to do any real
programming, but answering a few questions should not be a problem.
-- Jan Hidders