Mailing List Archive

Unpredictable search results
Hi list!

This message is the result of investigating
why the new suite for Polish Wikipedia gives such unpredictable
search results when searching for words with non-latin1 chars.
The same applies to Meta, which is UTF-8
as well (right?). The Japanese, Korean, etc. Wikipedias
are endangered too, but I haven't tested that.

I focused on three-letter words, which are theoretically legal.

Three letter words (3LW) with one Polish language
specific letter (PLSL) can be searched,
although many silly matches are found.

Searching for a 3LW with two PLSL fails with the message
"Badly formed search query".

3LW with three PLSL - the same as with one PLSL (strange!).

Exactly the same happens when I put Chinese ideograms
instead of PLSL.

It seems that word lengths aren't recognized correctly
(the "Badly formed..." message is shown when I enter a 1-letter word!)
and/or words containing PLSL (or ideograms) are split
in some strange way.
Based on the search results, my impression is:
non-latin1 letters are treated sometimes as separators
and sometimes as wildcards.
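
To illustrate the suspicion about word lengths and splitting, here is a minimal Python sketch. It is only a model, not the wiki's actual search code: the helper names (`as_latin1`, `byte_length`, `latin1_tokens`) and the letter set in `LATIN1_WORD` are assumptions made for illustration. It shows what a tokenizer would see if it read UTF-8 bytes as latin1 characters and accepted only ASCII and latin1 letters.

```python
import re

# ASSUMPTION: a hypothetical byte-oriented tokenizer that only accepts
# ASCII letters and latin1 letters (U+00C0..U+00FF minus the two math signs).
LATIN1_WORD = re.compile(r"[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+")


def as_latin1(text):
    """Reinterpret the UTF-8 bytes of text as if they were latin1 characters."""
    return text.encode("utf-8").decode("latin-1")


def byte_length(text):
    """Length of the word as a byte-counting search engine would measure it."""
    return len(text.encode("utf-8"))


def latin1_tokens(text):
    """Words found by the hypothetical latin1-only tokenizer."""
    return LATIN1_WORD.findall(as_latin1(text))


if __name__ == "__main__":
    # "kot": no PLSL, "sól": one PLSL, "róż": two PLSL
    for word in ["kot", "sól", "róż"]:
        print(word, len(word), byte_length(word), latin1_tokens(word))
```

In this model each PLSL takes two bytes in UTF-8, so a three-letter word is measured as four or five "letters", and the second byte of each PLSL is not a latin1 letter, so it acts as a separator: "sól" comes out as two fragments. Whether the real search behaves exactly like this is a guess, but it would be consistent with the symptoms above.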

I guess that other bugs in searching non-latin1 words
can be reduced to this.

It would be nice for Polish users to have it fixed before
the upgrade to Phase III. Other non-latin1 users would
perhaps be happier too :-)

User:Youandme


----------------------------------------------------------------------
Re: [Intlwiki-l] Unpredictable search results
youandme@poczta.fm wrote:
...
> It would be nice for Polish users to have it fixed before
> upgrade to Phase III. Other non-latin1 users would be
> perhaps happier too :-)

Yes, that's the main reason the Polish wiki isn't upgraded yet, and the
secondary reason the Esperanto wiki isn't upgraded yet (we also need a
charset conversion, since Esperanto keyboard support is not as widely
available as one might hope).

I seem to remember I had this sort-of working on an experimental version
of the phase II software; I just haven't had a chance to fix it up
again. I'll bang on it some more...

-- brion vibber (brion @ pobox.com)