Hi list!
This message is the result of investigating
why the new suite for the Polish Wikipedia gives such unpredictable
search results when searching for words with non-Latin-1 characters.
The same applies to Meta, which is UTF-8
as well (right?). The Japanese, Korean, etc. Wikipedias
are probably at risk too, but I haven't tested that.
I focused on three-letter words, which should theoretically be legal.
Three-letter words (3LW) with one Polish-specific
letter (PLSL) can be searched for,
although many silly matches are found.
Searching for a 3LW with two PLSL fails with the message
"Badly formed search query".
A 3LW with three PLSL behaves the same as with one PLSL (strange!).
Exactly the same happens when I use Chinese ideograms
in place of PLSL.
It seems that word lengths aren't recognized correctly
(the "Badly formed..." message is shown when I enter a 1-letter word!)
and/or words containing PLSL (or ideograms) are somehow split
in a strange way.
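
To illustrate the word-length suspicion: below is a minimal Python sketch,
not the actual search code. It only assumes that the word length is
counted in UTF-8 bytes rather than in characters; the helper name
byte_length is made up for this example. Each PLSL takes two bytes in
UTF-8 and a Chinese ideogram typically takes three, so the measured
length of a word would depend on how many such letters it contains.

```python
# Hypothetical illustration only; this is not the Wikipedia search code.
# Assumption: word length is measured in UTF-8 bytes, not in characters.

def byte_length(word: str) -> int:
    """Return the number of bytes the word occupies when encoded as UTF-8."""
    return len(word.encode("utf-8"))

# Three-letter words with zero, one, two and three Polish-specific letters,
# followed by a single Chinese ideogram.
samples = ["las", "łas", "łąs", "łąś", "中"]

for word in samples:
    # Print the word, its length in characters and its length in bytes.
    print(word, len(word), byte_length(word))
```

Under that assumption every 3LW has three characters, but its byte length
ranges from 3 to 6, which could confuse any minimum or maximum length check.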
Based on the search results, my impression is that
non-Latin-1 letters are sometimes treated as separators
and sometimes as wildcards.
I guess that the other bugs in searching for non-Latin-1 words
can be reduced to this.
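
The "separator" behaviour can be reproduced with a small Python sketch.
Again, this is an assumption about the cause and not the real code:
naive_tokens and unicode_tokens are hypothetical helpers. The first one
looks at the raw UTF-8 bytes and accepts only ASCII letters and digits
as word characters; the second one works on decoded characters.

```python
# Hypothetical illustration only; this is not the Wikipedia search code.
import re
from typing import List

def naive_tokens(query: str) -> List[str]:
    """Tokenize the raw UTF-8 bytes, keeping only runs of ASCII letters/digits.

    Every byte of a non-ASCII letter falls outside the word class,
    so such letters act as separators.
    """
    raw = query.encode("utf-8")
    return [t.decode("ascii") for t in re.findall(rb"[A-Za-z0-9]+", raw)]

def unicode_tokens(query: str) -> List[str]:
    """Tokenize decoded text; \\w matches Unicode letters for str patterns."""
    return re.findall(r"\w+", query)

for query in ["mąka", "łąś"]:
    print(query, naive_tokens(query), unicode_tokens(query))
```

With the byte-based version "mąka" breaks into "m" and "ka", and a word
made only of PLSL yields no tokens at all, while the character-based
version keeps each word whole.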
It would be nice for Polish users to have this fixed before
the upgrade to Phase III. Other non-Latin-1 users would
perhaps be happier too :-)
User:Youandme
----------------------------------------------------------------------
Najlepsi nie maja watpliwosci... >>> http://link.interia.pl/f1667