Mailing List Archive: Question) Unicode AND Sorting

Aug 3, 2006, 7:02 AM

Post #1 of 4

Hi.

I have two questions.

1) How can I index Unicode (UTF-8) text?

2) How can I sort by field value?

Sorry for my poor English, and thank you for KinoSearch.

marvin at rectangular

Aug 3, 2006, 1:29 PM

Post #2 of 4

On Aug 3, 2006, at 6:49 AM, ???? wrote:

> 1) How can I index Unicode (UTF-8) text?

I was going to say, "the same way you handle regular text", but I've
just realized that the TokenBatch class is not preserving the UTF-8
flag of the scalars that it's derived from -- and therefore, all of
KinoSearch's Analyzers function in a non-UTF-8 context. :( So right
this moment the only way to do it is to write your own Tokenizer class.

I'm slammed putting out fires for my main client right now and can't
work on this today, but fixing this behavior is a high priority. The
fix will be to have the TokenBatch absorb the UTF8 flag of the latest
scalar that gets assigned to it. After that, the regular expressions
in KinoSearch's Tokenizer will adapt themselves and function either
in a UTF-8 context or not depending on the input.
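
To illustrate why the context matters, here is a rough sketch in Python
rather than Perl. It is only an analogy and not KinoSearch code: Python's
bytes and str types stand in for Perl scalars without and with the UTF8
flag, and tokenize is a hypothetical helper made up for this example. The
same \w+ pattern splits accented words apart in a byte context but keeps
them whole in a character context.

```python
import re


# Hypothetical helper for illustration only; not part of KinoSearch.
def tokenize(text):
    """Split text into word tokens.

    A bytes input is matched in a byte context, where \\w covers only
    ASCII word characters. A str input is matched in a character
    context, where \\w also covers non-ASCII letters.
    """
    pattern = rb"\w+" if isinstance(text, bytes) else r"\w+"
    return re.findall(pattern, text)


sample = "caf\u00e9 na\u00efve"  # two words with accented letters

char_tokens = tokenize(sample)                  # character context
byte_tokens = tokenize(sample.encode("utf-8"))  # byte context

print(char_tokens)  # both words stay whole
print(byte_tokens)  # words break at the multi-byte characters
```

In the byte context the UTF-8 bytes of each accented letter are not word
characters, so every token is cut short there. That is the kind of
breakage to expect from a tokenizer that has lost track of the encoding
of its input.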

> 2) How can I sort by field value?

This is only possible at present using a somewhat inefficient hack
that violates KinoSearch's public API.

Sorry,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

henka at cityweb

Aug 4, 2006, 1:11 AM

Post #3 of 4

Welcome back Marvin!

How about posting the text of your OSCON presentation to the list?

Cheers
Henk

marvin at rectangular

Aug 4, 2006, 1:18 AM

Post #4 of 4

On Aug 4, 2006, at 1:15 AM, henka@cityweb.co.za wrote:

>
> Welcome back Marvin!
>
> How about posting the text of your OSCON presentation to the list?

Text? We can do better. :)

A PDF is available from the KinoSearch homepage.

http://www.rectangular.com/kinosearch/

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/