Mailing List Archive: Indexing too slow

I profiled Lucene indexing using the Java profiling
option. It looks like indexing spends around 80% of
its time in StandardTokenizerManager.java and eats up
practically all the CPU on my 1 GHz machine. It takes
around 17 minutes to index 140 MB of plain text, which
is around 8 MB/minute. I think that's too slow.
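
For reference, here is the arithmetic behind that figure as a small
sketch. The class and method names are made up for illustration and
are not part of Lucene; the same helper could be used to compare
timings before and after a tokenizer change.

```java
class Throughput {
    // Megabytes processed per minute, given a byte count and elapsed wall-clock time.
    static double mbPerMinute(long bytes, long elapsedMillis) {
        double megabytes = bytes / (1024.0 * 1024.0);
        double minutes = elapsedMillis / 60000.0;
        return megabytes / minutes;
    }
}
```

With 140 MB and 17 minutes this gives 140 / 17, or a little over
8 MB/minute.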

Especially since we just want to tokenize on whitespace
and some standard delimiters, I want to speed it up by
changing the grammar.
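
To make the idea concrete, here is a minimal sketch of the kind of
tokenizing I mean: a single pass over the characters, splitting on
whitespace and a fixed delimiter set, with no generated grammar
involved. This is plain Java and does not use any Lucene classes; the
class name, the method names, and the delimiter set are all
assumptions for illustration, and wiring it into an Analyzer is left
out.

```java
import java.util.ArrayList;
import java.util.List;

class DelimiterTokenizer {
    // Assumed delimiter set for illustration; adjust as needed.
    private static final String DELIMITERS = ".,;:!?()[]{}\"'";

    // True if the character separates tokens (whitespace or a listed delimiter).
    static boolean isDelimiter(char c) {
        return Character.isWhitespace(c) || DELIMITERS.indexOf(c) >= 0;
    }

    // Splits the text into tokens in one left-to-right scan.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the current token, or -1 when between tokens
        for (int i = 0; i < text.length(); i++) {
            if (isDelimiter(text.charAt(i))) {
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
        }
        if (start >= 0) {
            tokens.add(text.substring(start)); // flush the trailing token
        }
        return tokens;
    }
}
```

For example, DelimiterTokenizer.tokenize("Hello, world! (fast)")
yields the three tokens Hello, world and fast. Whether this is
actually faster than the generated tokenizer would have to be
measured.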

Just wondering if anybody has done any tests in this
area, or has any other clues as to how to speed it up.
I am wondering if we should use YACC instead of JavaCC
and call it through JNI, although it shouldn't matter
because of HotSpot, but you never know.

-Manish

