Remote ParallelMultiSearcher hardware and performance
Hi,

We are interested in scaling our use of Lucene up to a very large amount
of data (in the 1-2 terabyte range). We have multiple machines, each with
dual Xeon processors. Up to this point we have not been using Remote
ParallelMultiSearcher; instead we do round-robin load balancing, where
each machine handles the entire query. That isn't going to cut it as we
expand the amount of data we have. Could anyone share insights into the
kind of performance they get using Remote ParallelMultiSearcher on large
data collections across multiple machines, what the limiting factors are
(e.g., CPU versus disk access versus network speed), and what hardware
works best? We are primarily interested in hearing from anyone with
experience comparing AMD, Intel, and Sun.

Thanks,

James