Mailing List Archive

Best Lucene hardware
Hi,



I'm wondering if someone familiar with the way Lucene accesses data could
give their opinion on whether hard drive seek time or throughput is more
important in Lucene performance, assuming a very large index that cannot fit
in RAM. I'm looking at buying some new servers that will be running Lucene,
and wonder if I should go with SCSI RAID, or if perhaps spending the extra
money on processors (and going with SATA for drives) is better. I'm not
sure where the bottleneck is in an average system, and I don't have any SCSI
RAID systems available for testing.



Thanks,

James
Re: Best Lucene hardware
Hello Mr. James,

You can get some info from the following link:

http://lucene.apache.org/java/docs/benchmarks.html




Enduringly yours,
V.Sivanarul.,M.Tech.





RE: Best Lucene hardware
Hi,

Thanks for the info. Unfortunately, most of that has to do with indexing,
whereas I am concerned with retrieval speed. And, there really isn't enough
information there to make good comparisons -- there are several completely
different systems with no way to pin down what the important changes in
hardware are. But, thanks for the link!

Sincerely,
James

> -----Original Message-----
> From: sivan v [mailto:sivanarul_v@yahoo.com]
> Sent: Sunday, February 05, 2006 9:47 AM
> To: general@lucene.apache.org
> Subject: Re: Best Lucene hardware
>
> hello Mr.james,
>
> u can get some info from the following link...
>
> http://lucene.apache.org/java/docs/benchmarks.html
RE: Best Lucene hardware
Dear James,

I recently had the same question, but I have no definitive answer to offer.

I guess that the throughput and access-time requirements depend on:
a) document size (the larger the documents, the more important throughput
   becomes)
b) how many documents you actually want to read (only a few to display
   them, or all of them to do some processing).
   If you want to read many documents, seek time becomes more important.

My best guess is that access time is more important for you, unless you
store only a few very large documents.
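
As a rough illustration of the trade-off described above, here is a
back-of-envelope sketch in Python. It assumes one seek per document read and
a fixed sequential transfer rate; the function names and the example figures
(8 ms seek, 60 MB/s) are hypothetical and are not measurements of any
particular drive or of Lucene itself.

```python
def read_time_seconds(num_docs, doc_size_bytes, seek_time_s, throughput_bytes_per_s):
    """Split the estimated read time into a seek part and a transfer part.

    Simplifying assumption: every document costs exactly one seek, and the
    data is then transferred at a constant sequential rate.
    """
    seek_part = num_docs * seek_time_s
    transfer_part = num_docs * doc_size_bytes / throughput_bytes_per_s
    return seek_part, transfer_part


def dominant_factor(num_docs, doc_size_bytes, seek_time_s, throughput_bytes_per_s):
    """Return which part of the estimate is larger: 'seek' or 'throughput'."""
    seek_part, transfer_part = read_time_seconds(
        num_docs, doc_size_bytes, seek_time_s, throughput_bytes_per_s
    )
    return "seek" if seek_part > transfer_part else "throughput"


def break_even_doc_size(seek_time_s, throughput_bytes_per_s):
    """Document size at which transfer time per document equals one seek."""
    return seek_time_s * throughput_bytes_per_s


if __name__ == "__main__":
    # Hypothetical drive: 8 ms average seek, 60 MB/s sequential throughput.
    print(dominant_factor(1000, 10_000, 0.008, 60e6))      # many small documents
    print(dominant_factor(1, 10_000_000, 0.008, 60e6))     # one very large document
    print(break_even_doc_size(0.008, 60e6))                # roughly 480 KB
```

Under these assumptions, documents much smaller than the break-even size make
seek time the larger cost, and documents much larger make throughput the
larger cost.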

Of course, you should look for discs with native command queuing (the disc
may reorder the read commands to reduce seek time).
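
To show why reordering can help, here is a toy Python model. It treats the
disc as a one-dimensional range of positions and compares total head travel
when requests are served in arrival order versus in a single sweep. This is
only an assumption-laden sketch: real command queuing is done in drive
firmware and also depends on rotational position, and the function names
here are hypothetical.

```python
def head_travel(start, offsets):
    """Total distance the head moves when serving offsets in the given order."""
    travel = 0
    position = start
    for offset in offsets:
        travel += abs(offset - position)
        position = offset
    return travel


def reorder_one_sweep(start, offsets):
    """Serve requests at or beyond the head in ascending order, then the
    remaining ones in descending order on the way back."""
    ahead = sorted(o for o in offsets if o >= start)
    behind = sorted((o for o in offsets if o < start), reverse=True)
    return ahead + behind


if __name__ == "__main__":
    requests = [90, 10, 60, 20, 80]   # arrival order, arbitrary position units
    start = 50
    print(head_travel(start, requests))                            # 270
    print(head_travel(start, reorder_one_sweep(start, requests)))  # 120
```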

Another option (if your memory requirements are not too large): a
solid-state disk, see e.g.
http://techreport.com/reviews/2006q1/gigabyte-iram/index.x?pg=7

The second version is supposed to support up to 16 GB, see
http://www.vr-zone.com.sg/?i=3052

Best regards,

Wolfgang









"James" <james@ryley.com> wrote on 05-02-2006 18:12:

> Hi,
>
> Thanks for the info. Unfortunately, most of that has to do with indexing,
> whereas I am concerned with retrieval speed. And, there really isn't enough
> information there to make good comparisons -- there are several completely
> different systems with no way to pin down what the important changes in
> hardware are. But, thanks for the link!
>
> Sincerely,
> James

RE: Best Lucene hardware
Thanks for the feedback. I saw those solid-state hard drives, and those are
definitely an interesting option if I am I/O limited. But I suspect that I
am CPU limited, which (ironically, after all the investigation that I have
done) seems to make commodity server farms the best option.
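
One rough way to check a suspicion like this is to compare the CPU time a
process uses with the wall-clock time that passes while it runs a
representative batch of work. The Python sketch below is only an
illustration of that idea: `workload` is a hypothetical stand-in for a batch
of queries, and the function name is an assumption, not part of any Lucene
tooling.

```python
import time


def cpu_fraction(workload):
    """Run workload() once and return CPU time divided by wall-clock time.

    A value near 1.0 suggests the work was mostly CPU bound; a value near
    0.0 suggests the process mostly waited (for example on disk I/O).
    process_time() sums CPU time over all threads of the process, so a
    multi-threaded workload can produce values above 1.0.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    workload()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


if __name__ == "__main__":
    # Stand-in workloads: a busy loop (CPU bound) and a sleep (waiting).
    print(cpu_fraction(lambda: sum(i * i for i in range(2_000_000))))
    print(cpu_fraction(lambda: time.sleep(0.2)))
```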

Thanks,
James
