One Question -- Successful Deployments

Jan 8, 2002, 8:40 AM

All,

I have one question in relation to Lucene. Can you give me the contact
information of 2+ individuals who have successfully deployed Lucene?

I want to ask them directly:
How fast is it?
How reliable is it?

-Joe Lerner

Note: I write code and lead small teams of developers for an internet
publishing company in downtown Raleigh, North Carolina.

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>

Re: One Question -- Successful Deployments

wdavies at cs

Jan 8, 2002, 1:33 PM

Hi,

We're (Overture/Goto) evaluating Lucene ... email me specific questions.

In general I would say Lucene is very efficient. It is only about 30%
slower than Thunderstone Texis (which is a native C code base). The main
difference is that Lucene doesn't handle caching as well as Texis does.

Basically, the index is either on disk or in RAM (in RAM it can take up
400-500 MB in our application). Texis, for example, is able to buffer as
much of the index as it can in memory without explicit memory limits
having to be set.

Out of the box we couldn't use phrase matching for very high-volume
transactions (we're looking at 1000s of queries/sec) and had to customize
it to our needs, but because it's open source, guess what, you can write
any kind of optimization you want. Actually, that isn't fair; just be
careful that you understand the performance parameters involved in text
retrieval and the various types of queries that are possible. Do you need
"Text Retrieval", or are you doing an unranked "Text Search"?
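To make the distinction between ranked "Text Retrieval" and unranked "Text Search" concrete, here is a toy sketch in plain Java. It does not use Lucene: the class `ToyIndex` and its methods are hypothetical, tokenization is just whitespace splitting, and the tf * idf sum is an illustrative formula, not Lucene's scoring. The unranked search only intersects posting sets; the ranked version must also compute and sort scores, which is part of why the two have different performance profiles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy inverted index (plain Java, not Lucene's API) contrasting
// unranked "Text Search" with ranked "Text Retrieval".
class ToyIndex {
    // term -> ids of the documents that contain it
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // per-document term frequencies, indexed by document id
    private final List<Map<String, Integer>> termCounts = new ArrayList<>();

    // Adds a document (lower-cased, split on whitespace) and returns its id.
    int add(String text) {
        int id = termCounts.size();
        Map<String, Integer> counts = new HashMap<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (t.isEmpty()) continue;
            counts.merge(t, 1, Integer::sum);
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(id);
        }
        termCounts.add(counts);
        return id;
    }

    private Set<Integer> docsFor(String term) {
        Set<Integer> docs = postings.get(term.toLowerCase());
        return docs == null ? new TreeSet<Integer>() : docs;
    }

    // Unranked "Text Search": ids of documents containing every term, in id order.
    List<Integer> searchAll(String... terms) {
        if (terms.length == 0) return new ArrayList<>();
        TreeSet<Integer> result = new TreeSet<>(docsFor(terms[0]));
        for (int i = 1; i < terms.length; i++) result.retainAll(docsFor(terms[i]));
        return new ArrayList<>(result);
    }

    // Ranked "Text Retrieval": documents containing any term, best first,
    // scored by a simple tf * idf sum (illustrative, not Lucene's formula).
    List<Integer> rank(String... terms) {
        int n = termCounts.size();
        Map<Integer, Double> scores = new HashMap<>();
        for (String t : terms) {
            String term = t.toLowerCase();
            Set<Integer> docs = docsFor(term);
            if (docs.isEmpty()) continue;
            double idf = Math.log((double) n / docs.size()) + 1.0;
            for (int d : docs) {
                scores.merge(d, termCounts.get(d).get(term) * idf, Double::sum);
            }
        }
        List<Integer> ids = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken by ascending document id.
        ids.sort((a, b) -> {
            int c = Double.compare(scores.get(b), scores.get(a));
            return c != 0 ? c : Integer.compare(a, b);
        });
        return ids;
    }
}
```

With documents "lucene is fast", "lucene lucene search" and "texis is fast", `searchAll("lucene", "fast")` returns only the first document, while `rank("lucene")` puts the second document first because it contains the term twice.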

Oh, and it's free :)

Reliable? Well, I've never had a problem someone couldn't answer, and
it never crashes (i.e., it's pretty bug-free as far as I can tell).

Cheers,
Winton

Winton Davies
Lead Engineer, Overture (NSDQ: OVER)
1820 Gateway Drive, Suite 360
San Mateo, CA 94404
work: (650) 403-2259
cell: (650) 867-1598
http://www.overture.com/

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>

Re: One Question -- Successful Deployments

otis_gospodnetic at yahoo

Jan 10, 2002, 8:47 PM

Hello,

Funny, I was just wondering how Lucene compares to Texis the other day.
Yes, I guess Lucene doesn't have any caching. Perhaps this could
easily be added by making use of one of many caching projects that seem
to be popping up under Jakarta (jakarta.apache.org).
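As a minimal sketch of the kind of caching suggested here, the JDK's own `LinkedHashMap` in access order already gives a least-recently-used cache, with no extra library. The class name `QueryCache`, keying on the raw query string, and the cached value type are all assumptions for illustration. Note that this caches query results; it is not the same as Texis buffering the index itself, and any such cache would have to be cleared whenever the index changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical query-result cache (not part of Lucene). A LinkedHashMap in
// access order evicts its least recently used entry once capacity is exceeded.
class QueryCache<V> {
    private final Map<String, V> map;

    QueryCache(final int capacity) {
        this.map = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > capacity; // size() of this LinkedHashMap
            }
        };
    }

    // Returns the cached result for the query string, or null on a miss.
    synchronized V get(String query) { return map.get(query); }

    synchronized void put(String query, V result) { map.put(query, result); }

    synchronized int size() { return map.size(); }

    // Call after the index is modified, since cached results may be stale.
    synchronized void clear() { map.clear(); }
}
```

Methods are synchronized because a search server would typically share one cache across request threads; a finer-grained scheme could be swapped in if that lock became a bottleneck.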

Winton, if appropriate, could you share some of the changes you made
to Lucene to support the query rate that you mentioned?

Thanks,
Otis

Re: One Question -- Successful Deployments

wdavies at cs

Jan 11, 2002, 1:41 AM

Heh,

Cool :-) In principle we might be able to, but it will be a while, as
our legal and biz dev people will need to be involved. However, I do
believe everything I did was referred to by Dave at some point. Most of
the changes are pretty obvious if you run through the code.

I'm about to do a bunch of benchmarking (maybe 2 weeks?) of Texis and
Lucene on Linux and Solaris, in four different configurations (weighted
and unweighted, each with sloppy phrase matching and with conjunctive
queries). I'll post a summary :)
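One reason sloppy phrase matching and conjunctive matching benchmark so differently: a conjunctive query only needs every term to appear somewhere in the document, whereas a phrase query must also compare term positions. The sketch below is a simplified two-term version under a hypothetical definition of slop (the second term must follow the first with at most `slop` positions in between). Lucene's own `PhraseQuery` slop is an edit distance that also allows reordered terms, so this is not its exact semantics; the class and method names are illustrative only.

```java
// Simplified sloppy-phrase test for two terms (hypothetical, not Lucene's
// PhraseQuery semantics): true if some occurrence of the second term follows
// an occurrence of the first with at most `slop` positions between them.
class SloppyPhrase {
    // posA and posB are the ascending positions of each term within one document.
    static boolean matches(int[] posA, int[] posB, int slop) {
        int j = 0;
        for (int a : posA) {
            // Advance to the first occurrence of B that comes after this A.
            while (j < posB.length && posB[j] <= a) j++;
            // No B after this A means no B after any later A either.
            if (j == posB.length) return false;
            // Number of positions strictly between the two terms.
            if (posB[j] - a - 1 <= slop) return true;
        }
        return false;
    }

    // Conjunctive match for comparison: both terms simply occur in the document.
    static boolean conjunctive(int[] posA, int[] posB) {
        return posA.length > 0 && posB.length > 0;
    }
}
```

The phrase check walks both position lists once (a merge-style scan), while the conjunctive check never touches positions at all; over thousands of queries per second that extra per-document work adds up.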

A lot of optimizing Lucene involves taming GC with RAMDirectory. I would
say that using RAMDirectory is a huge saving. Minimize fields: have one
field that is indexed and tokenized but not stored, and one holding the
"content" as a single monolithic field (parse it afterwards). Write a
custom HitCollector if appropriate. Minimize classes, and stick with
Java built-ins as much as possible.
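Here is a sketch of what a GC-friendly custom collector might look like, keeping the top k hits in primitive arrays arranged as a min-heap so that no object is allocated per hit. In Lucene releases of that era a collector would subclass `org.apache.lucene.search.HitCollector`, implement `collect(int doc, float score)`, and be passed to `Searcher.search(Query, HitCollector)`; this version is standalone and hypothetical (the name `TopKCollector` and `topDocs()` are assumptions) so that it compiles without Lucene on the classpath.

```java
import java.util.Arrays;

// Hypothetical top-k collector that stores hits in primitive arrays so that
// collecting allocates nothing per hit. collect(int, float) mirrors the shape
// of Lucene's HitCollector callback, but this class is standalone.
class TopKCollector {
    private final int[] docs;
    private final float[] scores;
    private int size = 0;

    TopKCollector(int k) {
        docs = new int[k];
        scores = new float[k];
    }

    // Called once per matching document; keeps the k best scores in a min-heap
    // whose root (index 0) is the weakest hit kept so far.
    void collect(int doc, float score) {
        if (size < docs.length) {
            docs[size] = doc;
            scores[size] = score;
            siftUp(size++);
        } else if (docs.length > 0 && score > scores[0]) {
            docs[0] = doc;
            scores[0] = score;
            siftDown(0);
        }
    }

    // Returns the collected document ids, highest score first.
    int[] topDocs() {
        int[] d = Arrays.copyOf(docs, size);
        float[] s = Arrays.copyOf(scores, size);
        // Insertion sort by descending score; k is small, so this is cheap.
        for (int i = 1; i < size; i++) {
            int dd = d[i];
            float ss = s[i];
            int j = i - 1;
            while (j >= 0 && s[j] < ss) { d[j + 1] = d[j]; s[j + 1] = s[j]; j--; }
            d[j + 1] = dd;
            s[j + 1] = ss;
        }
        return d;
    }

    private void siftUp(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (scores[parent] <= scores[i]) break;
            swap(i, parent);
            i = parent;
        }
    }

    private void siftDown(int i) {
        while (true) {
            int left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < size && scores[left] < scores[smallest]) smallest = left;
            if (right < size && scores[right] < scores[smallest]) smallest = right;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int a, int b) {
        int td = docs[a]; docs[a] = docs[b]; docs[b] = td;
        float ts = scores[a]; scores[a] = scores[b]; scores[b] = ts;
    }
}
```

The only allocations happen once per query (the two arrays, plus the copies in `topDocs()`), which is in the spirit of "minimize classes, stick with Java built-ins".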

There are other considerations in choosing between Texis and Lucene:
cost(!!) and caching (as I said). Memory maxes out at 4GB on most normal
boxes, so if you can't fit your document base and index in under 4GB,
then you need the caching.

Winton

Re: One Question -- Successful Deployments

wdavies at cs

Jan 11, 2002, 11:02 AM

Oh, one final thing: try different JVMs. The best I've found so far
(short of going to the bleeding edge of 1.4) is the 1.3.1_10 JVM. This
is HotSpot, and it supports the -server option as well as very
sophisticated GC controls.

It does make a difference compared with earlier 1.2 and 1.3 versions.
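When comparing JVMs (or flags such as -server) it helps to run an identical workload and to warm up first, since HotSpot compiles hot code only after it has run for a while. The harness below is a minimal, hypothetical sketch: `QueryBenchmark` and its methods are illustrative names, and the `Runnable` is assumed to execute one representative query against your own index.

```java
// Hypothetical harness for comparing JVMs (or flags such as -server) on the
// same workload. Untimed warm-up calls run first so HotSpot has a chance to
// compile the hot code before the timed loop starts.
class QueryBenchmark {
    // Runs `warmup` untimed calls, then `measured` timed calls; returns calls per second.
    static double queriesPerSecond(Runnable query, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) query.run();
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) query.run();
        long elapsed = Math.max(1L, System.nanoTime() - start); // avoid dividing by zero
        return measured * 1_000_000_000.0 / elapsed;
    }

    // Identifies the running JVM so results from different JVMs can be labeled.
    static String jvmDescription() {
        return System.getProperty("java.vm.name") + " " + System.getProperty("java.version");
    }
}
```

Running the same program under each candidate JVM, with and without -server, and logging `jvmDescription()` next to the queries/sec figure keeps the comparison apples to apples.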

Cheers,
Winton