Mailing List Archive

Lucene's scalability
Is there a known limit to the number of documents that Lucene can handle
efficiently? I'm looking to index around 15 million documents of about 2 KB
each, with 7-10 searchable fields per document. Should I be attempting this
with Lucene?

Thanks,

Joel
RE: Lucene's scalability [ In reply to ]
I currently have an index of ~12 million documents, each of which is about
that size (but in XML form).

When they are transformed for Lucene to index, there are upwards of 50
searchable fields.

The index is about 10 GB right now.

I have not yet had any problems with "pushing the limits" of Lucene.

In the next few weeks, I will be pushing my number of indexed documents up
into the 15-20 million range. I can let you know if any problems arise.

Dan
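
As a rough illustration of what the figures above imply, the sketch below extrapolates index size from the reported 10 GB for ~12 million documents. It assumes index size grows roughly linearly with document count, which the thread does not confirm, and the function names are made up for this example. The reported index has about 50 searchable fields per document, so an index with 7-10 fields may well come out differently.

```python
# Back-of-envelope index sizing from the figures quoted in this thread.
# Assumption (not from the thread): size scales linearly with document count.

GB = 1024 ** 3


def bytes_per_doc(index_bytes: float, doc_count: int) -> float:
    """Average index bytes consumed per document."""
    return index_bytes / doc_count


def projected_index_gb(per_doc_bytes: float, doc_count: int) -> float:
    """Projected index size in GB for a given document count."""
    return per_doc_bytes * doc_count / GB


# Reported in the thread: ~10 GB index for ~12 million documents.
observed_per_doc = bytes_per_doc(10 * GB, 12_000_000)

for target in (15_000_000, 20_000_000):
    size_gb = projected_index_gb(observed_per_doc, target)
    print(f"{target:>10,} docs -> ~{size_gb:.1f} GB")
```

Under that linear assumption, the 15-20 million document range works out to roughly 12.5 to 16.7 GB of index.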



-----Original Message-----
From: Joel Bernstein [mailto:j.bernstein@ei.org]
Sent: Monday, April 29, 2002 1:32 PM
To: lucene-user@jakarta.apache.org
Subject: Lucene's scalability




--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Lucene's scalability [ In reply to ]
Great,

Thanks for the quick response. I am very interested in hearing
how Lucene handles itself in the 15-20 million doc range.

I will be doing some testing this week with Lucene and will report my
findings as well. I am also testing FAST and AltaVista and I will post some
comparison details. I would be very happy to find that we did not need to
buy a commercial engine because Lucene could do the job.

Joel

----- Original Message -----
From: "Armbrust, Daniel C." <Armbrust.Daniel@mayo.edu>
To: "'Lucene Users List'" <lucene-user@jakarta.apache.org>
Sent: Monday, April 29, 2002 2:37 PM
Subject: RE: Lucene's scalability


Re: Lucene's scalability [ In reply to ]
Hi,

I am also trying to do something similar. I am very eager to know what kind
of hardware you are using to maintain such a big index.
In my case it is very important that searches run very fast. Does
such a big index (10 GB) pose any problems in this respect?

TIA

Regards
Harpreet



RE: Lucene's scalability [ In reply to ]
Currently, we are using an Ultra-80 SPARC Solaris machine with 4 processors
and 4 GB of RAM.

However, we are only making use of one of those processors with the index.
Our biggest speed restriction is the fact that our entire index resides on a
single disk drive. We have a RAID array coming soon.

The performance has been very impressive, but as you can imagine, the speed
depends highly on the complexity of the query. If you run a query with
half a dozen terms and fields, which returns ~30,000 results, it usually
takes on the order of a second or two. If you run a query with 50-60 terms,
it may take 5-6 seconds.

I don't have any better performance stats than this currently.

Dan
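
The two timings above can be turned into a very rough per-term cost. The sketch below fits a straight line through them, using the midpoints of the quoted ranges (6 terms at 1.5 s, 55 terms at 5.5 s) as assumed inputs; these are not measured values, and two points cannot show whether the relationship is actually linear. The function names are invented for this example.

```python
# Rough per-term query cost from the two timings quoted in this thread.
# Assumption: midpoints of the quoted ranges stand in for real measurements.

few_terms, few_seconds = 6, 1.5      # "1/2 a dozen terms", "a second or two"
many_terms, many_seconds = 55, 5.5   # "50-60 terms", "5-6 seconds"


def fit_line(x1: float, y1: float, x2: float, y2: float) -> tuple[float, float]:
    """Return (slope, intercept) of the line through two points."""
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


slope, intercept = fit_line(few_terms, few_seconds, many_terms, many_seconds)


def estimated_seconds(term_count: int) -> float:
    """Estimated query time under the assumed linear model."""
    return intercept + slope * term_count


print(f"~{slope * 1000:.0f} ms per additional term, ~{intercept:.2f} s fixed cost")
```

With these assumed midpoints the line comes out at roughly 82 ms per additional term on top of about one second of fixed cost. Treat it as an order-of-magnitude guide for this particular hardware, not a prediction.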


-----Original Message-----
From: Harpreet S Walia [mailto:harpreet@sansuisoftware.com]
Sent: Friday, May 17, 2002 7:23 AM
To: Lucene Users List
Subject: Re: Lucene's scalability


Re: Lucene's scalability [ In reply to ]
You didn't mention the hit count on the query with 50-60 terms.
Just wondering if the time was linear.

----- Original Message -----
From: Armbrust, Daniel C. <Armbrust.Daniel@mayo.edu>
To: 'Lucene Users List' <lucene-user@jakarta.apache.org>
Sent: Friday, May 17, 2002 6:57 AM
Subject: RE: Lucene's scalability


RE: Lucene's scalability [ In reply to ]
In my experience the time it takes depends much more on the complexity of
the query than on the number of results returned. If I am making a
query with 50-60 terms, I am usually getting down to a couple thousand
results or fewer.

Dan


-----Original Message-----
From: CNew [mailto:cnew@fuse.net]
Sent: Monday, May 20, 2002 8:46 AM
To: Lucene Users List
Subject: Re: Lucene's scalability

