Mailing List Archive

recommended index size
Hello,

Is there a recommended / rule-of-thumb maximum size for an index?
I try to target between 50 and 100 GB before spreading to other servers.
Or is this just a matter of how much memory and CPU I have?
This is a log-aggregation use case: a lot of writes and a smaller number of
reads, obviously.
I am using lucene 9.
thanks,
Vincent
Re: recommended index size [ In reply to ]
Hi Vincent,

Lucene has a hard limit of ~2.1 B documents in a single index; hopefully
you hit the ~50 - 100 GB limit well before that.

Otherwise it's very application dependent: how much latency can you
tolerate during searching, how fast are the underlying IO devices at random
and large sequential IO, the types of queries, etc.

Lucene should not require much additional RAM as the index gets larger --
much work has been done in recent years to move data structures off-heap.

Mike McCandless

http://blog.mikemccandless.com
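[Editor's note: a back-of-envelope sketch of the limit Mike mentions. Lucene's hard cap is `IndexWriter.MAX_DOCS` = `Integer.MAX_VALUE - 128` = 2,147,483,519 documents per index; the average-document-size figure below is a made-up assumption for illustration, not a measurement from the thread.]

```java
// Back-of-envelope arithmetic (plain Java, no Lucene on the classpath):
// how far a 100 GB log index sits from Lucene's per-index document cap.
public class IndexSizeEnvelope {
    // Lucene's hard cap: IndexWriter.MAX_DOCS = Integer.MAX_VALUE - 128
    static final long MAX_DOCS = 2_147_483_519L;

    // Docs that fit in an index of the given size, assuming an average
    // indexed size per document (the average is an assumption).
    static long docsAtSize(long indexBytes, long avgBytesPerDoc) {
        return indexBytes / avgBytesPerDoc;
    }

    public static void main(String[] args) {
        long hundredGb = 100L * 1024 * 1024 * 1024;
        long avgDoc = 512; // hypothetical: ~0.5 KB per log event on disk
        long docs = docsAtSize(hundredGb, avgDoc);
        System.out.println("docs at 100 GB: " + docs);
        System.out.println("fraction of hard cap: " + (double) docs / MAX_DOCS);
    }
}
```

With these assumed numbers, a 100 GB index holds roughly 210 M documents, about a tenth of the hard cap, which is why the 50-100 GB target is hit long before the document limit.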


On Tue, Jan 2, 2024 at 9:49 AM <vvsevel@gmail.com> wrote:

> Hello,
>
> is there a recommended / rule of thumb maximum size for index?
> I try to target between 50 and 100 Gb, before spreading to other servers.
> or is this just a matter of how much memory and cpu I have?
> this is a log aggregation use case. a lot of write, smaller number of reads
> obviously.
> I am using lucene 9.
> thanks,
> Vincent
>
Re: recommended index size [ In reply to ]
Hi Vincent,

My 2 cents:

We had a production environment with an index of ~250 GB and ~1M docs with static + dynamic fields in Solr (AFAIR Lucene 7), on a machine with 4 GB for the JVM and (AFAIR) a bit more, maybe 6 GB, left for the OS page cache.
At peak times (re-index) we had 10-15k updates/minute and (partially) complex queries at up to 50/sec per JVM. At that time our servers still had rotating disks.

In this setup we did not experience any performance issues, as long as we did not have bugs or misconfigurations.

We were thinking of sharding / splitting indexes, but did not do it due to the complexity of maintaining them later - and, especially, because there was NO NEED at all.

Elasticsearch/Solr started to do this out of the box at that time. Maybe Kibana/ELK or similar is worth looking at too.

Cheers from Berlin, Ralf

Sent from my phone; I cannot rule out the occasional typo
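[Editor's note: a minimal sketch of the size-based rollover Vincent describes, spreading writes across indexes once a size target is passed. The 100 GB target and all names here are illustrative assumptions, not code from the thread.]

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: route batches of log writes into numbered indexes, starting a
// fresh index once the current one exceeds a size target.
class IndexRollover {
    static final long TARGET_BYTES = 100L * 1024 * 1024 * 1024; // ~100 GB target (assumed)

    // Byte counts per index, in creation order; index 0 exists from the start.
    final List<Long> indexSizes = new ArrayList<>(List.of(0L));

    // Record a write of `bytes` and return the index number it landed in,
    // rolling over to a new index when the target would be exceeded.
    int write(long bytes) {
        int current = indexSizes.size() - 1;
        if (indexSizes.get(current) > 0 && indexSizes.get(current) + bytes > TARGET_BYTES) {
            indexSizes.add(0L);
            current++;
        }
        indexSizes.set(current, indexSizes.get(current) + bytes);
        return current;
    }
}
```

In a real deployment each rolled-over index would live on its own server or directory; the point of the sketch is only the rollover decision, which keeps any single index inside the 50-100 GB band the thread discusses.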

> On 04.01.2024, at 17:32, Michael McCandless <lucene@mikemccandless.com> wrote:
>
> Hi Vincent,
>
> Lucene has a hard limit of ~2.1 B documents in a single index; hopefully
> you hit the ~50 - 100 GB limit well before that.
>
> Otherwise it's very application dependent: how much latency can you
> tolerate during searching, how fast are the underlying IO devices at random
> and large sequential IO, the types of queries, etc.
>
> Lucene should not require much additional RAM as the index gets larger --
> much work has been done in recent years to move data structures off-heap.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
>> On Tue, Jan 2, 2024 at 9:49 AM <vvsevel@gmail.com> wrote:
>>
>> Hello,
>>
>> is there a recommended / rule of thumb maximum size for index?
>> I try to target between 50 and 100 Gb, before spreading to other servers.
>> or is this just a matter of how much memory and cpu I have?
>> this is a log aggregation use case. a lot of write, smaller number of reads
>> obviously.
>> I am using lucene 9.
>> thanks,
>> Vincent
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org