Mailing List Archive

Sharing buffer between large number of IndexWriters?
Hi,
I want to create a separate index per tenant in my application. This is due
to both strong data separation requirements and query performance
(active tenants with large indices affect others). The number of active
IndexWriters would run into a few thousand. One concern that arises
is the RAM buffers needed by the IndexWriters, as even a few MB of buffer per
writer adds up to many GB of RAM.

Is there any way to give all IndexWriters one cumulative limit of RAM so
that they can share it proportionally to their traffic?

Thank you,
Marcin
Re: Sharing buffer between large number of IndexWriters?
Hello Marcin,

Alas, Lucene does not have this capability out of the box.

However, you are able to live-update the
IndexWriterConfig.setRAMBufferSizeMB, and the change should take effect on
the next document indexed in that IndexWriter instance. So you could build
your own "proportional RAM" on top of that.
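
One way the "proportional RAM" idea could be sketched is shown below. This is only an illustration, not something Lucene provides: the class RamBudget, the method allocate, and the per-tenant floor are all hypothetical names and assumptions. It splits one total budget so that every tenant gets a small fixed minimum and the remainder is divided in proportion to recent traffic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Lucene): splits one RAM budget across tenants. */
final class RamBudget {
    private RamBudget() {}

    /**
     * @param traffic recent traffic per tenant, e.g. documents indexed in the
     *                last measurement window (assumed non-negative)
     * @param totalMb total RAM budget in MB shared by all writers
     * @param floorMb minimum buffer in MB that every tenant receives
     * @return buffer size in MB per tenant; the values sum to totalMb
     */
    static Map<String, Double> allocate(Map<String, Long> traffic,
                                        double totalMb, double floorMb) {
        int n = traffic.size();
        Map<String, Double> result = new LinkedHashMap<>();
        if (n == 0) {
            return result;
        }
        if (floorMb * n > totalMb) {
            throw new IllegalArgumentException("floors alone exceed the total budget");
        }
        double spare = totalMb - floorMb * n;
        long sum = 0;
        for (long t : traffic.values()) {
            sum += t;
        }
        for (Map.Entry<String, Long> e : traffic.entrySet()) {
            // With no traffic at all, fall back to an even split of the spare RAM.
            double share = (sum == 0) ? 1.0 / n : (double) e.getValue() / sum;
            result.put(e.getKey(), floorMb + spare * share);
        }
        return result;
    }
}
```

For example, with traffic of 300 and 100 for two tenants, a 100 MB budget and a 10 MB floor, the shares come out as 70 MB and 30 MB. Each value would then be applied to the matching writer with writer.getConfig().setRAMBufferSizeMB(mb). As noted in the message, RAM that IndexWriter uses outside this buffer is not covered by such a budget.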

But I would worry about the little not-accounted-for RAM that IndexWriter
uses ... summed across a few thousand instances that might start to matter.

When there are no merges running, IndexWriter should be quick to close and
re-open; maybe you want to do that more aggressively.
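
A rough sketch of closing and re-opening writers more aggressively is a bounded pool that keeps only the most recently used writers open. Everything here is an assumption for illustration: the class WriterPool, the maxOpen limit, and the opener function are hypothetical, and the pool is written against java.io.Closeable (which IndexWriter implements) so the sketch does not depend on Lucene.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU pool: keeps at most maxOpen writers open, closing the idlest. */
final class WriterPool<W extends Closeable> {
    private final int maxOpen;
    private final Function<String, W> opener;
    // accessOrder=true: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, W> open = new LinkedHashMap<>(16, 0.75f, true);

    WriterPool(int maxOpen, Function<String, W> opener) {
        if (maxOpen < 1) {
            throw new IllegalArgumentException("maxOpen must be at least 1");
        }
        this.maxOpen = maxOpen;
        this.opener = opener;
    }

    /** Returns the open writer for a tenant, opening it (and evicting) if needed. */
    synchronized W get(String tenant) {
        W writer = open.get(tenant); // also marks the entry as recently used
        if (writer != null) {
            return writer;
        }
        while (open.size() >= maxOpen) {
            Iterator<Map.Entry<String, W>> it = open.entrySet().iterator();
            W eldest = it.next().getValue();
            it.remove();
            try {
                eldest.close(); // for an IndexWriter, close() commits pending changes by default
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        writer = opener.apply(tenant);
        open.put(tenant, writer);
        return writer;
    }

    synchronized int openCount() {
        return open.size();
    }
}
```

This sketch does not guard against evicting a writer that another thread is still indexing into; a real version would need reference counting or per-tenant locking around use, and the opener would have to wrap the IOException thrown when constructing an IndexWriter.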

Mike McCandless

http://blog.mikemccandless.com


On Tue, Jun 16, 2020 at 9:25 AM Marcin Okraszewski <okrasz@gmail.com> wrote:

> Is there any way to give all IndexWriters one cumulative limit of RAM so
> that they can share it proportionally to their traffic?
Re: Sharing buffer between large number of IndexWriters?
Hi Marcin,

I’m working on a somewhat similar problem in Solr, also with the goal of better handling multi-tenant Solr clusters (SOLR-13579). It’s probably not directly applicable to your scenario, but one costly lesson that I learned (obvious in hindsight ;) ) is that when things happen over time rather than instantaneously, a number of “interesting” dynamic aspects come into play…

You may think that “proportional allotment relative to traffic” sounds like a simple formula, but I can assure you it isn’t, as soon as you start considering how things actually happen over time. How do you measure the traffic rate (over a time window? exponentially decaying? with what ratio? sampled how often?) Then there are the delays: between adjusting the controlled parameter (RAM size) and the resulting change in the values you monitor, and again until that change is reflected in your metrics. This is a classical control theory problem of tuning a feedback loop, and you can use a PID controller to manage it, but even then it’s far from simple; many thick volumes have been written on PID tuning…
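
To make the "exponentially decaying" option concrete, here is a minimal sketch of one possible traffic measure: a counter whose value halves every fixed half-life. The class DecayingCounter and the half-life parameter are hypothetical choices, not taken from SOLR-13579 or Lucene, and the caller passes timestamps in explicitly (for instance from System.nanoTime()). The sketch covers only the measurement side; it does nothing about the feedback delays described above.

```java
/** Hypothetical traffic measure: an event count that decays exponentially over time. */
final class DecayingCounter {
    private final double halfLifeSeconds;
    private double value;    // decayed event count as of lastNanos
    private long lastNanos;  // timestamp of the last decay step

    DecayingCounter(double halfLifeSeconds, long nowNanos) {
        if (halfLifeSeconds <= 0) {
            throw new IllegalArgumentException("half-life must be positive");
        }
        this.halfLifeSeconds = halfLifeSeconds;
        this.lastNanos = nowNanos;
    }

    /** Adds events (e.g. documents indexed) observed at the given time. */
    synchronized void record(long events, long nowNanos) {
        decayTo(nowNanos);
        value += events;
    }

    /** Returns the decayed count as seen at the given time. */
    synchronized double get(long nowNanos) {
        decayTo(nowNanos);
        return value;
    }

    private void decayTo(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastNanos) / 1e9;
        if (elapsedSeconds > 0) {
            // After one half-life the old value counts half as much.
            value *= Math.pow(0.5, elapsedSeconds / halfLifeSeconds);
            lastNanos = nowNanos;
        }
    }
}
```

With a 10 second half-life, 100 events recorded at time zero count as 50 after 10 seconds. A short half-life reacts quickly but makes allocations jumpy; a long one is stable but slow to follow a tenant that suddenly becomes active, which is exactly the tuning trade-off mentioned above.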



Andrzej Białecki

> On 22 Jun 2020, at 16:27, Michael McCandless <lucene@mikemccandless.com> wrote:
>
> However, you are able to live-update the
> IndexWriterConfig.setRAMBufferSizeMB, and the change should take effect on
> the next document indexed in that IndexWriter instance. So you could build
> your own "proportional RAM" on top of that.