Mailing List Archive

setRAMBufferSizeMB vs. setMaxBufferedDocs
Just thinking out loud - would the API be friendlier/simpler if the
maxBufDocs/Dels limits could coexist with the maxBytes limit?

They seem to serve two different purposes:
- setRAMBufferSizeMB controls memory: "how much memory can be used".
- setMaxBufferedDeleteTerms and setMaxBufferedDocs control freshness: "how
many buffered index modifications are too many before search results go stale".

If a third setting, setMaxStaleDelay(millis), were added, it would of course
fall into the second group: "freshness".

On one hand this can always be done by the application.

On the other, the logic of "use memory-limit unless added-docs-limit was
specified" seems somewhat confusing (why only by pending adds, why not also
by pending deletions?). In addition, I saw the old use of maxBufferedDocs
for controlling memory as a compromise. Now that setRAMBufferSizeMB is
available, applications can benefit from both.

(Perhaps the reason is that this would somehow break a merge behavior in a
way that I do not understand?)
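
A time-based limit handled in the application, as mentioned above, might look roughly like the sketch below. It is only an illustration: StaleDelayTracker and its methods are hypothetical names, not part of Lucene, and a real application would flush its IndexWriter whenever shouldFlush returns true.

```java
// Hypothetical application-side helper; not part of Lucene.
final class StaleDelayTracker {
    private final long maxStaleDelayMillis;
    private long oldestPendingMillis = -1; // -1 means nothing is pending

    StaleDelayTracker(long maxStaleDelayMillis) {
        this.maxStaleDelayMillis = maxStaleDelayMillis;
    }

    /** Record an add or delete that has not been flushed yet. */
    void onModification(long nowMillis) {
        if (oldestPendingMillis < 0) {
            oldestPendingMillis = nowMillis;
        }
    }

    /** True when the oldest pending change has waited at least the max delay. */
    boolean shouldFlush(long nowMillis) {
        return oldestPendingMillis >= 0
            && nowMillis - oldestPendingMillis >= maxStaleDelayMillis;
    }

    /** Call after the application has flushed the writer. */
    void onFlush() {
        oldestPendingMillis = -1;
    }

    public static void main(String[] args) {
        StaleDelayTracker tracker = new StaleDelayTracker(1000);
        tracker.onModification(0);
        System.out.println(tracker.shouldFlush(500));  // false
        System.out.println(tracker.shouldFlush(1000)); // true
    }
}
```

Timestamps are passed in rather than read from the system clock so that the policy stays easy to test.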


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: setRAMBufferSizeMB vs. setMaxBufferedDocs
Hi Doron,

> On the other, the logic of "use memory-limit unless added-docs-limit was
> specified" seems somewhat confusing

The design intention is to use either
maxBufferedDocs/maxBufferedDeleteTerms or ramBufferSize, but not both
at the same time.

> (why only by pending adds, why not also by pending deletions?).

You are right. MaxBufferedDocs and maxBufferedDeleteTerms should be
used in a similar way, but they are not - maxBufferedDocs cannot be
used with ramBufferSize while maxBufferedDeleteTerms can. I'll open an
issue.

> In addition, I saw the old use of maxBufferedDocs
> for controlling memory as a compromise. Now that setRAMBufferSizeMB is
> available, applications can benefit from both.

Do you mean to make maxBufferedDocs/maxBufferedDeleteTerms and
ramBufferSize take effect at the same time? That is, a flush is
triggered whenever one of the limits is reached? I think it's
reasonable. If we make that change, we should also change a few
default values to provide good out-of-box performance.

Ning

Re: setRAMBufferSizeMB vs. setMaxBufferedDocs
Hi Ning,

"Ning Li" <ning.li.li@gmail.com> wrote on 24/09/2007 00:26:36:

> Do you mean to make maxBufferedDocs/maxBufferedDeleteTerms and
> ramBufferSize take effect at the same time? That is, a flush is
> triggered whenever one of the limits is reached? I think it's
> reasonable. If we make that change, we should also change a few
> default values to provide good out-of-box performance.

Yes that's what I meant - this way all these limits apply, but for
different purposes: maxBufferedDocs/maxBufferedDeleteTerms for availability
(of recent changes) for search considerations, and ramBufferSize for memory
limits considerations.
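
The "flush whenever one of the limits is reached" behavior discussed here can be sketched as a simple OR over the three limits. This is a toy model under stated assumptions, not Lucene's implementation: the class CombinedFlushTriggers and the shouldFlush signature are hypothetical, and the numbers in main are arbitrary.

```java
// Hypothetical model of the proposed behavior; not Lucene's actual code.
final class CombinedFlushTriggers {
    private final int maxBufferedDocs;
    private final int maxBufferedDeleteTerms;
    private final long ramBufferBytes;

    CombinedFlushTriggers(int maxBufferedDocs, int maxBufferedDeleteTerms,
                          double ramBufferSizeMB) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
        this.ramBufferBytes = (long) (ramBufferSizeMB * 1024 * 1024);
    }

    /** Flush as soon as any one of the three limits is reached. */
    boolean shouldFlush(int bufferedDocs, int bufferedDeleteTerms, long ramBytesUsed) {
        return bufferedDocs >= maxBufferedDocs                // freshness: pending adds
            || bufferedDeleteTerms >= maxBufferedDeleteTerms  // freshness: pending deletes
            || ramBytesUsed >= ramBufferBytes;                // memory
    }

    public static void main(String[] args) {
        CombinedFlushTriggers t = new CombinedFlushTriggers(1000, 500, 16.0);
        // Doc-count limit reached while RAM use is still low.
        System.out.println(t.shouldFlush(1000, 0, 0));
        // Only 1 MB used and few pending changes: no limit reached.
        System.out.println(t.shouldFlush(10, 10, 1L << 20));
    }
}
```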


Re: setRAMBufferSizeMB vs. setMaxBufferedDocs
Hi Mike,

"Michael McCandless" <lucene@mikemccandless.com> wrote on 24/09/2007
20:19:31:

> But, we have to be careful for the ParallelReader use case: for this
> case you only want to trigger flushing by doc count (ie, never by
> RAM), right?

I think you mean the case that applications using the ParallelReader need
full control over flushing, so they can do all the tricks with docids,
right?

> It seems like sometimes we want "A or B, whichever comes first" but other
> times either just "A" or just "B".

Is it perhaps more drastic - do some applications need a way to disable some
(or all) auto flush mechanisms? Could a DISABLE_AUTO_FLUSH = -1 be used as
an indication to disable certain flushing? I.e. if passed to setMaxBuffDocs
it would disable auto flushing by number of buffered docs, if passed to
setMaxRamBytes it would disable that type of flushing, etc.?



Re: setRAMBufferSizeMB vs. setMaxBufferedDocs
"Doron Cohen" <DORONC@il.ibm.com> wrote:
> Hi Ning,
>
> "Ning Li" <ning.li.li@gmail.com> wrote on 24/09/2007 00:26:36:
>
> > Do you mean to make maxBufferedDocs/maxBufferedDeleteTerms and
> > ramBufferSize take effect at the same time? That is, a flush is
> > triggered whenever one of the limits is reached? I think it's
> > reasonable. If we make that change, we should also change a few
> > default values to provide good out-of-box performance.
>
> Yes that's what I meant - this way all these limits apply, but for
> different purposes: maxBufferedDocs/maxBufferedDeleteTerms for availability
> (of recent changes) for search considerations, and ramBufferSize for memory
> limits considerations.

But, we have to be careful for the ParallelReader use case: for this
case you only want to trigger flushing by doc count (ie, never by
RAM), right?

It seems like sometimes we want "A or B, whichever comes first" but other
times either just "A" or just "B".

Mike

Re: setRAMBufferSizeMB vs. setMaxBufferedDocs
Hi Mike,

I think we can simulate the support of just "A" or just "B" if we
support "A or B, whichever comes first": someone who just wants the
ramBufferSize limit can set maxBufferedDocs/maxBufferedDeleteTerms
to max int; for ParallelReader, ramBufferSize can be set to max int
MB.
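
A rough sketch of this simulation follows, assuming "whichever comes first" semantics. The class WhicheverFirst and its parameters are hypothetical. One detail worth noting: "max int MB" does not fit in an int once converted to bytes, so the sketch keeps the byte limit in a long.

```java
// Hypothetical helper illustrating "A or B, whichever comes first".
final class WhicheverFirst {
    static boolean shouldFlush(int bufferedDocs, long ramBytesUsed,
                               int maxBufferedDocs, double ramBufferSizeMB) {
        // Integer.MAX_VALUE MB is about 2.25e15 bytes: too big for int, fine for long.
        long ramLimitBytes = (long) (ramBufferSizeMB * 1024.0 * 1024.0);
        return bufferedDocs >= maxBufferedDocs || ramBytesUsed >= ramLimitBytes;
    }

    public static void main(String[] args) {
        // "Just RAM": doc limit set to max int, so in practice only RAM triggers.
        System.out.println(shouldFlush(5_000_000, 32L << 20, Integer.MAX_VALUE, 16.0));
        // "Just docs" (the ParallelReader case): RAM limit set to max int MB.
        System.out.println(shouldFlush(999, 64L << 30, 1000, Integer.MAX_VALUE));
    }
}
```

The limits are not truly disabled here, only pushed far beyond any realistic value, which is the trade-off against a dedicated sentinel.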

Ning


On 9/24/07, Michael McCandless <lucene@mikemccandless.com> wrote:
>
> "Doron Cohen" <DORONC@il.ibm.com> wrote:
> > Hi Ning,
> >
> > "Ning Li" <ning.li.li@gmail.com> wrote on 24/09/2007 00:26:36:
> >
> > > Do you mean to make maxBufferedDocs/maxBufferedDeleteTerms and
> > > ramBufferSize take effect at the same time? That is, a flush is
> > > triggered whenever one of the limits is reached? I think it's
> > > reasonable. If we make that change, we should also change a few
> > > default values to provide good out-of-box performance.
> >
> > Yes that's what I meant - this way all these limits apply, but for
> > different purposes: maxBufferedDocs/maxBufferedDeleteTerms for availability
> > (of recent changes) for search considerations, and ramBufferSize for memory
> > limits considerations.
>
> But, we have to be careful for the ParallelReader use case: for this
> case you only want to trigger flushing by doc count (ie, never by
> RAM), right?
>
> It seems like sometimes we want "A or B, whichever comes first" but other
> times either just "A" or just "B".
>
> Mike

Re: setRAMBufferSizeMB vs. setMaxBufferedDocs
"Doron Cohen" <DORONC@il.ibm.com> wrote:
> Hi Mike,
>
> "Michael McCandless" <lucene@mikemccandless.com> wrote on 24/09/2007
> 20:19:31:
>
> > But, we have to be careful for the ParallelReader use case: for this
> > case you only want to trigger flushing by doc count (ie, never by
> > RAM), right?
>
> I think you mean the case that applications using the ParallelReader
> need full control over flushing, so they can do all the tricks with
> docids, right?

Right. It's actually merging that you need control over (since it's
only merging that shifts the docIDs), but one simple way to control
merging is to always flush at the same time.
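
One way to picture why flushing by doc count is the simple route for parallel indexes: in the toy simulation below, two indexes receive the same documents but with different field sizes. Flushing by doc count gives both the same segment layout, while flushing by RAM does not. This is not Lucene code; ParallelFlushDemo, the fixed bytes-per-doc figures, and the limits are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation; segment sizes are counted in documents.
final class ParallelFlushDemo {
    /** Segment sizes when a flush happens every maxBufferedDocs documents. */
    static List<Integer> flushByDocCount(int totalDocs, int maxBufferedDocs) {
        List<Integer> segments = new ArrayList<>();
        int buffered = 0;
        for (int i = 0; i < totalDocs; i++) {
            buffered++;
            if (buffered >= maxBufferedDocs) {
                segments.add(buffered);
                buffered = 0;
            }
        }
        if (buffered > 0) segments.add(buffered); // final flush on close
        return segments;
    }

    /** Segment sizes when a flush happens once buffered bytes reach the limit. */
    static List<Integer> flushByRam(int totalDocs, long bytesPerDoc, long ramLimitBytes) {
        List<Integer> segments = new ArrayList<>();
        int buffered = 0;
        long ram = 0;
        for (int i = 0; i < totalDocs; i++) {
            buffered++;
            ram += bytesPerDoc;
            if (ram >= ramLimitBytes) {
                segments.add(buffered);
                buffered = 0;
                ram = 0;
            }
        }
        if (buffered > 0) segments.add(buffered); // final flush on close
        return segments;
    }

    public static void main(String[] args) {
        // By doc count: layout does not depend on document size.
        System.out.println(flushByDocCount(10, 4));     // [4, 4, 2]
        // By RAM: small-field index vs. large-field index diverge.
        System.out.println(flushByRam(10, 100, 300));   // [3, 3, 3, 1]
        System.out.println(flushByRam(10, 200, 300));   // [2, 2, 2, 2, 2]
    }
}
```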

> > It seems like sometimes we want "A or B, whichever comes first"
> > but other times either just "A" or just "B".
>
> Is it perhaps more drastic - do some applications need a way to
> disable some (or all) auto flush mechanisms? Could a
> DISABLE_AUTO_FLUSH = -1 be used as an indication to disable certain
> flushing? I.e. if passed to setMaxBuffDocs it would disable auto
> flushing by number of buffered docs, if passed to setMaxRamBytes
> it would disable that type of flushing, etc.?

I like this approach -- assigning a static constant so people can
separately turn off the different sources of flush triggering. Then
whether we use MAX_INT (Ning's suggestion) or -1, it's got a name
(DISABLE_AUTO_FLUSH). Maybe -1 is better since that's cleaner for
double & int types.
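
A minimal sketch of the named-constant idea, assuming -1 as the sentinel for both the int and the double setting. FlushSettings is a hypothetical stand-in, not IndexWriter; the setter names simply mirror the ones in this thread, and the 16 MB default is arbitrary.

```java
// Hypothetical settings holder; each flush trigger can be turned off separately.
final class FlushSettings {
    /** Named sentinel; -1 is representable exactly as both int and double. */
    static final int DISABLE_AUTO_FLUSH = -1;

    private int maxBufferedDocs = DISABLE_AUTO_FLUSH;
    private double ramBufferSizeMB = 16.0; // arbitrary illustrative default

    void setMaxBufferedDocs(int maxBufferedDocs) {
        if (maxBufferedDocs != DISABLE_AUTO_FLUSH && maxBufferedDocs < 1) {
            throw new IllegalArgumentException("maxBufferedDocs must be >= 1 or DISABLE_AUTO_FLUSH");
        }
        this.maxBufferedDocs = maxBufferedDocs;
    }

    void setRAMBufferSizeMB(double mb) {
        if (mb != DISABLE_AUTO_FLUSH && mb <= 0.0) {
            throw new IllegalArgumentException("ramBufferSizeMB must be > 0 or DISABLE_AUTO_FLUSH");
        }
        this.ramBufferSizeMB = mb;
    }

    /** A disabled trigger never fires; enabled ones use "whichever comes first". */
    boolean shouldFlush(int bufferedDocs, long ramBytesUsed) {
        boolean byDocs = maxBufferedDocs != DISABLE_AUTO_FLUSH
            && bufferedDocs >= maxBufferedDocs;
        boolean byRam = ramBufferSizeMB != DISABLE_AUTO_FLUSH
            && ramBytesUsed >= (long) (ramBufferSizeMB * 1024 * 1024);
        return byDocs || byRam;
    }

    public static void main(String[] args) {
        FlushSettings s = new FlushSettings();
        s.setMaxBufferedDocs(100);
        s.setRAMBufferSizeMB(DISABLE_AUTO_FLUSH); // flush by doc count only
        System.out.println(s.shouldFlush(99, Long.MAX_VALUE)); // false
        System.out.println(s.shouldFlush(100, 0));             // true
    }
}
```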

On flushing pending deletes by RAM usage: should we just bundle this
up under "flush by RAM usage"? Ie "when total RAM usage, either from
buffered deletes, buffered docs, anything else, exceeds X then it's
time to flush"? (Instead of having a separate "max RAM usage for
buffered deletes" trigger).

Mike

Re: setRAMBufferSizeMB vs. setMaxBufferedDocs
On 9/24/07, Michael McCandless <lucene@mikemccandless.com> wrote:
> On flushing pending deletes by RAM usage: should we just bundle this
> up under "flush by RAM usage"? Ie "when total RAM usage, either from
> buffered deletes, buffered docs, anything else, exceeds X then it's
> time to flush"? (Instead of having a separate "max RAM usage for
> buffered deletes" trigger).

Agree. We don't need a separate ram usage for deletes.

Ning

Re: setRAMBufferSizeMB vs. setMaxBufferedDocs
Ning Li wrote:

> On 9/24/07, Michael McCandless <lucene@mikemccandless.com> wrote:
> > On flushing pending deletes by RAM usage: should we just bundle this
> > up under "flush by RAM usage"? Ie "when total RAM usage, either from
> > buffered deletes, buffered docs, anything else, exceeds X then it's
> > time to flush"? (Instead of having a separate "max RAM usage for
> > buffered deletes" trigger).
>
> Agree. We don't need a separate ram usage for deletes.

Yes, that's what I had in mind too - single limit on entire ram usage.
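
The single limit agreed on above could be modeled as one counter that sums RAM from all sources. The sketch below is only an illustration: RamAccounting and its methods are hypothetical, and the byte sizes passed in are assumed to come from whatever accounting the writer already does.

```java
// Hypothetical accounting model: one RAM limit covers docs and deletes together.
final class RamAccounting {
    private final long ramLimitBytes;
    private long bufferedDocBytes;
    private long bufferedDeleteBytes;

    RamAccounting(long ramLimitBytes) {
        this.ramLimitBytes = ramLimitBytes;
    }

    void addDocument(long bytes) {
        bufferedDocBytes += bytes;
    }

    void addDeleteTerm(long bytes) {
        bufferedDeleteBytes += bytes;
    }

    long totalBytes() {
        return bufferedDocBytes + bufferedDeleteBytes;
    }

    /** No separate delete limit: only the combined total is compared. */
    boolean shouldFlush() {
        return totalBytes() >= ramLimitBytes;
    }

    /** Reset both counters once the buffered state has been written out. */
    void flush() {
        bufferedDocBytes = 0;
        bufferedDeleteBytes = 0;
    }

    public static void main(String[] args) {
        RamAccounting ram = new RamAccounting(1000);
        ram.addDocument(600);
        System.out.println(ram.shouldFlush()); // false
        ram.addDeleteTerm(400);
        System.out.println(ram.shouldFlush()); // true
    }
}
```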

