I ran another test. This time I increased the RAM buffer size to 8GB
and the heap to 16GB. However, I still see two segments in the index
that was created, and looking at the infostream I see:
dir=MMapDirectory@/local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index lockFactory=org.apache.lucene.store.NativeFSLockFactory@4466af20
index=
version=9.4.0
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
ramBufferSizeMB=8000.0
maxBufferedDocs=-1
...
perThreadHardLimitMB=1945
...
DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush postings as segment _6 numDocs=555373
IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write norms
IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write docValues
IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write points
IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to write vectors
IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to finish stored fields
IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write postings and finish vectors
IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write fieldInfos
DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment has 0 deleted docs
DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment has 0 soft-deleted docs
DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment has no vectors; no norms; no docValues; no prox; freqs
DWPT 0 [2022-09-13T02:42:56.435952356Z; main]: flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm, _6.fdt, _6_Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx, _6_Lucene94HnswVectorsFormat_0.vex]
DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed codec=Lucene94
DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed: segment=_6 ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB docs/MB=521.134
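(Note that ramUsed=1,945.002 MB matches perThreadHardLimitMB=1945 exactly, and 555,373 docs / 1,065.701 MB ≈ 521.1, consistent with the reported docs/MB.)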
So I think it's this perThreadHardLimit that is triggering the
flushes? TBH this isn't something I had run into before, but the docs
say:
/**
 * Expert: Sets the maximum memory consumption per thread triggering a forced flush if exceeded. A
 * {@link DocumentsWriterPerThread} is forcefully flushed once it exceeds this limit even if the
 * {@link #getRAMBufferSizeMB()} has not been exceeded. This is a safety limit to prevent a {@link
 * DocumentsWriterPerThread} from address space exhaustion due to its internal 32 bit signed
 * integer based memory addressing. The given value must be less than 2GB (2048MB).
 *
 * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
 */
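If I'm reading this right, each DWPT forces a flush at ~1945MB no matter how large the RAM buffer is, so a single-threaded run that buffers more than that will always produce at least two segments. For anyone who wants to poke at this, here is a minimal sketch (not my actual benchmark code; the class name, path, and values are illustrative) of where the two knobs live on IndexWriterConfig, plus a quick way to count the resulting segments:

import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class FlushLimitCheck {
  public static void main(String[] args) throws Exception {
    IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
    iwc.setRAMBufferSizeMB(8000.0);        // global flush trigger: 8GB
    iwc.setRAMPerThreadHardLimitMB(2047);  // per-DWPT cap; must be < 2048
    try (Directory dir = FSDirectory.open(Paths.get("/tmp/knn-index"));
        IndexWriter writer = new IndexWriter(dir, iwc)) {
      // ... index documents with KNN vector fields here ...
      writer.commit();
    }
    try (Directory dir = FSDirectory.open(Paths.get("/tmp/knn-index"));
        DirectoryReader reader = DirectoryReader.open(dir)) {
      // one leaf per segment: more than 1 here means the hard limit
      // (or the RAM buffer) forced intermediate flushes before commit
      System.out.println("segments: " + reader.leaves().size());
    }
  }
}

Raising the hard limit toward its 2047MB ceiling should help some, but since it can never reach 2048MB, I think the only way to guarantee a single segment for a big index is to forceMerge(1) at the end.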
On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <msokolov@gmail.com> wrote:
>
> Hi Mayya, thanks for persisting - I think we need to wrestle this to
> the ground for sure. In the test I ran, the RAM buffer was the default
> checked in, which is, weirdly, 1994MB. I did not specifically set the
> heap size. I used maxConn/M=200. I'll try with a larger buffer to see
> if I can get 9.4 to produce a single segment for the same test
> settings. I see you used a much smaller M (16), which should have
> produced quite small graphs, and I agree, should have been a single
> segment. Were you able to verify the number of segments?
>
> Agree that decrease in recall is not expected when more segments are produced.
>
> On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
> <mayya.sharipova@elastic.co.invalid> wrote:
> >
> > Hello Michael,
> > Thanks for checking.
> > Sorry for bringing this up again.
> > First of all, I am ok with proceeding with the Lucene 9.4 release and leaving the performance investigations for later.
> >
> > I am interested in the maxConn/M value you used for your tests. What were the heap size and the RAM buffer size for indexing?
> > Usually, when we have multiple segments, recall should increase, not decrease. But I agree that with multiple segments we can see a big drop in QPS.
> >
> > Here is my investigation with detailed output of the performance difference between the 9.3 and 9.4 releases. In my tests I used a large indexing buffer (2GB) and a large heap (5GB) to end up with a single segment for both the 9.3 and 9.4 tests, but I still see a big drop in QPS in 9.4.
> >
> > Thank you.
> >
> > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <romseygeek@gmail.com> wrote:
> >>
> >> Done. Thanks!
> >>
> >> > On 9 Sep 2022, at 16:32, Michael Sokolov <msokolov@gmail.com> wrote:
> >> >
> >> > Hi Alan - I checked out the interval queries patch; seems pretty safe,
> >> > please go ahead and port to 9.4. Thanks!
> >> >
> >> > Mike
> >> >
> >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <romseygeek@gmail.com> wrote:
> >> >>
> >> >> Hi Mike,
> >> >>
> >> >> I’ve opened https://github.com/apache/lucene/pull/11760 as a small bug fix PR for a problem with interval queries. Am I OK to port this to the 9.4 branch?
> >> >>
> >> >> Thanks, Alan
> >> >>
> >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <msokolov@gmail.com> wrote:
> >> >>
> >> >> NOTICE:
> >> >>
> >> >> Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch.
> >> >>
> >> >> Please observe the normal rules:
> >> >>
> >> >> * No new features may be committed to the branch.
> >> >> * Documentation patches, build patches and serious bug fixes may be
> >> >> committed to the branch. However, you should submit all patches you
> >> >> want to commit to Jira first to give others the chance to review
> >> >> and possibly vote against the patch. Keep in mind that it is our
> >> >> main intention to keep the branch as stable as possible.
> >> >> * All patches that are intended for the branch should first be committed
> >> >> to the unstable branch, merged into the stable branch, and then into
> >> >> the current release branch.
> >> >> * Normal unstable and stable branch development may continue as usual.
> >> >> However, if you plan to commit a big change to the unstable branch
> >> >> while the branch feature freeze is in effect, think twice: can't the
> >> >> addition wait a couple more days? Merges of bug fixes into the branch
> >> >> may become more difficult.
> >> >> * Only Jira issues with Fix version 9.4 and priority "Blocker" will delay
> >> >> a release candidate build.
> >> >>
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org