Mailing List Archive

Re: Subject: New branch and feature freeze for Lucene 9.4.0
Using the ann-benchmarks framework, I still saw a regression between 9.3 and
9.4 similar to the one Mayya reported. I investigated and found it was due to
"KnnGraphTester to use KnnVectorQuery" (
https://github.com/apache/lucene/pull/796), specifically the change to the
warm-up strategy. If I revert it, the results look exactly as expected.

I guess we can keep an eye on the nightly benchmarks tomorrow to
double-check there's no drop. It would also be nice to formalize the
ann-benchmarks set-up and run it regularly (like we've discussed in
https://github.com/apache/lucene/issues/10665).

Julie

On Mon, Sep 19, 2022 at 10:33 AM Michael Sokolov <msokolov@gmail.com> wrote:

> Thanks for your speedy testing! I am observing comparable latencies *when
> the index geometry (ie number of segments)* is unchanged. Agree we can
> leave this for a later day. I'll proceed to cut 9.4 artifacts
>
> On Mon, Sep 19, 2022 at 11:02 AM Mayya Sharipova
> <mayya.sharipova@elastic.co.invalid> wrote:
>
>> It would be great if you all are able to test again with
>>> https://github.com/apache/lucene/pull/11781/ applied
>>
>>
>>
>> I ran the ann benchmarks with this change, and was happy to confirm that
>> in my test, recall with this PR is the same as on the 9.3 branch. QPS is
>> lower, but we can investigate that later.
>>
>> glove-100-angular M:16 efConstruction:100
>>              9.3 recall  9.3 QPS    this PR recall  this PR QPS
>> n_cands=10 0.620 2745.933 0.620 1675.500
>> n_cands=20 0.680 2288.665 0.680 1512.744
>> n_cands=40 0.746 1770.243 0.746 1040.240
>> n_cands=80 0.809 1226.738 0.809 695.236
>> n_cands=120 0.843 948.908 0.843 525.914
>> n_cands=200 0.878 671.781 0.878 351.529
>> n_cands=400 0.918 392.265 0.918 207.854
>> n_cands=600 0.937 282.403 0.937 144.311
>> n_cands=800 0.949 214.620 0.949 116.875
>>
>> On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov <msokolov@gmail.com>
>> wrote:
>>
>>> OK, I think I was wrong about latency having increased due to a change
>>> in KnnGraphTester -- I did some testing there and couldn't reproduce.
>>> There does seem to be a slight vector search latency increase,
>>> possibly noise, but maybe due to the branching introduced to check
>>> whether to do byte vs float operations? It would be a little
>>> surprising if that were the case given the small number of branchings
>>> compared to the number of multiplies in dot-product though.
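
(For illustration, a minimal sketch of the branching in question -- hypothetical
code, not Lucene's actual implementation: the encoding check is taken once per
vector pair, while the inner loop does one multiply per dimension, which is why
the branch should be cheap by comparison.)

    class DotProductSketch {
      // branch-free inner loops: one multiply-add per dimension
      static float dot(float[] a, float[] b) {
        float sum = 0;
        for (int i = 0; i < a.length; i++) {
          sum += a[i] * b[i];
        }
        return sum;
      }

      static float dot(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
          sum += a[i] * b[i];
        }
        return sum;
      }

      // the byte-vs-float decision happens once per comparison
      static float score(boolean isByte, float[] fa, float[] fb, byte[] ba, byte[] bb) {
        return isByte ? dot(ba, bb) : dot(fa, fb);
      }
    }
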
>>>
>>> On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msokolov@gmail.com>
>>> wrote:
>>> >
>>> > Thanks for the deep-dive Julie. I was able to reproduce the changing
>>> > recall. I had introduced some bugs in the diversity checks (that may
>>> > have partially canceled each other out? it's hard to understand what
>>> > was happening in the buggy case) and posted a fix today
>>> > https://github.com/apache/lucene/pull/11781.
>>> >
>>> > There are a couple of other outstanding issues I found while doing a
>>> > bunch of git bisecting:
>>> >
>>> > I think we might have introduced a (test-only) performance regression
>>> > in KnnGraphTester
>>> >
>>> > We may still be over-allocating the size of NeighborArray, leading to
>>> > excessive segmentation? I wonder if we could avoid dynamic
>>> > re-allocation there, and simply initialize every neighbor array to
>>> > 2*M+1.
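
(A sketch of the fixed up-front allocation suggested here -- hypothetical and
simplified, not Lucene's actual NeighborArray: on the densest HNSW layer a node
keeps at most 2*M neighbors, plus one slot of headroom for the candidate being
evaluated, so the capacity can be computed once instead of grown dynamically.)

    class FixedNeighborArraySketch {
      final int[] nodes;
      final float[] scores;
      int size;

      FixedNeighborArraySketch(int m) {
        int capacity = 2 * m + 1; // allocated once, never re-allocated
        nodes = new int[capacity];
        scores = new float[capacity];
      }
    }
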
>>> >
>>> > While I don't think these are necessarily blockers, given that we are
>>> > releasing HNSW improvements, it seems like we should address these,
>>> > especially as the build-graph-on-index is one of the things we are
>>> > releasing, and it is (may be?) impacted. I will see if I can put up a
>>> > patch or two.
>>> >
>>> > It would be great if you all are able to test again with
>>> > https://github.com/apache/lucene/pull/11781/ applied
>>> >
>>> > -Mike
>>> >
>>> > On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpountz@gmail.com>
>>> wrote:
>>> > >
>>> > > Thank you Mike, I just backported the change.
>>> > >
>>> > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <msokolov@gmail.com>
>>> wrote:
>>> > >>
>>> > >> it looks like a small bug fix we have had on main (and 9.x?) for a
>>> > >> while now, and no test failures showed up, I guess. Should be OK to
>>> > >> port. I plan to cut artifacts this weekend, or Monday at the latest,
>>> > >> but if you can do the backport today or tomorrow, that's fine by me.
>>> > >>
>>> > >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <jpountz@gmail.com>
>>> wrote:
>>> > >> >
>>> > >> > Mike, I'm tempted to backport
>>> https://github.com/apache/lucene/pull/1068 to branch_9_4, which is a
>>> bugfix that looks pretty safe to me. What do you think?
>>> > >> >
>>> > >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <
>>> mayya.sharipova@elastic.co.invalid> wrote:
>>> > >> >>
>>> > >> >> Thanks for running more tests, Michael.
>>> > >> >> It is encouraging that you saw a similar performance between 9.3
>>> and 9.4. I will also run more tests with different parameters.
>>> > >> >>
>>> > >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <
>>> msokolov@gmail.com> wrote:
>>> > >> >>>
>>> > >> >>> As a follow-up, I ran a test using the same parameters as
>>> above, only
>>> > >> >>> changing M=200 to M=16. This did result in a single segment in
>>> both
>>> > >> >>> cases (9.3, 9.4) and the performance was pretty similar; within
>>> noise
>>> > >> >>> I think. The main difference I saw was that the 9.3 index was
>>> written
>>> > >> >>> using CFS:
>>> > >> >>>
>>> > >> >>> 9.4:
>>> > >> >>> recall  latency  nDoc     fanout  maxConn  beamWidth  visited  index ms
>>> > >> >>> 0.755   1.36     1000000  100     16       100        200      891402    1.00  post-filter
>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vec
>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vem
>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vex
>>> > >> >>>
>>> > >> >>> 9.3:
>>> > >> >>> recall  latency  nDoc     fanout  maxConn  beamWidth  visited  index ms
>>> > >> >>> 0.775   1.34     1000000  100     16       100        4033     977043
>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>>> > >> >>>
>>> > >> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <
>>> msokolov@gmail.com> wrote:
>>> > >> >>> >
>>> > >> >>> > I ran another test. I thought I had increased the RAM buffer
>>> size to
>>> > >> >>> > 8G and heap to 16G. However I still see two segments in the
>>> index that
>>> > >> >>> > was created. And looking at the infostream I see:
>>> > >> >>> >
>>> > >> >>> > dir=MMapDirectory@
>>> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>>> > >> >>> > lockFactory=org\
>>> > >> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20
>>> > >> >>> > index=
>>> > >> >>> > version=9.4.0
>>> > >> >>> > analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>>> > >> >>> > ramBufferSizeMB=8000.0
>>> > >> >>> > maxBufferedDocs=-1
>>> > >> >>> > ...
>>> > >> >>> > perThreadHardLimitMB=1945
>>> > >> >>> > ...
>>> > >> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush postings
>>> as
>>> > >> >>> > segment _6 numDocs=555373
>>> > >> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write
>>> norms
>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write
>>> docValues
>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write
>>> points
>>> > >> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to
>>> write vectors
>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to finish
>>> stored fields
>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write
>>> postings
>>> > >> >>> > and finish vectors
>>> > >> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write
>>> fieldInfos
>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment
>>> has 0 deleted docs
>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment
>>> has 0
>>> > >> >>> > soft-deleted docs
>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment
>>> has no
>>> > >> >>> > vectors; no norms; no docValues; no prox; freqs
>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
>>> > >> >>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm,
>>> _6.fdt, _6_\
>>> > >> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
>>> > >> >>> > _6_Lucene94HnswVectorsFormat_0.vex]
>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed
>>> codec=Lucene94
>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed:
>>> segment=_6
>>> > >> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
>>> > >> >>> > docs/MB=521.134
>>> > >> >>> >
>>> > >> >>> > so I think it's this perThreadHardLimit that is triggering the
>>> > >> >>> > flushes? TBH this isn't something I had seen before; but the
>>> docs say:
>>> > >> >>> >
>>> > >> >>> > /**
>>> > >> >>> >  * Expert: Sets the maximum memory consumption per thread
>>> > >> >>> >  * triggering a forced flush if exceeded. A {@link
>>> > >> >>> >  * DocumentsWriterPerThread} is forcefully flushed once it
>>> > >> >>> >  * exceeds this limit even if the {@link #getRAMBufferSizeMB()}
>>> > >> >>> >  * has not been exceeded. This is a safety limit to prevent a
>>> > >> >>> >  * {@link DocumentsWriterPerThread} from address space
>>> > >> >>> >  * exhaustion due to its internal 32 bit signed integer based
>>> > >> >>> >  * memory addressing. The given value must be less than 2GB
>>> > >> >>> >  * (2048MB)
>>> > >> >>> >  *
>>> > >> >>> >  * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>>> > >> >>> >  */
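
(For reference, a minimal sketch of how the two limits interact, using the
standard IndexWriterConfig API; the single-thread numbers match the test above,
where the effective flush trigger is the smaller of the two values:)

    import org.apache.lucene.index.IndexWriterConfig;

    IndexWriterConfig iwc = new IndexWriterConfig()
        .setRAMBufferSizeMB(8000)            // global flush trigger
        .setRAMPerThreadHardLimitMB(2047);   // per-DWPT hard cap, must be < 2048
    // With a single indexing thread, a flush fires at min(8000, 2047) MB of
    // buffered state, so a large RAM buffer alone cannot prevent a DWPT from
    // flushing at roughly 2 GB (or at the 1945 MB default seen above).
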
>>> > >> >>> >
>>> > >> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <
>>> msokolov@gmail.com> wrote:
>>> > >> >>> > >
>>> > >> >>> > > Hi Mayya, thanks for persisting - I think we need to
>>> wrestle this to
>>> > >> >>> > > the ground for sure. In the test I ran, RAM buffer was the
>>> default
>>> > >> >>> > > checked in, which is weirdly: 1994MB. I did not
>>> specifically set heap
>>> > >> >>> > > size. I used maxConn/M=200. I'll try with larger buffer to
>>> see if I
>>> > >> >>> > > can get 9.4 to produce a single segment for the same test
>>> settings. I
>>> > >> >>> > > see you used a much smaller M (16), which should have
>>> produced quite
>>> > >> >>> > > small graphs, and I agree, should have been a single
>>> segment. Were you
>>> > >> >>> > > able to verify the number of segments?
>>> > >> >>> > >
>>> > >> >>> > > Agree that decrease in recall is not expected when more
>>> segments are produced.
>>> > >> >>> > >
>>> > >> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>>> > >> >>> > > <mayya.sharipova@elastic.co.invalid> wrote:
>>> > >> >>> > > >
>>> > >> >>> > > > Hello Michael,
>>> > >> >>> > > > Thanks for checking.
>>> > >> >>> > > > Sorry for bringing this up again.
>>> > >> >>> > > > First of all, I am ok with proceeding with the Lucene 9.4
>>> release and leaving the performance investigations for later.
>>> > >> >>> > > >
>>> > >> >>> > > > I am interested in what's the maxConn/M value you used
>>> for your tests? What was the heap memory and the size of the RAM buffer for
>>> indexing?
>>> > >> >>> > > > Usually, when we have multiple segments, recall should
>>> increase, not decrease. But I agree that with multiple segments we can see
>>> a big drop in QPS.
>>> > >> >>> > > >
>>> > >> >>> > > > Here is my investigation with detailed output of the
>>> performance difference between 9.3 and 9.4 releases. In my tests I used a
>>> large indexing buffer (2Gb) and large heap (5Gb) to end up with a single
>>> segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>>> > >> >>> > > >
>>> > >> >>> > > > Thank you.
>>> > >> >>> > > >
>>> > >> >>> > > >
>>> > >> >>> > > >
>>> > >> >>> > > >
>>> > >> >>> > > >
>>> > >> >>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <
>>> romseygeek@gmail.com> wrote:
>>> > >> >>> > > >>
>>> > >> >>> > > >> Done. Thanks!
>>> > >> >>> > > >>
>>> > >> >>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <
>>> msokolov@gmail.com> wrote:
>>> > >> >>> > > >> >
>>> > >> >>> > > >> > Hi Alan - I checked out the interval queries patch;
>>> seems pretty safe,
>>> > >> >>> > > >> > please go ahead and port to 9.4. Thanks!
>>> > >> >>> > > >> >
>>> > >> >>> > > >> > Mike
>>> > >> >>> > > >> >
>>> > >> >>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <
>>> romseygeek@gmail.com> wrote:
>>> > >> >>> > > >> >>
>>> > >> >>> > > >> >> Hi Mike,
>>> > >> >>> > > >> >>
>>> > >> >>> > > >> >> I’ve opened
>>> https://github.com/apache/lucene/pull/11760 as a small bug fix PR for a
>>> problem with interval queries. Am I OK to port this to the 9.4 branch?
>>> > >> >>> > > >> >>
>>> > >> >>> > > >> >> Thanks, Alan
>>> > >> >>> > > >> >>
>>> > >> >>> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <
>>> msokolov@gmail.com> wrote:
>>> > >> >>> > > >> >>
>>> > >> >>> > > >> >> NOTICE:
>>> > >> >>> > > >> >>
>>> > >> >>> > > >> >> Branch branch_9_4 has been cut and versions updated
>>> to 9.5 on stable branch.
>>> > >> >>> > > >> >>
>>> > >> >>> > > >> >> Please observe the normal rules:
>>> > >> >>> > > >> >>
>>> > >> >>> > > >> >> * No new features may be committed to the branch.
>>> > >> >>> > > >> >> * Documentation patches, build patches and serious
>>> bug fixes may be
>>> > >> >>> > > >> >> committed to the branch. However, you should submit
>>> all patches you
>>> > >> >>> > > >> >> want to commit to Jira first to give others the
>>> chance to review
>>> > >> >>> > > >> >> and possibly vote against the patch. Keep in mind
>>> that it is our
>>> > >> >>> > > >> >> main intention to keep the branch as stable as
>>> possible.
>>> > >> >>> > > >> >> * All patches that are intended for the branch should
>>> first be committed
>>> > >> >>> > > >> >> to the unstable branch, merged into the stable
>>> branch, and then into
>>> > >> >>> > > >> >> the current release branch.
>>> > >> >>> > > >> >> * Normal unstable and stable branch development may
>>> continue as usual.
>>> > >> >>> > > >> >> However, if you plan to commit a big change to the
>>> unstable branch
>>> > >> >>> > > >> >> while the branch feature freeze is in effect, think
>>> twice: can't the
>>> > >> >>> > > >> >> addition wait a couple more days? Merges of bug fixes
>>> into the branch
>>> > >> >>> > > >> >> may become more difficult.
>>> > >> >>> > > >> >> * Only Jira issues with Fix version 9.4 and priority
>>> "Blocker" will delay
>>> > >> >>> > > >> >> a release candidate build.
>>> > >> >>> > > >> >>
>>> > >> >>> > > >> >>
>>> > >> >>> > > >> >>
>>> > >> >>> > > >> >>
>>> > >> >>> > > >> >
>>> > >> >>> > > >> >
>>> > >> >>> > > >> >
>>> > >> >>> > > >>
>>> > >> >>> > > >>
>>> > >> >>> > > >>
>>> > >> >>> > > >>
>>> > >> >>>
>>> > >> >>>
>>> > >> >>>
>>> > >> >
>>> > >> >
>>> > >> > --
>>> > >> > Adrien
>>> > >>
>>> > >>
>>> > >>
>>> > >
>>> > >
>>> > > --
>>> > > Adrien
>>>
>>>
>>>
Re: Subject: New branch and feature freeze for Lucene 9.4.0
I'm confused, since warming should not be counted in the timings. Are you
saying that the recall was affected??

On Mon, Sep 19, 2022 at 6:12 PM Julie Tibshirani <julietibs@gmail.com>
wrote:

> Using the ann-benchmarks framework, I still saw a regression between 9.3 and
> 9.4 similar to the one Mayya reported. I investigated and found it was due to
> "KnnGraphTester to use KnnVectorQuery" (
> https://github.com/apache/lucene/pull/796), specifically the change to
> the warm-up strategy. If I revert it, the results look exactly as expected.
>
> I guess we can keep an eye on the nightly benchmarks tomorrow to
> double-check there's no drop. It would also be nice to formalize the
> ann-benchmarks set-up and run it regularly (like we've discussed in
> https://github.com/apache/lucene/issues/10665).
>
> Julie
>
> On Mon, Sep 19, 2022 at 10:33 AM Michael Sokolov <msokolov@gmail.com>
> wrote:
>
>> Thanks for your speedy testing! I am observing comparable latencies *when
>> the index geometry (ie number of segments)* is unchanged. Agree we can
>> leave this for a later day. I'll proceed to cut 9.4 artifacts
>>
>> On Mon, Sep 19, 2022 at 11:02 AM Mayya Sharipova
>> <mayya.sharipova@elastic.co.invalid> wrote:
>>
>>> It would be great if you all are able to test again with
>>>> https://github.com/apache/lucene/pull/11781/ applied
>>>
>>>
>>>
>>> I ran the ann benchmarks with this change, and was happy to confirm
>>> that in my test, recall with this PR is the same as on the 9.3 branch.
>>> QPS is lower, but we can investigate that later.
>>>
>>> glove-100-angular M:16 efConstruction:100
>>>              9.3 recall  9.3 QPS    this PR recall  this PR QPS
>>> n_cands=10 0.620 2745.933 0.620 1675.500
>>> n_cands=20 0.680 2288.665 0.680 1512.744
>>> n_cands=40 0.746 1770.243 0.746 1040.240
>>> n_cands=80 0.809 1226.738 0.809 695.236
>>> n_cands=120 0.843 948.908 0.843 525.914
>>> n_cands=200 0.878 671.781 0.878 351.529
>>> n_cands=400 0.918 392.265 0.918 207.854
>>> n_cands=600 0.937 282.403 0.937 144.311
>>> n_cands=800 0.949 214.620 0.949 116.875
>>>
>>> On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov <msokolov@gmail.com>
>>> wrote:
>>>
>>>> OK, I think I was wrong about latency having increased due to a change
>>>> in KnnGraphTester -- I did some testing there and couldn't reproduce.
>>>> There does seem to be a slight vector search latency increase,
>>>> possibly noise, but maybe due to the branching introduced to check
>>>> whether to do byte vs float operations? It would be a little
>>>> surprising if that were the case given the small number of branchings
>>>> compared to the number of multiplies in dot-product though.
>>>>
>>>> On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msokolov@gmail.com>
>>>> wrote:
>>>> >
>>>> > Thanks for the deep-dive Julie. I was able to reproduce the changing
>>>> > recall. I had introduced some bugs in the diversity checks (that may
>>>> > have partially canceled each other out? it's hard to understand what
>>>> > was happening in the buggy case) and posted a fix today
>>>> > https://github.com/apache/lucene/pull/11781.
>>>> >
>>>> > There are a couple of other outstanding issues I found while doing a
>>>> > bunch of git bisecting:
>>>> >
>>>> > I think we might have introduced a (test-only) performance regression
>>>> > in KnnGraphTester
>>>> >
>>>> > We may still be over-allocating the size of NeighborArray, leading to
>>>> > excessive segmentation? I wonder if we could avoid dynamic
>>>> > re-allocation there, and simply initialize every neighbor array to
>>>> > 2*M+1.
>>>> >
>>>> > While I don't think these are necessarily blockers, given that we are
>>>> > releasing HNSW improvements, it seems like we should address these,
>>>> > especially as the build-graph-on-index is one of the things we are
>>>> > releasing, and it is (may be?) impacted. I will see if I can put up a
>>>> > patch or two.
>>>> >
>>>> > It would be great if you all are able to test again with
>>>> > https://github.com/apache/lucene/pull/11781/ applied
>>>> >
>>>> > -Mike
>>>> >
>>>> > On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpountz@gmail.com>
>>>> wrote:
>>>> > >
>>>> > > Thank you Mike, I just backported the change.
>>>> > >
>>>> > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <msokolov@gmail.com>
>>>> wrote:
>>>> > >>
>>>> > >> it looks like a small bug fix we have had on main (and 9.x?) for a
>>>> > >> while now, and no test failures showed up, I guess. Should be OK to
>>>> > >> port. I plan to cut artifacts this weekend, or Monday at the
>>>> latest,
>>>> > >> but if you can do the backport today or tomorrow, that's fine by
>>>> me.
>>>> > >>
>>>> > >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <jpountz@gmail.com>
>>>> wrote:
>>>> > >> >
>>>> > >> > Mike, I'm tempted to backport
>>>> https://github.com/apache/lucene/pull/1068 to branch_9_4, which is a
>>>> bugfix that looks pretty safe to me. What do you think?
>>>> > >> >
>>>> > >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <
>>>> mayya.sharipova@elastic.co.invalid> wrote:
>>>> > >> >>
>>>> > >> >> Thanks for running more tests, Michael.
>>>> > >> >> It is encouraging that you saw a similar performance between
>>>> 9.3 and 9.4. I will also run more tests with different parameters.
>>>> > >> >>
>>>> > >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <
>>>> msokolov@gmail.com> wrote:
>>>> > >> >>>
>>>> > >> >>> As a follow-up, I ran a test using the same parameters as
>>>> above, only
>>>> > >> >>> changing M=200 to M=16. This did result in a single segment in
>>>> both
>>>> > >> >>> cases (9.3, 9.4) and the performance was pretty similar;
>>>> within noise
>>>> > >> >>> I think. The main difference I saw was that the 9.3 index was
>>>> written
>>>> > >> >>> using CFS:
>>>> > >> >>>
>>>> > >> >>> 9.4:
>>>> > >> >>> recall  latency  nDoc     fanout  maxConn  beamWidth  visited  index ms
>>>> > >> >>> 0.755   1.36     1000000  100     16       100        200      891402    1.00  post-filter
>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vec
>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vem
>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vex
>>>> > >> >>>
>>>> > >> >>> 9.3:
>>>> > >> >>> recall  latency  nDoc     fanout  maxConn  beamWidth  visited  index ms
>>>> > >> >>> 0.775   1.34     1000000  100     16       100        4033     977043
>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>>>> > >> >>>
>>>> > >> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <
>>>> msokolov@gmail.com> wrote:
>>>> > >> >>> >
>>>> > >> >>> > I ran another test. I thought I had increased the RAM buffer
>>>> size to
>>>> > >> >>> > 8G and heap to 16G. However I still see two segments in the
>>>> index that
>>>> > >> >>> > was created. And looking at the infostream I see:
>>>> > >> >>> >
>>>> > >> >>> > dir=MMapDirectory@
>>>> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>>>> > >> >>> > lockFactory=org\
>>>> > >> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20
>>>> > >> >>> > index=
>>>> > >> >>> > version=9.4.0
>>>> > >> >>> > analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>>>> > >> >>> > ramBufferSizeMB=8000.0
>>>> > >> >>> > maxBufferedDocs=-1
>>>> > >> >>> > ...
>>>> > >> >>> > perThreadHardLimitMB=1945
>>>> > >> >>> > ...
>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush
>>>> postings as
>>>> > >> >>> > segment _6 numDocs=555373
>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write
>>>> norms
>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write
>>>> docValues
>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write
>>>> points
>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to
>>>> write vectors
>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to
>>>> finish stored fields
>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write
>>>> postings
>>>> > >> >>> > and finish vectors
>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write
>>>> fieldInfos
>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment
>>>> has 0 deleted docs
>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment
>>>> has 0
>>>> > >> >>> > soft-deleted docs
>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment
>>>> has no
>>>> > >> >>> > vectors; no norms; no docValues; no prox; freqs
>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
>>>> > >> >>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm,
>>>> _6.fdt, _6_\
>>>> > >> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
>>>> > >> >>> > _6_Lucene94HnswVectorsFormat_0.vex]
>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed
>>>> codec=Lucene94
>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed:
>>>> segment=_6
>>>> > >> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
>>>> > >> >>> > docs/MB=521.134
>>>> > >> >>> >
>>>> > >> >>> > so I think it's this perThreadHardLimit that is triggering
>>>> the
>>>> > >> >>> > flushes? TBH this isn't something I had seen before; but the
>>>> docs say:
>>>> > >> >>> >
>>>> > >> >>> > /**
>>>> > >> >>> >  * Expert: Sets the maximum memory consumption per thread
>>>> > >> >>> >  * triggering a forced flush if exceeded. A {@link
>>>> > >> >>> >  * DocumentsWriterPerThread} is forcefully flushed once it
>>>> > >> >>> >  * exceeds this limit even if the {@link #getRAMBufferSizeMB()}
>>>> > >> >>> >  * has not been exceeded. This is a safety limit to prevent a
>>>> > >> >>> >  * {@link DocumentsWriterPerThread} from address space
>>>> > >> >>> >  * exhaustion due to its internal 32 bit signed integer based
>>>> > >> >>> >  * memory addressing. The given value must be less than 2GB
>>>> > >> >>> >  * (2048MB)
>>>> > >> >>> >  *
>>>> > >> >>> >  * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>>>> > >> >>> >  */
>>>> > >> >>> >
>>>> > >> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <
>>>> msokolov@gmail.com> wrote:
>>>> > >> >>> > >
>>>> > >> >>> > > Hi Mayya, thanks for persisting - I think we need to
>>>> wrestle this to
>>>> > >> >>> > > the ground for sure. In the test I ran, RAM buffer was the
>>>> default
>>>> > >> >>> > > checked in, which is weirdly: 1994MB. I did not
>>>> specifically set heap
>>>> > >> >>> > > size. I used maxConn/M=200. I'll try with larger buffer
>>>> to see if I
>>>> > >> >>> > > can get 9.4 to produce a single segment for the same test
>>>> settings. I
>>>> > >> >>> > > see you used a much smaller M (16), which should have
>>>> produced quite
>>>> > >> >>> > > small graphs, and I agree, should have been a single
>>>> segment. Were you
>>>> > >> >>> > > able to verify the number of segments?
>>>> > >> >>> > >
>>>> > >> >>> > > Agree that decrease in recall is not expected when more
>>>> segments are produced.
>>>> > >> >>> > >
>>>> > >> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>>>> > >> >>> > > <mayya.sharipova@elastic.co.invalid> wrote:
>>>> > >> >>> > > >
>>>> > >> >>> > > > Hello Michael,
>>>> > >> >>> > > > Thanks for checking.
>>>> > >> >>> > > > Sorry for bringing this up again.
>>>> > >> >>> > > > First of all, I am ok with proceeding with the Lucene
>>>> 9.4 release and leaving the performance investigations for later.
>>>> > >> >>> > > >
>>>> > >> >>> > > > I am interested in what's the maxConn/M value you used
>>>> for your tests? What was the heap memory and the size of the RAM buffer for
>>>> indexing?
>>>> > >> >>> > > > Usually, when we have multiple segments, recall should
>>>> increase, not decrease. But I agree that with multiple segments we can see
>>>> a big drop in QPS.
>>>> > >> >>> > > >
>>>> > >> >>> > > > Here is my investigation with detailed output of the
>>>> performance difference between 9.3 and 9.4 releases. In my tests I used a
>>>> large indexing buffer (2Gb) and large heap (5Gb) to end up with a single
>>>> segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>>>> > >> >>> > > >
>>>> > >> >>> > > > Thank you.
>>>> > >> >>> > > >
>>>> > >> >>> > > >
>>>> > >> >>> > > >
>>>> > >> >>> > > >
>>>> > >> >>> > > >
>>>> > >> >>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <
>>>> romseygeek@gmail.com> wrote:
>>>> > >> >>> > > >>
>>>> > >> >>> > > >> Done. Thanks!
>>>> > >> >>> > > >>
>>>> > >> >>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <
>>>> msokolov@gmail.com> wrote:
>>>> > >> >>> > > >> >
>>>> > >> >>> > > >> > Hi Alan - I checked out the interval queries patch;
>>>> seems pretty safe,
>>>> > >> >>> > > >> > please go ahead and port to 9.4. Thanks!
>>>> > >> >>> > > >> >
>>>> > >> >>> > > >> > Mike
>>>> > >> >>> > > >> >
>>>> > >> >>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <
>>>> romseygeek@gmail.com> wrote:
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> Hi Mike,
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> I’ve opened
>>>> https://github.com/apache/lucene/pull/11760 as a small bug fix PR for
>>>> a problem with interval queries. Am I OK to port this to the 9.4 branch?
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> Thanks, Alan
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <
>>>> msokolov@gmail.com> wrote:
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> NOTICE:
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> Branch branch_9_4 has been cut and versions updated
>>>> to 9.5 on stable branch.
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> Please observe the normal rules:
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> * No new features may be committed to the branch.
>>>> > >> >>> > > >> >> * Documentation patches, build patches and serious
>>>> bug fixes may be
>>>> > >> >>> > > >> >> committed to the branch. However, you should submit
>>>> all patches you
>>>> > >> >>> > > >> >> want to commit to Jira first to give others the
>>>> chance to review
>>>> > >> >>> > > >> >> and possibly vote against the patch. Keep in mind
>>>> that it is our
>>>> > >> >>> > > >> >> main intention to keep the branch as stable as
>>>> possible.
>>>> > >> >>> > > >> >> * All patches that are intended for the branch
>>>> should first be committed
>>>> > >> >>> > > >> >> to the unstable branch, merged into the stable
>>>> branch, and then into
>>>> > >> >>> > > >> >> the current release branch.
>>>> > >> >>> > > >> >> * Normal unstable and stable branch development may
>>>> continue as usual.
>>>> > >> >>> > > >> >> However, if you plan to commit a big change to the
>>>> unstable branch
>>>> > >> >>> > > >> >> while the branch feature freeze is in effect, think
>>>> twice: can't the
>>>> > >> >>> > > >> >> addition wait a couple more days? Merges of bug
>>>> fixes into the branch
>>>> > >> >>> > > >> >> may become more difficult.
>>>> > >> >>> > > >> >> * Only Jira issues with Fix version 9.4 and priority
>>>> "Blocker" will delay
>>>> > >> >>> > > >> >> a release candidate build.
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >
>>>> > >> >>> > > >> >
>>>> > >> >>> > > >> >
>>>> > >> >>> > > >>
>>>> > >> >>> > > >>
>>>> > >> >>> > > >>
>>>> > >> >>> > > >>
>>>> > >> >>>
>>>> > >> >>>
>>>> > >> >>>
>>>> > >> >
>>>> > >> >
>>>> > >> > --
>>>> > >> > Adrien
>>>> > >>
>>>> > >>
>>>> > >>
>>>> > >
>>>> > >
>>>> > > --
>>>> > > Adrien
>>>>
>>>>
>>>>
Re: Subject: New branch and feature freeze for Lucene 9.4.0
Sorry for the confusion. To explain, I use a local ann-benchmarks set-up
that makes use of KnnGraphTester. It is a bit hacky and I accidentally
included the warm-ups in the final timings. So the change to warm-up
explains why we saw different results in our tests. This is great
motivation to solidify and publish my local ann-benchmarks set-up so that
it's not so fragile!
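In other words, the warm-up has to stay outside the timed loop. A minimal
sketch of the pattern (hypothetical harness code, not KnnGraphTester itself):

    // untimed warm-up: JIT compilation and page-cache priming
    for (int i = 0; i < warmupIters; i++) {
      searcher.search(query, topK);
    }
    // only the measured iterations contribute to QPS
    long start = System.nanoTime();
    for (int i = 0; i < measuredIters; i++) {
      searcher.search(query, topK);
    }
    long elapsedNs = System.nanoTime() - start;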

In summary, with your latest fix the recall and QPS look good to me -- I
don't detect any regression between 9.3 and 9.4.

Julie

On Mon, Sep 19, 2022 at 3:45 PM Michael Sokolov <msokolov@gmail.com> wrote:

> I'm confused, since warming should not be counted in the timings. Are you
> saying that the recall was affected??
>
> On Mon, Sep 19, 2022 at 6:12 PM Julie Tibshirani <julietibs@gmail.com>
> wrote:
>
>> Using the ann-benchmarks framework, I still saw a regression between 9.3 and
>> 9.4 similar to the one Mayya reported. I investigated and found it was due to
>> "KnnGraphTester to use KnnVectorQuery" (
>> https://github.com/apache/lucene/pull/796), specifically the change to
>> the warm-up strategy. If I revert it, the results look exactly as expected.
>>
>> I guess we can keep an eye on the nightly benchmarks tomorrow to
>> double-check there's no drop. It would also be nice to formalize the
>> ann-benchmarks set-up and run it regularly (like we've discussed in
>> https://github.com/apache/lucene/issues/10665).
>>
>> Julie
>>
>> On Mon, Sep 19, 2022 at 10:33 AM Michael Sokolov <msokolov@gmail.com>
>> wrote:
>>
>>> Thanks for your speedy testing! I am observing comparable latencies
>>> *when the index geometry (ie number of segments)* is unchanged. Agree we
>>> can leave this for a later day. I'll proceed to cut 9.4 artifacts
>>>
>>> On Mon, Sep 19, 2022 at 11:02 AM Mayya Sharipova
>>> <mayya.sharipova@elastic.co.invalid> wrote:
>>>
>>>> It would be great if you all are able to test again with
>>>>> https://github.com/apache/lucene/pull/11781/ applied
>>>>
>>>>
>>>>
>>>> I ran the ann benchmarks with this change, and was happy to confirm
>>>> that in my test, recall with this PR is the same as on the 9.3 branch.
>>>> QPS is lower, but we can investigate that later.
>>>>
>>>> glove-100-angular M:16 efConstruction:100
>>>>              9.3 recall  9.3 QPS    this PR recall  this PR QPS
>>>> n_cands=10 0.620 2745.933 0.620 1675.500
>>>> n_cands=20 0.680 2288.665 0.680 1512.744
>>>> n_cands=40 0.746 1770.243 0.746 1040.240
>>>> n_cands=80 0.809 1226.738 0.809 695.236
>>>> n_cands=120 0.843 948.908 0.843 525.914
>>>> n_cands=200 0.878 671.781 0.878 351.529
>>>> n_cands=400 0.918 392.265 0.918 207.854
>>>> n_cands=600 0.937 282.403 0.937 144.311
>>>> n_cands=800 0.949 214.620 0.949 116.875
>>>>
>>>> On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov <msokolov@gmail.com>
>>>> wrote:
>>>>
>>>>> OK, I think I was wrong about latency having increased due to a change
>>>>> in KnnGraphTester -- I did some testing there and couldn't reproduce.
>>>>> There does seem to be a slight vector search latency increase,
>>>>> possibly noise, but maybe due to the branching introduced to check
>>>>> whether to do byte vs float operations? It would be a little
>>>>> surprising if that were the case given the small number of branchings
>>>>> compared to the number of multiplies in dot-product though.
>>>>>
>>>>> On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msokolov@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > Thanks for the deep-dive Julie. I was able to reproduce the changing
>>>>> > recall. I had introduced some bugs in the diversity checks (that may
>>>>> > have partially canceled each other out? it's hard to understand what
>>>>> > was happening in the buggy case) and posted a fix today
>>>>> > https://github.com/apache/lucene/pull/11781.
>>>>> >
>>>>> > There are a couple of other outstanding issues I found while doing a
>>>>> > bunch of git bisecting:
>>>>> >
>>>>> > I think we might have introduced a (test-only) performance regression
>>>>> > in KnnGraphTester
>>>>> >
>>>>> > We may still be over-allocating the size of NeighborArray, leading to
>>>>> > excessive segmentation? I wonder if we could avoid dynamic
>>>>> > re-allocation there, and simply initialize every neighbor array to
>>>>> > 2*M+1.
>>>>> >
>>>>> > While I don't think these are necessarily blockers, given that we are
>>>>> > releasing HNSW improvements, it seems like we should address these,
>>>>> > especially as the build-graph-on-index is one of the things we are
>>>>> > releasing, and it is (may be?) impacted. I will see if I can put up a
>>>>> > patch or two.
>>>>> >
>>>>> > It would be great if you all are able to test again with
>>>>> > https://github.com/apache/lucene/pull/11781/ applied
>>>>> >
>>>>> > -Mike
>>>>> >
>>>>> > On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpountz@gmail.com>
>>>>> wrote:
>>>>> > >
>>>>> > > Thank you Mike, I just backported the change.
>>>>> > >
>>>>> > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <
>>>>> msokolov@gmail.com> wrote:
>>>>> > >>
>>>>> > >> it looks like a small bug fix we have had on main (and 9.x?) for a
>>>>> > >> while now, and no test failures showed up, I guess. Should be OK to
>>>>> > >> port. I plan to cut artifacts this weekend, or Monday at the
>>>>> latest,
>>>>> > >> but if you can do the backport today or tomorrow, that's fine by
>>>>> me.
>>>>> > >>
>>>>> > >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <jpountz@gmail.com>
>>>>> wrote:
>>>>> > >> >
>>>>> > >> > Mike, I'm tempted to backport
>>>>> https://github.com/apache/lucene/pull/1068 to branch_9_4, which is a
>>>>> bugfix that looks pretty safe to me. What do you think?
>>>>> > >> >
>>>>> > >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <
>>>>> mayya.sharipova@elastic.co.invalid> wrote:
>>>>> > >> >>
>>>>> > >> >> Thanks for running more tests, Michael.
>>>>> > >> >> It is encouraging that you saw a similar performance between
>>>>> 9.3 and 9.4. I will also run more tests with different parameters.
>>>>> > >> >>
>>>>> > >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <
>>>>> msokolov@gmail.com> wrote:
>>>>> > >> >>>
>>>>> > >> >>> As a follow-up, I ran a test using the same parameters as
>>>>> above, only
>>>>> > >> >>> changing M=200 to M=16. This did result in a single segment
>>>>> in both
>>>>> > >> >>> cases (9.3, 9.4) and the performance was pretty similar;
>>>>> within noise
>>>>> > >> >>> I think. The main difference I saw was that the 9.3 index was
>>>>> written
>>>>> > >> >>> using CFS:
>>>>> > >> >>>
>>>>> > >> >>> 9.4:
>>>>> > >> >>> recall  latency  nDoc     fanout  maxConn  beamWidth  visited  index ms
>>>>> > >> >>> 0.755   1.36     1000000  100     16       100        200      891402    1.00  post-filter
>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vec
>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vem
>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vex
>>>>> > >> >>>
>>>>> > >> >>> 9.3:
>>>>> > >> >>> recall  latency  nDoc     fanout  maxConn  beamWidth  visited  index ms
>>>>> > >> >>> 0.775   1.34     1000000  100     16       100        4033     977043
>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>>>>> > >> >>>
>>>>> > >> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <
>>>>> msokolov@gmail.com> wrote:
>>>>> > >> >>> >
>>>>> > >> >>> > I ran another test. I thought I had increased the RAM
>>>>> buffer size to
>>>>> > >> >>> > 8G and heap to 16G. However I still see two segments in the
>>>>> index that
>>>>> > >> >>> > was created. And looking at the infostream I see:
>>>>> > >> >>> >
>>>>> > >> >>> > dir=MMapDirectory@
>>>>> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>>>>> > >> >>> > lockFactory=org\
>>>>> > >> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20
>>>>> > >> >>> > index=
>>>>> > >> >>> > version=9.4.0
>>>>> > >> >>> >
>>>>> analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>>>>> > >> >>> > ramBufferSizeMB=8000.0
>>>>> > >> >>> > maxBufferedDocs=-1
>>>>> > >> >>> > ...
>>>>> > >> >>> > perThreadHardLimitMB=1945
>>>>> > >> >>> > ...
>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush
>>>>> postings as
>>>>> > >> >>> > segment _6 numDocs=555373
>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to
>>>>> write norms
>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to
>>>>> write docValues
>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to
>>>>> write points
>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to
>>>>> write vectors
>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to
>>>>> finish stored fields
>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to
>>>>> write postings
>>>>> > >> >>> > and finish vectors
>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to
>>>>> write fieldInfos
>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment
>>>>> has 0 deleted docs
>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment
>>>>> has 0
>>>>> > >> >>> > soft-deleted docs
>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment
>>>>> has no
>>>>> > >> >>> > vectors; no norms; no docValues; no prox; freqs
>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
>>>>> > >> >>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm,
>>>>> _6.fdt, _6_\
>>>>> > >> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
>>>>> > >> >>> > _6_Lucene94HnswVectorsFormat_0.vex]
>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed
>>>>> codec=Lucene94
>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed:
>>>>> segment=_6
>>>>> > >> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
>>>>> > >> >>> > docs/MB=521.134
>>>>> > >> >>> >
>>>>> > >> >>> > so I think it's this perThreadHardLimit that is triggering
>>>>> the
>>>>> > >> >>> > flushes? TBH this isn't something I had seen before; but
>>>>> the docs say:
>>>>> > >> >>> >
>>>>> > >> >>> > /**
>>>>> > >> >>> >  * Expert: Sets the maximum memory consumption per thread
>>>>> > >> >>> >  * triggering a forced flush if exceeded. A {@link
>>>>> > >> >>> >  * DocumentsWriterPerThread} is forcefully flushed once it
>>>>> > >> >>> >  * exceeds this limit even if the {@link #getRAMBufferSizeMB()}
>>>>> > >> >>> >  * has not been exceeded. This is a safety limit to prevent a
>>>>> > >> >>> >  * {@link DocumentsWriterPerThread} from address space
>>>>> > >> >>> >  * exhaustion due to its internal 32 bit signed integer based
>>>>> > >> >>> >  * memory addressing. The given value must be less than 2GB
>>>>> > >> >>> >  * (2048MB)
>>>>> > >> >>> >  *
>>>>> > >> >>> >  * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>>>>> > >> >>> >  */
>>>>> > >> >>> >
>>>>> > >> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <
>>>>> msokolov@gmail.com> wrote:
>>>>> > >> >>> > >
>>>>> > >> >>> > > Hi Mayya, thanks for persisting - I think we need to
>>>>> wrestle this to
>>>>> > >> >>> > > the ground for sure. In the test I ran, RAM buffer was
>>>>> the default
>>>>> > >> >>> > > checked in, which is weirdly: 1994MB. I did not
>>>>> specifically set heap
>>>>> > >> >>> > > size. I used maxConn/M=200. I'll try with larger buffer
>>>>> to see if I
>>>>> > >> >>> > > can get 9.4 to produce a single segment for the same test
>>>>> settings. I
>>>>> > >> >>> > > see you used a much smaller M (16), which should have
>>>>> produced quite
>>>>> > >> >>> > > small graphs, and I agree, should have been a single
>>>>> segment. Were you
>>>>> > >> >>> > > able to verify the number of segments?
>>>>> > >> >>> > >
>>>>> > >> >>> > > Agree that decrease in recall is not expected when more
>>>>> segments are produced.
>>>>> > >> >>> > >
>>>>> > >> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>>>>> > >> >>> > > <mayya.sharipova@elastic.co.invalid> wrote:
>>>>> > >> >>> > > >
>>>>> > >> >>> > > > Hello Michael,
>>>>> > >> >>> > > > Thanks for checking.
>>>>> > >> >>> > > > Sorry for bringing this up again.
>>>>> > >> >>> > > > First of all, I am ok with proceeding with the Lucene
>>>>> 9.4 release and leaving the performance investigations for later.
>>>>> > >> >>> > > >
>>>>> > >> >>> > > > I am interested in what's the maxConn/M value you used
>>>>> for your tests? What was the heap memory and the size of the RAM buffer for
>>>>> indexing?
>>>>> > >> >>> > > > Usually, when we have multiple segments, recall should
>>>>> increase, not decrease. But I agree that with multiple segments we can see
>>>>> a big drop in QPS.
>>>>> > >> >>> > > >
>>>>> > >> >>> > > > Here is my investigation with detailed output of the
>>>>> performance difference between 9.3 and 9.4 releases. In my tests I used a
>>>>> large indexing buffer (2Gb) and large heap (5Gb) to end up with a single
>>>>> segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>>>>> > >> >>> > > >
>>>>> > >> >>> > > > Thank you.
>>>>> > >> >>> > > >
>>>>> > >> >>> > > >
>>>>> > >> >>> > > >
>>>>> > >> >>> > > >
>>>>> > >> >>> > > >
>>>>> > >> >>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <
>>>>> romseygeek@gmail.com> wrote:
>>>>> > >> >>> > > >>
>>>>> > >> >>> > > >> Done. Thanks!
>>>>> > >> >>> > > >>
>>>>> > >> >>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <
>>>>> msokolov@gmail.com> wrote:
>>>>> > >> >>> > > >> >
>>>>> > >> >>> > > >> > Hi Alan - I checked out the interval queries patch;
>>>>> seems pretty safe,
>>>>> > >> >>> > > >> > please go ahead and port to 9.4. Thanks!
>>>>> > >> >>> > > >> >
>>>>> > >> >>> > > >> > Mike
>>>>> > >> >>> > > >> >
>>>>> > >> >>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <
>>>>> romseygeek@gmail.com> wrote:
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> Hi Mike,
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> I’ve opened
>>>>> https://github.com/apache/lucene/pull/11760 as a small bug fix PR for
>>>>> a problem with interval queries. Am I OK to port this to the 9.4 branch?
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> Thanks, Alan
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <
>>>>> msokolov@gmail.com> wrote:
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> NOTICE:
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> Branch branch_9_4 has been cut and versions updated
>>>>> to 9.5 on stable branch.
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> Please observe the normal rules:
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> * No new features may be committed to the branch.
>>>>> > >> >>> > > >> >> * Documentation patches, build patches and serious
>>>>> bug fixes may be
>>>>> > >> >>> > > >> >> committed to the branch. However, you should submit
>>>>> all patches you
>>>>> > >> >>> > > >> >> want to commit to Jira first to give others the
>>>>> chance to review
>>>>> > >> >>> > > >> >> and possibly vote against the patch. Keep in mind
>>>>> that it is our
>>>>> > >> >>> > > >> >> main intention to keep the branch as stable as
>>>>> possible.
>>>>> > >> >>> > > >> >> * All patches that are intended for the branch
>>>>> should first be committed
>>>>> > >> >>> > > >> >> to the unstable branch, merged into the stable
>>>>> branch, and then into
>>>>> > >> >>> > > >> >> the current release branch.
>>>>> > >> >>> > > >> >> * Normal unstable and stable branch development may
>>>>> continue as usual.
>>>>> > >> >>> > > >> >> However, if you plan to commit a big change to the
>>>>> unstable branch
>>>>> > >> >>> > > >> >> while the branch feature freeze is in effect, think
>>>>> twice: can't the
>>>>> > >> >>> > > >> >> addition wait a couple more days? Merges of bug
>>>>> fixes into the branch
>>>>> > >> >>> > > >> >> may become more difficult.
>>>>> > >> >>> > > >> >> * Only Jira issues with Fix version 9.4 and
>>>>> priority "Blocker" will delay
>>>>> > >> >>> > > >> >> a release candidate build.
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >
>>>>> > >> >>> > > >> >
>>>>> > >> >>> > > >> >
>>>>> > >> >>> > > >>
>>>>> > >> >>> > > >>
>>>>> > >> >>> > > >>
>>>>> > >> >>> > > >>
>>>>> > >> >>>
>>>>> > >> >>>
>>>>> > >> >>>
>>>>> > >> >
>>>>> > >> >
>>>>> > >> > --
>>>>> > >> > Adrien
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >
>>>>> > >
>>>>> > > --
>>>>> > > Adrien
>>>>>
>>>>>
>>>>>
Re: Subject: New branch and feature freeze for Lucene 9.4.0
Hi Mike,

If you have not started an RC yet, I'd like to include some small fixes for
bugs that were recently introduced in Lucene:
- https://github.com/apache/lucene/pull/11792
- https://github.com/apache/lucene/pull/11794

On Tue, Sep 20, 2022 at 1:26 AM Julie Tibshirani <julietibs@gmail.com>
wrote:

> Sorry for the confusion. To explain, I use a local ann-benchmarks set-up
> that makes use of KnnGraphTester. It is a bit hacky and I accidentally
> included the warm-ups in the final timings. So the change to warm-up
> explains why we saw different results in our tests. This is great
> motivation to solidify and publish my local ann-benchmarks set-up so that
> it's not so fragile!
>
> In summary, with your latest fix the recall and QPS look good to me -- I
> don't detect any regression between 9.3 and 9.4.
>
> Julie
>
> On Mon, Sep 19, 2022 at 3:45 PM Michael Sokolov <msokolov@gmail.com>
> wrote:
>
>> I'm confused, since warming should not be counted in the timings. Are you
>> saying that the recall was affected??
>>
>> On Mon, Sep 19, 2022 at 6:12 PM Julie Tibshirani <julietibs@gmail.com>
>> wrote:
>>
>>> Using the ann-benchmarks framework, I still saw a regression between 9.3 and
>>> 9.4 similar to the one Mayya reported. I investigated and found it was due to
>>> "KnnGraphTester to use KnnVectorQuery" (
>>> https://github.com/apache/lucene/pull/796), specifically the change to
>>> the warm-up strategy. If I revert it, the results look exactly as expected.
>>>
>>> I guess we can keep an eye on the nightly benchmarks tomorrow to
>>> double-check there's no drop. It would also be nice to formalize the
>>> ann-benchmarks set-up and run it regularly (like we've discussed in
>>> https://github.com/apache/lucene/issues/10665).
>>>
>>> Julie
>>>
>>> On Mon, Sep 19, 2022 at 10:33 AM Michael Sokolov <msokolov@gmail.com>
>>> wrote:
>>>
>>>> Thanks for your speedy testing! I am observing comparable latencies
>>>> *when the index geometry (ie number of segments)* is unchanged. Agree we
>>>> can leave this for a later day. I'll proceed to cut 9.4 artifacts
>>>>
>>>> On Mon, Sep 19, 2022 at 11:02 AM Mayya Sharipova
>>>> <mayya.sharipova@elastic.co.invalid> wrote:
>>>>
>>>>> It would be great if you all are able to test again with
>>>>>> https://github.com/apache/lucene/pull/11781/ applied
>>>>>
>>>>>
>>>>>
>>>>> I ran the ann benchmarks with this change, and was happy to confirm
>>>>> that in my test, recall with this PR is the same as on the 9.3 branch.
>>>>> QPS is lower, but we can investigate that later.
>>>>>
>>>>> glove-100-angular M:16 efConstruction:100
>>>>>              9.3 recall  9.3 QPS    this PR recall  this PR QPS
>>>>> n_cands=10 0.620 2745.933 0.620 1675.500
>>>>> n_cands=20 0.680 2288.665 0.680 1512.744
>>>>> n_cands=40 0.746 1770.243 0.746 1040.240
>>>>> n_cands=80 0.809 1226.738 0.809 695.236
>>>>> n_cands=120 0.843 948.908 0.843 525.914
>>>>> n_cands=200 0.878 671.781 0.878 351.529
>>>>> n_cands=400 0.918 392.265 0.918 207.854
>>>>> n_cands=600 0.937 282.403 0.937 144.311
>>>>> n_cands=800 0.949 214.620 0.949 116.875
>>>>>
>>>>> On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov <msokolov@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> OK, I think I was wrong about latency having increased due to a change
>>>>>> in KnnGraphTester -- I did some testing there and couldn't reproduce.
>>>>>> There does seem to be a slight vector search latency increase,
>>>>>> possibly noise, but maybe due to the branching introduced to check
>>>>>> whether to do byte vs float operations? It would be a little
>>>>>> surprising if that were the case given the small number of branchings
>>>>>> compared to the number of multiplies in dot-product though.
>>>>>>
>>>>>> On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msokolov@gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > Thanks for the deep-dive Julie. I was able to reproduce the changing
>>>>>> > recall. I had introduced some bugs in the diversity checks (that may
>>>>>> > have partially canceled each other out? it's hard to understand what
>>>>>> > was happening in the buggy case) and posted a fix today
>>>>>> > https://github.com/apache/lucene/pull/11781.
>>>>>> >
>>>>>> > There are a couple of other outstanding issues I found while doing a
>>>>>> > bunch of git bisecting:
>>>>>> >
>>>>>> > I think we might have introduced a (test-only) performance
>>>>>> regression
>>>>>> > in KnnGraphTester
>>>>>> >
>>>>>> > We may still be over-allocating the size of NeighborArray, leading
>>>>>> to
>>>>>> > excessive segmentation? I wonder if we could avoid dynamic
>>>>>> > re-allocation there, and simply initialize every neighbor array to
>>>>>> > 2*M+1.
>>>>>> >
>>>>>> > While I don't think these are necessarily blockers, given that we
>>>>>> are
>>>>>> > releasing HNSW improvements, it seems like we should address these,
>>>>>> > especially as the build-graph-on-index is one of the things we are
>>>>>> > releasing, and it is (may be?) impacted. I will see if I can put up
>>>>>> a
>>>>>> > patch or two.
>>>>>> >
>>>>>> > It would be great if you all are able to test again with
>>>>>> > https://github.com/apache/lucene/pull/11781/ applied
>>>>>> >
>>>>>> > -Mike
>>>>>> >
>>>>>> > On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpountz@gmail.com>
>>>>>> wrote:
>>>>>> > >
>>>>>> > > Thank you Mike, I just backported the change.
>>>>>> > >
>>>>>> > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <
>>>>>> msokolov@gmail.com> wrote:
>>>>>> > >>
>>>>>> > >> it looks like a small bug fix we have had on main (and 9.x?)
>>>>>> for a
>>>>>> > >> while now and no test failures showed up, I guess. Should be OK
>>>>>> to
>>>>>> > >> port. I plan to cut artifacts this weekend, or Monday at the
>>>>>> latest,
>>>>>> > >> but if you can do the backport today or tomorrow, that's fine by
>>>>>> me.
>>>>>> > >>
>>>>>> > >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <jpountz@gmail.com>
>>>>>> wrote:
>>>>>> > >> >
>>>>>> > >> > Mike, I'm tempted to backport
>>>>>> https://github.com/apache/lucene/pull/1068 to branch_9_4, which is a
>>>>>> bugfix that looks pretty safe to me. What do you think?
>>>>>> > >> >
>>>>>> > >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <
>>>>>> mayya.sharipova@elastic.co.invalid> wrote:
>>>>>> > >> >>
>>>>>> > >> >> Thanks for running more tests, Michael.
>>>>>> > >> >> It is encouraging that you saw a similar performance between
>>>>>> 9.3 and 9.4. I will also run more tests with different parameters.
>>>>>> > >> >>
>>>>>> > >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <
>>>>>> msokolov@gmail.com> wrote:
>>>>>> > >> >>>
>>>>>> > >> >>> As a follow-up, I ran a test using the same parameters as
>>>>>> above, only
>>>>>> > >> >>> changing M=200 to M=16. This did result in a single segment
>>>>>> in both
>>>>>> > >> >>> cases (9.3, 9.4) and the performance was pretty similar;
>>>>>> within noise
>>>>>> > >> >>> I think. The main difference I saw was that the 9.3 index
>>>>>> was written
>>>>>> > >> >>> using CFS:
>>>>>> > >> >>>
>>>>>> > >> >>> 9.4:
>>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth
>>>>>> visited index ms
>>>>>> > >> >>> 0.755 1.36 1000000 100 16 100 200
>>>>>> 891402 1.00
>>>>>> > >> >>> post-filter
>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vec
>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vem
>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vex
>>>>>> > >> >>>
>>>>>> > >> >>> 9.3:
>>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth
>>>>>> visited index ms
>>>>>> > >> >>> 0.775 1.34 1000000 100 16 100 4033
>>>>>> 977043
>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>>>>>> > >> >>>
>>>>>> > >> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <
>>>>>> msokolov@gmail.com> wrote:
>>>>>> > >> >>> >
>>>>>> > >> >>> > I ran another test. I thought I had increased the RAM
>>>>>> buffer size to
>>>>>> > >> >>> > 8G and heap to 16G. However I still see two segments in
>>>>>> the index that
>>>>>> > >> >>> > was created. And looking at the infostream I see:
>>>>>> > >> >>> >
>>>>>> > >> >>> > dir=MMapDirectory@
>>>>>> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>>>>>> > >> >>> > lockFactory=org\
>>>>>> > >> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20
>>>>>> > >> >>> > index=
>>>>>> > >> >>> > version=9.4.0
>>>>>> > >> >>> >
>>>>>> analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>>>>>> > >> >>> > ramBufferSizeMB=8000.0
>>>>>> > >> >>> > maxBufferedDocs=-1
>>>>>> > >> >>> > ...
>>>>>> > >> >>> > perThreadHardLimitMB=1945
>>>>>> > >> >>> > ...
>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush
>>>>>> postings as
>>>>>> > >> >>> > segment _6 numDocs=555373
>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to
>>>>>> write norms
>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to
>>>>>> write docValues
>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to
>>>>>> write points
>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to
>>>>>> write vectors
>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to
>>>>>> finish stored fields
>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to
>>>>>> write postings
>>>>>> > >> >>> > and finish vectors
>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to
>>>>>> write fieldInfos
>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment
>>>>>> has 0 deleted docs
>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment
>>>>>> has 0
>>>>>> > >> >>> > soft-deleted docs
>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment
>>>>>> has no
>>>>>> > >> >>> > vectors; no norms; no docValues; no prox; freqs
>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
>>>>>> > >> >>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm,
>>>>>> _6.fdt, _6_\
>>>>>> > >> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
>>>>>> > >> >>> > _6_Lucene94HnswVectorsFormat_0.vex]
>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed
>>>>>> codec=Lucene94
>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed:
>>>>>> segment=_6
>>>>>> > >> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
>>>>>> > >> >>> > docs/MB=521.134
>>>>>> > >> >>> >
>>>>>> > >> >>> > so I think it's this perThreadHardLimit that is triggering
>>>>>> the
>>>>>> > >> >>> > flushes? TBH this isn't something I had seen before; but
>>>>>> the docs say:
>>>>>> > >> >>> >
>>>>>> > >> >>> > /**
>>>>>> > >> >>> > * Expert: Sets the maximum memory consumption per
>>>>>> thread triggering
>>>>>> > >> >>> > a forced flush if exceeded. A
>>>>>> > >> >>> > * {@link DocumentsWriterPerThread} is forcefully
>>>>>> flushed once it
>>>>>> > >> >>> > exceeds this limit even if the
>>>>>> > >> >>> > * {@link #getRAMBufferSizeMB()} has not been exceeded.
>>>>>> This is a
>>>>>> > >> >>> > safety limit to prevent a {@link
>>>>>> > >> >>> > * DocumentsWriterPerThread} from address space
>>>>>> exhaustion due to
>>>>>> > >> >>> > its internal 32 bit signed
>>>>>> > >> >>> > * integer based memory addressing. The given value must
>>>>>> be less
>>>>>> > >> >>> >     * than 2GB (2048MB)
>>>>>> > >> >>> > *
>>>>>> > >> >>> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>>>>>> > >> >>> > */
>>>>>> > >> >>> >
>>>>>> > >> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <
>>>>>> msokolov@gmail.com> wrote:
>>>>>> > >> >>> > >
>>>>>> > >> >>> > > Hi Mayya, thanks for persisting - I think we need to
>>>>>> wrestle this to
>>>>>> > >> >>> > > the ground for sure. In the test I ran, RAM buffer was
>>>>>> the default
>>>>>> > >> >>> > > checked in, which is weirdly: 1994MB. I did not
>>>>>> specifically set heap
>>>>>> > >> >>> > > size. I used maxConn/M=200. I'll try with larger buffer
>>>>>> to see if I
>>>>>> > >> >>> > > can get 9.4 to produce a single segment for the same
>>>>>> test settings. I
>>>>>> > >> >>> > > see you used a much smaller M (16), which should have
>>>>>> produced quite
>>>>>> > >> >>> > > small graphs, and I agree, should have been a single
>>>>>> segment. Were you
>>>>>> > >> >>> > > able to verify the number of segments?
>>>>>> > >> >>> > >
>>>>>> > >> >>> > > Agree that decrease in recall is not expected when more
>>>>>> segments are produced.
>>>>>> > >> >>> > >
>>>>>> > >> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>>>>>> > >> >>> > > <mayya.sharipova@elastic.co.invalid> wrote:
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > > Hello Michael,
>>>>>> > >> >>> > > > Thanks for checking.
>>>>>> > >> >>> > > > Sorry for bringing this up again.
>>>>>> > >> >>> > > > First of all, I am ok with proceeding with the Lucene
>>>>>> 9.4 release and leaving the performance investigations for later.
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > > I am interested in what's the maxConn/M value you used
>>>>>> for your tests? What was the heap memory and the size of the RAM buffer for
>>>>>> indexing?
>>>>>> > >> >>> > > > Usually, when we have multiple segments, recall should
>>>>>> increase, not decrease. But I agree that with multiple segments we can see
>>>>>> a big drop in QPS.
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > > Here is my investigation with detailed output of the
>>>>>> performance difference between 9.3 and 9.4 releases. In my tests I used a
>>>>>> large indexing buffer (2Gb) and large heap (5Gb) to end up with a single
>>>>>> segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > > Thank you.
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <
>>>>>> romseygeek@gmail.com> wrote:
>>>>>> > >> >>> > > >>
>>>>>> > >> >>> > > >> Done. Thanks!
>>>>>> > >> >>> > > >>
>>>>>> > >> >>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <
>>>>>> msokolov@gmail.com> wrote:
>>>>>> > >> >>> > > >> >
>>>>>> > >> >>> > > >> > Hi Alan - I checked out the interval queries patch;
>>>>>> seems pretty safe,
>>>>>> > >> >>> > > >> > please go ahead and port to 9.4. Thanks!
>>>>>> > >> >>> > > >> >
>>>>>> > >> >>> > > >> > Mike
>>>>>> > >> >>> > > >> >
>>>>>> > >> >>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <
>>>>>> romseygeek@gmail.com> wrote:
>>>>>> > >> >>> > > >> >>
>>>>>> > >> >>> > > >> >> Hi Mike,
>>>>>> > >> >>> > > >> >>
>>>>>> > >> >>> > > >> >> I’ve opened
>>>>>> https://github.com/apache/lucene/pull/11760 as a small bug fix PR
>>>>>> for a problem with interval queries. Am I OK to port this to the 9.4
>>>>>> branch?
>>>>>> > >> >>> > > >> >>
>>>>>> > >> >>> > > >> >> Thanks, Alan
>>>>>> > >> >>> > > >> >>
>>>>>> > >> >>> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <
>>>>>> msokolov@gmail.com> wrote:
>>>>>> > >> >>> > > >> >>
>>>>>> > >> >>> > > >> >> NOTICE:
>>>>>> > >> >>> > > >> >>
>>>>>> > >> >>> > > >> >> Branch branch_9_4 has been cut and versions
>>>>>> updated to 9.5 on stable branch.
>>>>>> > >> >>> > > >> >>
>>>>>> > >> >>> > > >> >> Please observe the normal rules:
>>>>>> > >> >>> > > >> >>
>>>>>> > >> >>> > > >> >> * No new features may be committed to the branch.
>>>>>> > >> >>> > > >> >> * Documentation patches, build patches and serious
>>>>>> bug fixes may be
>>>>>> > >> >>> > > >> >> committed to the branch. However, you should
>>>>>> submit all patches you
>>>>>> > >> >>> > > >> >> want to commit to Jira first to give others the
>>>>>> chance to review
>>>>>> > >> >>> > > >> >> and possibly vote against the patch. Keep in mind
>>>>>> that it is our
>>>>>> > >> >>> > > >> >> main intention to keep the branch as stable as
>>>>>> possible.
>>>>>> > >> >>> > > >> >> * All patches that are intended for the branch
>>>>>> should first be committed
>>>>>> > >> >>> > > >> >> to the unstable branch, merged into the stable
>>>>>> branch, and then into
>>>>>> > >> >>> > > >> >> the current release branch.
>>>>>> > >> >>> > > >> >> * Normal unstable and stable branch development
>>>>>> may continue as usual.
>>>>>> > >> >>> > > >> >> However, if you plan to commit a big change to the
>>>>>> unstable branch
>>>>>> > >> >>> > > >> >> while the branch feature freeze is in effect,
>>>>>> think twice: can't the
>>>>>> > >> >>> > > >> >> addition wait a couple more days? Merges of bug
>>>>>> fixes into the branch
>>>>>> > >> >>> > > >> >> may become more difficult.
>>>>>> > >> >>> > > >> >> * Only Jira issues with Fix version 9.4 and
>>>>>> priority "Blocker" will delay
>>>>>> > >> >>> > > >> >> a release candidate build.
>>>>>> > >> >>> > > >> >>
>>>>>> > >> >>> > > >> >>
>>>>>> ---------------------------------------------------------------------
>>>>>> > >> >>> > > >> >> To unsubscribe, e-mail:
>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>> > >> >>> > > >> >> For additional commands, e-mail:
>>>>>> dev-help@lucene.apache.org
>>>>>> > >> >>> > > >> >>
>>>>>> > >> >>> > > >> >>
>>>>>> > >> >>> > > >> >
>>>>>> > >> >>> > > >> >
>>>>>> ---------------------------------------------------------------------
>>>>>> > >> >>> > > >> > To unsubscribe, e-mail:
>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>> > >> >>> > > >> > For additional commands, e-mail:
>>>>>> dev-help@lucene.apache.org
>>>>>> > >> >>> > > >> >
>>>>>> > >> >>> > > >>
>>>>>> > >> >>> > > >>
>>>>>> > >> >>> > > >>
>>>>>> ---------------------------------------------------------------------
>>>>>> > >> >>> > > >> To unsubscribe, e-mail:
>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>> > >> >>> > > >> For additional commands, e-mail:
>>>>>> dev-help@lucene.apache.org
>>>>>> > >> >>> > > >>
>>>>>> > >> >>>
>>>>>> > >> >>>
>>>>>> ---------------------------------------------------------------------
>>>>>> > >> >>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>> > >> >>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>> > >> >>>
>>>>>> > >> >
>>>>>> > >> >
>>>>>> > >> > --
>>>>>> > >> > Adrien
>>>>>> > >>
>>>>>> > >>
>>>>>> ---------------------------------------------------------------------
>>>>>> > >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>> > >> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>> > >>
>>>>>> > >
>>>>>> > >
>>>>>> > > --
>>>>>> > > Adrien
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>
>>>>>>

--
Adrien
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
well, I did start, optimistically, but I think I need to re-spin to include
a fix for this test failure that has been popping up, so I will pull these
in too.

On Tue, Sep 20, 2022 at 6:24 AM Adrien Grand <jpountz@gmail.com> wrote:

> Hi Mike,
>
> If you have not started a RC yet, I'd like to include some small fixes for
> bugs that were recently introduced in Lucene:
> - https://github.com/apache/lucene/pull/11792
> - https://github.com/apache/lucene/pull/11794
>
> On Tue, Sep 20, 2022 at 1:26 AM Julie Tibshirani <julietibs@gmail.com>
> wrote:
>
>> Sorry for the confusion. To explain, I use a local ann-benchmarks set-up
>> that makes use of KnnGraphTester. It is a bit hacky and I accidentally
>> included the warm-ups in the final timings. So the change to warm-up
>> explains why we saw different results in our tests. This is great
>> motivation to solidify and publish my local ann-benchmarks set-up so that
>> it's not so fragile!
>>
>> In summary, with your latest fix the recall and QPS look good to me -- I
>> don't detect any regression between 9.3 and 9.4.
>>
>> Julie
>>
>> On Mon, Sep 19, 2022 at 3:45 PM Michael Sokolov <msokolov@gmail.com>
>> wrote:
>>
>>> I'm confused, since warming should not be counted in the timings. Are
>>> you saying that the recall was affected??
>>>
>>> On Mon, Sep 19, 2022 at 6:12 PM Julie Tibshirani <julietibs@gmail.com>
>>> wrote:
>>>
>>>> Using the ann-benchmarks framework, I still saw a similar regression as
>>>> Mayya between 9.3 and 9.4. I investigated and found it was due to
>>>> "KnnGraphTester to use KnnVectorQuery" (
>>>> https://github.com/apache/lucene/pull/796), specifically the change to
>>>> the warm-up strategy. If I revert it, the results look exactly as expected.
>>>>
>>>> I guess we can keep an eye on the nightly benchmarks tomorrow to
>>>> double-check there's no drop. It would also be nice to formalize the
>>>> ann-benchmarks set-up and run it regularly (like we've discussed in
>>>> https://github.com/apache/lucene/issues/10665).
>>>>
>>>> Julie
>>>>
>>>> On Mon, Sep 19, 2022 at 10:33 AM Michael Sokolov <msokolov@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for your speedy testing! I am observing comparable latencies
>>>>> *when the index geometry (ie number of segments)* is unchanged. Agree we
>>>>> can leave this for a later day. I'll proceed to cut 9.4 artifacts
>>>>>
>>>>> On Mon, Sep 19, 2022 at 11:02 AM Mayya Sharipova
>>>>> <mayya.sharipova@elastic.co.invalid> wrote:
>>>>>
>>>>>> It would be great if you all are able to test again with
>>>>>>> https://github.com/apache/lucene/pull/11781/ applied
>>>>>>
>>>>>>
>>>>>>
>>>>>> I ran the ann benchmarks with this change, and was happy to confirm
>>>>>> that in my test the recall with this PR is the same as on the 9.3 branch,
>>>>>> although QPS is lower; we can investigate QPS later.
>>>>>>
>>>>>> glove-100-angular M:16 efConstruction:100
>>>>>> 9.3 recall | 9.3 QPS | this PR recall | this PR QPS
>>>>>> n_cands=10 0.620 2745.933 0.620 1675.500
>>>>>> n_cands=20 0.680 2288.665 0.680 1512.744
>>>>>> n_cands=40 0.746 1770.243 0.746 1040.240
>>>>>> n_cands=80 0.809 1226.738 0.809 695.236
>>>>>> n_cands=120 0.843 948.908 0.843 525.914
>>>>>> n_cands=200 0.878 671.781 0.878 351.529
>>>>>> n_cands=400 0.918 392.265 0.918 207.854
>>>>>> n_cands=600 0.937 282.403 0.937 144.311
>>>>>> n_cands=800 0.949 214.620 0.949 116.875
>>>>>>
>>>>>> On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov <msokolov@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> OK, I think I was wrong about latency having increased due to a
>>>>>>> change
>>>>>>> in KnnGraphTester -- I did some testing there and couldn't reproduce.
>>>>>>> There does seem to be a slight vector search latency increase,
>>>>>>> possibly noise, but maybe due to the branching introduced to check
>>>>>>> whether to do byte vs float operations? It would be a little
>>>>>>> surprising if that were the case given the small number of branchings
>>>>>>> compared to the number of multiplies in dot-product though.
>>>>>>>
>>>>>>> On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msokolov@gmail.com>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Thanks for the deep-dive Julie. I was able to reproduce the
>>>>>>> changing
>>>>>>> > recall. I had introduced some bugs in the diversity checks (that
>>>>>>> may
>>>>>>> > have partially canceled each other out? it's hard to understand
>>>>>>> what
>>>>>>> > was happening in the buggy case) and posted a fix today
>>>>>>> > https://github.com/apache/lucene/pull/11781.
>>>>>>> >
>>>>>>> > There are a couple of other outstanding issues I found while doing
>>>>>>> a
>>>>>>> > bunch of git bisecting;
>>>>>>> >
>>>>>>> > I think we might have introduced a (test-only) performance
>>>>>>> regression
>>>>>>> > in KnnGraphTester
>>>>>>> >
>>>>>>> > We may still be over-allocating the size of NeighborArray, leading
>>>>>>> to
>>>>>>> > excessive segmentation? I wonder if we could avoid dynamic
>>>>>>> > re-allocation there, and simply initialize every neighbor array to
>>>>>>> > 2*M+1.
>>>>>>> >
>>>>>>> > While I don't think these are necessarily blockers, given that we
>>>>>>> are
>>>>>>> > releasing HNSW improvements, it seems like we should address these,
>>>>>>> > especially as the build-graph-on-index is one of the things we are
>>>>>>> > releasing, and it is (may be?) impacted. I will see if I can put
>>>>>>> up a
>>>>>>> > patch or two.
>>>>>>> >
>>>>>>> > It would be great if you all are able to test again with
>>>>>>> > https://github.com/apache/lucene/pull/11781/ applied
>>>>>>> >
>>>>>>> > -Mike
>>>>>>> >
>>>>>>> > On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpountz@gmail.com>
>>>>>>> wrote:
>>>>>>> > >
>>>>>>> > > Thank you Mike, I just backported the change.
>>>>>>> > >
>>>>>>> > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <
>>>>>>> msokolov@gmail.com> wrote:
>>>>>>> > >>
>>>>>>> > >> it looks like a small bug fix we have had on main (and 9.x?)
>>>>>>> for a
>>>>>>> > >> while now and no test failures showed up, I guess. Should be OK
>>>>>>> to
>>>>>>> > >> port. I plan to cut artifacts this weekend, or Monday at the
>>>>>>> latest,
>>>>>>> > >> but if you can do the backport today or tomorrow, that's fine
>>>>>>> by me.
>>>>>>> > >>
>>>>>>> > >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <
>>>>>>> jpountz@gmail.com> wrote:
>>>>>>> > >> >
>>>>>>> > >> > Mike, I'm tempted to backport
>>>>>>> https://github.com/apache/lucene/pull/1068 to branch_9_4, which is
>>>>>>> a bugfix that looks pretty safe to me. What do you think?
>>>>>>> > >> >
>>>>>>> > >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <
>>>>>>> mayya.sharipova@elastic.co.invalid> wrote:
>>>>>>> > >> >>
>>>>>>> > >> >> Thanks for running more tests, Michael.
>>>>>>> > >> >> It is encouraging that you saw a similar performance between
>>>>>>> 9.3 and 9.4. I will also run more tests with different parameters.
>>>>>>> > >> >>
>>>>>>> > >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <
>>>>>>> msokolov@gmail.com> wrote:
>>>>>>> > >> >>>
>>>>>>> > >> >>> As a follow-up, I ran a test using the same parameters as
>>>>>>> above, only
>>>>>>> > >> >>> changing M=200 to M=16. This did result in a single segment
>>>>>>> in both
>>>>>>> > >> >>> cases (9.3, 9.4) and the performance was pretty similar;
>>>>>>> within noise
>>>>>>> > >> >>> I think. The main difference I saw was that the 9.3 index
>>>>>>> was written
>>>>>>> > >> >>> using CFS:
>>>>>>> > >> >>>
>>>>>>> > >> >>> 9.4:
>>>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth
>>>>>>> visited index ms
>>>>>>> > >> >>> 0.755 1.36 1000000 100 16 100 200
>>>>>>> 891402 1.00
>>>>>>> > >> >>> post-filter
>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vec
>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vem
>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vex
>>>>>>> > >> >>>
>>>>>>> > >> >>> 9.3:
>>>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth
>>>>>>> visited index ms
>>>>>>> > >> >>> 0.775 1.34 1000000 100 16 100 4033
>>>>>>> 977043
>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>>>>>>> > >> >>>
>>>>>>> > >> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <
>>>>>>> msokolov@gmail.com> wrote:
>>>>>>> > >> >>> >
>>>>>>> > >> >>> > I ran another test. I thought I had increased the RAM
>>>>>>> buffer size to
>>>>>>> > >> >>> > 8G and heap to 16G. However I still see two segments in
>>>>>>> the index that
>>>>>>> > >> >>> > was created. And looking at the infostream I see:
>>>>>>> > >> >>> >
>>>>>>> > >> >>> > dir=MMapDirectory@
>>>>>>> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>>>>>>> > >> >>> > lockFactory=org\
>>>>>>> > >> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20
>>>>>>> > >> >>> > index=
>>>>>>> > >> >>> > version=9.4.0
>>>>>>> > >> >>> >
>>>>>>> analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>>>>>>> > >> >>> > ramBufferSizeMB=8000.0
>>>>>>> > >> >>> > maxBufferedDocs=-1
>>>>>>> > >> >>> > ...
>>>>>>> > >> >>> > perThreadHardLimitMB=1945
>>>>>>> > >> >>> > ...
>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush
>>>>>>> postings as
>>>>>>> > >> >>> > segment _6 numDocs=555373
>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to
>>>>>>> write norms
>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to
>>>>>>> write docValues
>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to
>>>>>>> write points
>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to
>>>>>>> write vectors
>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to
>>>>>>> finish stored fields
>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to
>>>>>>> write postings
>>>>>>> > >> >>> > and finish vectors
>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to
>>>>>>> write fieldInfos
>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new
>>>>>>> segment has 0 deleted docs
>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new
>>>>>>> segment has 0
>>>>>>> > >> >>> > soft-deleted docs
>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new
>>>>>>> segment has no
>>>>>>> > >> >>> > vectors; no norms; no docValues; no prox; freqs
>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
>>>>>>> > >> >>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm,
>>>>>>> _6.fdt, _6_\
>>>>>>> > >> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
>>>>>>> > >> >>> > _6_Lucene94HnswVectorsFormat_0.vex]
>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed
>>>>>>> codec=Lucene94
>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed:
>>>>>>> segment=_6
>>>>>>> > >> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
>>>>>>> > >> >>> > docs/MB=521.134
>>>>>>> > >> >>> >
>>>>>>> > >> >>> > so I think it's this perThreadHardLimit that is
>>>>>>> triggering the
>>>>>>> > >> >>> > flushes? TBH this isn't something I had seen before; but
>>>>>>> the docs say:
>>>>>>> > >> >>> >
>>>>>>> > >> >>> > /**
>>>>>>> > >> >>> > * Expert: Sets the maximum memory consumption per
>>>>>>> thread triggering
>>>>>>> > >> >>> > a forced flush if exceeded. A
>>>>>>> > >> >>> > * {@link DocumentsWriterPerThread} is forcefully
>>>>>>> flushed once it
>>>>>>> > >> >>> > exceeds this limit even if the
>>>>>>> > >> >>> > * {@link #getRAMBufferSizeMB()} has not been exceeded.
>>>>>>> This is a
>>>>>>> > >> >>> > safety limit to prevent a {@link
>>>>>>> > >> >>> > * DocumentsWriterPerThread} from address space
>>>>>>> exhaustion due to
>>>>>>> > >> >>> > its internal 32 bit signed
>>>>>>> > >> >>> > * integer based memory addressing. The given value
>>>>>>> must be less
>>>>>>> > >> >>> >     * than 2GB (2048MB)
>>>>>>> > >> >>> > *
>>>>>>> > >> >>> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>>>>>>> > >> >>> > */
>>>>>>> > >> >>> >
>>>>>>> > >> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <
>>>>>>> msokolov@gmail.com> wrote:
>>>>>>> > >> >>> > >
>>>>>>> > >> >>> > > Hi Mayya, thanks for persisting - I think we need to
>>>>>>> wrestle this to
>>>>>>> > >> >>> > > the ground for sure. In the test I ran, RAM buffer was
>>>>>>> the default
>>>>>>> > >> >>> > > checked in, which is weirdly: 1994MB. I did not
>>>>>>> specifically set heap
>>>>>>> > >> >>> > > size. I used maxConn/M=200. I'll try with larger
>>>>>>> buffer to see if I
>>>>>>> > >> >>> > > can get 9.4 to produce a single segment for the same
>>>>>>> test settings. I
>>>>>>> > >> >>> > > see you used a much smaller M (16), which should have
>>>>>>> produced quite
>>>>>>> > >> >>> > > small graphs, and I agree, should have been a single
>>>>>>> segment. Were you
>>>>>>> > >> >>> > > able to verify the number of segments?
>>>>>>> > >> >>> > >
>>>>>>> > >> >>> > > Agree that decrease in recall is not expected when more
>>>>>>> segments are produced.
>>>>>>> > >> >>> > >
>>>>>>> > >> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>>>>>>> > >> >>> > > <mayya.sharipova@elastic.co.invalid> wrote:
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > > Hello Michael,
>>>>>>> > >> >>> > > > Thanks for checking.
>>>>>>> > >> >>> > > > Sorry for bringing this up again.
>>>>>>> > >> >>> > > > First of all, I am ok with proceeding with the Lucene
>>>>>>> 9.4 release and leaving the performance investigations for later.
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > > I am interested in what's the maxConn/M value you
>>>>>>> used for your tests? What was the heap memory and the size of the RAM
>>>>>>> buffer for indexing?
>>>>>>> > >> >>> > > > Usually, when we have multiple segments, recall
>>>>>>> should increase, not decrease. But I agree that with multiple segments we
>>>>>>> can see a big drop in QPS.
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > > Here is my investigation with detailed output of the
>>>>>>> performance difference between 9.3 and 9.4 releases. In my tests I used a
>>>>>>> large indexing buffer (2Gb) and large heap (5Gb) to end up with a single
>>>>>>> segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > > Thank you.
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <
>>>>>>> romseygeek@gmail.com> wrote:
>>>>>>> > >> >>> > > >>
>>>>>>> > >> >>> > > >> Done. Thanks!
>>>>>>> > >> >>> > > >>
>>>>>>> > >> >>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <
>>>>>>> msokolov@gmail.com> wrote:
>>>>>>> > >> >>> > > >> >
>>>>>>> > >> >>> > > >> > Hi Alan - I checked out the interval queries
>>>>>>> patch; seems pretty safe,
>>>>>>> > >> >>> > > >> > please go ahead and port to 9.4. Thanks!
>>>>>>> > >> >>> > > >> >
>>>>>>> > >> >>> > > >> > Mike
>>>>>>> > >> >>> > > >> >
>>>>>>> > >> >>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <
>>>>>>> romseygeek@gmail.com> wrote:
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> Hi Mike,
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> I’ve opened
>>>>>>> https://github.com/apache/lucene/pull/11760 as a small bug fix PR
>>>>>>> for a problem with interval queries. Am I OK to port this to the 9.4
>>>>>>> branch?
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> Thanks, Alan
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <
>>>>>>> msokolov@gmail.com> wrote:
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> NOTICE:
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> Branch branch_9_4 has been cut and versions
>>>>>>> updated to 9.5 on stable branch.
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> Please observe the normal rules:
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> * No new features may be committed to the branch.
>>>>>>> > >> >>> > > >> >> * Documentation patches, build patches and
>>>>>>> serious bug fixes may be
>>>>>>> > >> >>> > > >> >> committed to the branch. However, you should
>>>>>>> submit all patches you
>>>>>>> > >> >>> > > >> >> want to commit to Jira first to give others the
>>>>>>> chance to review
>>>>>>> > >> >>> > > >> >> and possibly vote against the patch. Keep in mind
>>>>>>> that it is our
>>>>>>> > >> >>> > > >> >> main intention to keep the branch as stable as
>>>>>>> possible.
>>>>>>> > >> >>> > > >> >> * All patches that are intended for the branch
>>>>>>> should first be committed
>>>>>>> > >> >>> > > >> >> to the unstable branch, merged into the stable
>>>>>>> branch, and then into
>>>>>>> > >> >>> > > >> >> the current release branch.
>>>>>>> > >> >>> > > >> >> * Normal unstable and stable branch development
>>>>>>> may continue as usual.
>>>>>>> > >> >>> > > >> >> However, if you plan to commit a big change to
>>>>>>> the unstable branch
>>>>>>> > >> >>> > > >> >> while the branch feature freeze is in effect,
>>>>>>> think twice: can't the
>>>>>>> > >> >>> > > >> >> addition wait a couple more days? Merges of bug
>>>>>>> fixes into the branch
>>>>>>> > >> >>> > > >> >> may become more difficult.
>>>>>>> > >> >>> > > >> >> * Only Jira issues with Fix version 9.4 and
>>>>>>> priority "Blocker" will delay
>>>>>>> > >> >>> > > >> >> a release candidate build.
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> > >> >>> > > >> >> To unsubscribe, e-mail:
>>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>>> > >> >>> > > >> >> For additional commands, e-mail:
>>>>>>> dev-help@lucene.apache.org
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >
>>>>>>> > >> >>> > > >> >
>>>>>>> ---------------------------------------------------------------------
>>>>>>> > >> >>> > > >> > To unsubscribe, e-mail:
>>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>>> > >> >>> > > >> > For additional commands, e-mail:
>>>>>>> dev-help@lucene.apache.org
>>>>>>> > >> >>> > > >> >
>>>>>>> > >> >>> > > >>
>>>>>>> > >> >>> > > >>
>>>>>>> > >> >>> > > >>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> > >> >>> > > >> To unsubscribe, e-mail:
>>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>>> > >> >>> > > >> For additional commands, e-mail:
>>>>>>> dev-help@lucene.apache.org
>>>>>>> > >> >>> > > >>
>>>>>>> > >> >>>
>>>>>>> > >> >>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> > >> >>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>> > >> >>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>> > >> >>>
>>>>>>> > >> >
>>>>>>> > >> >
>>>>>>> > >> > --
>>>>>>> > >> > Adrien
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> > >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>> > >> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>> > >>
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > --
>>>>>>> > > Adrien
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>>
>>>>>>>
>
> --
> Adrien
>
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Both changes are on branch_9_4 now.

On Tue, Sep 20, 2022 at 1:31 PM Michael Sokolov <msokolov@gmail.com> wrote:

> well, I did start, optimistically, but I think I need to re-spin to
> include a fix for this test failure that has been popping up, so I will
> pull these in too.
>
> On Tue, Sep 20, 2022 at 6:24 AM Adrien Grand <jpountz@gmail.com> wrote:
>
>> Hi Mike,
>>
>> If you have not started a RC yet, I'd like to include some small fixes
>> for bugs that were recently introduced in Lucene:
>> - https://github.com/apache/lucene/pull/11792
>> - https://github.com/apache/lucene/pull/11794
>>
>> On Tue, Sep 20, 2022 at 1:26 AM Julie Tibshirani <julietibs@gmail.com>
>> wrote:
>>
>>> Sorry for the confusion. To explain, I use a local ann-benchmarks set-up
>>> that makes use of KnnGraphTester. It is a bit hacky and I accidentally
>>> included the warm-ups in the final timings. So the change to warm-up
>>> explains why we saw different results in our tests. This is great
>>> motivation to solidify and publish my local ann-benchmarks set-up so that
>>> it's not so fragile!
>>>
>>> In summary, with your latest fix the recall and QPS look good to me -- I
>>> don't detect any regression between 9.3 and 9.4.
>>>
>>> Julie
>>>
>>> On Mon, Sep 19, 2022 at 3:45 PM Michael Sokolov <msokolov@gmail.com>
>>> wrote:
>>>
>>>> I'm confused, since warming should not be counted in the timings. Are
>>>> you saying that the recall was affected??
>>>>
>>>> On Mon, Sep 19, 2022 at 6:12 PM Julie Tibshirani <julietibs@gmail.com>
>>>> wrote:
>>>>
>>>>> Using the ann-benchmarks framework, I still saw a similar regression
>>>>> as Mayya between 9.3 and 9.4. I investigated and found it was due to
>>>>> "KnnGraphTester to use KnnVectorQuery" (
>>>>> https://github.com/apache/lucene/pull/796), specifically the change
>>>>> to the warm-up strategy. If I revert it, the results look exactly as
>>>>> expected.
>>>>>
>>>>> I guess we can keep an eye on the nightly benchmarks tomorrow to
>>>>> double-check there's no drop. It would also be nice to formalize the
>>>>> ann-benchmarks set-up and run it regularly (like we've discussed in
>>>>> https://github.com/apache/lucene/issues/10665).
>>>>>
>>>>> Julie
>>>>>
>>>>> On Mon, Sep 19, 2022 at 10:33 AM Michael Sokolov <msokolov@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for your speedy testing! I am observing comparable latencies
>>>>>> *when the index geometry (ie number of segments)* is unchanged. Agree we
>>>>>> can leave this for a later day. I'll proceed to cut 9.4 artifacts
>>>>>>
>>>>>> On Mon, Sep 19, 2022 at 11:02 AM Mayya Sharipova
>>>>>> <mayya.sharipova@elastic.co.invalid> wrote:
>>>>>>
>>>>>>> It would be great if you all are able to test again with
>>>>>>>> https://github.com/apache/lucene/pull/11781/ applied
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I ran the ann benchmarks with this change, and was happy to confirm
>>>>>>> that in my test the recall with this PR is the same as on the 9.3 branch,
>>>>>>> although QPS is lower; we can investigate QPS later.
>>>>>>>
>>>>>>> glove-100-angular M:16 efConstruction:100
>>>>>>> 9.3 recall | 9.3 QPS | this PR recall | this PR QPS
>>>>>>> n_cands=10 0.620 2745.933 0.620 1675.500
>>>>>>> n_cands=20 0.680 2288.665 0.680 1512.744
>>>>>>> n_cands=40 0.746 1770.243 0.746 1040.240
>>>>>>> n_cands=80 0.809 1226.738 0.809 695.236
>>>>>>> n_cands=120 0.843 948.908 0.843 525.914
>>>>>>> n_cands=200 0.878 671.781 0.878 351.529
>>>>>>> n_cands=400 0.918 392.265 0.918 207.854
>>>>>>> n_cands=600 0.937 282.403 0.937 144.311
>>>>>>> n_cands=800 0.949 214.620 0.949 116.875
>>>>>>>
>>>>>>> On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov <msokolov@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> OK, I think I was wrong about latency having increased due to a
>>>>>>>> change
>>>>>>>> in KnnGraphTester -- I did some testing there and couldn't
>>>>>>>> reproduce.
>>>>>>>> There does seem to be a slight vector search latency increase,
>>>>>>>> possibly noise, but maybe due to the branching introduced to check
>>>>>>>> whether to do byte vs float operations? It would be a little
>>>>>>>> surprising if that were the case given the small number of
>>>>>>>> branchings
>>>>>>>> compared to the number of multiplies in dot-product though.
>>>>>>>>
>>>>>>>> On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msokolov@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > Thanks for the deep-dive Julie. I was able to reproduce the
>>>>>>>> changing
>>>>>>>> > recall. I had introduced some bugs in the diversity checks (that
>>>>>>>> may
>>>>>>>> > have partially canceled each other out? it's hard to understand
>>>>>>>> what
>>>>>>>> > was happening in the buggy case) and posted a fix today
>>>>>>>> > https://github.com/apache/lucene/pull/11781.
>>>>>>>> >
>>>>>>>> > There are a couple of other outstanding issues I found while
>>>>>>>> doing a
>>>>>>>> > bunch of git bisecting;
>>>>>>>> >
>>>>>>>> > I think we might have introduced a (test-only) performance
>>>>>>>> regression
>>>>>>>> > in KnnGraphTester
>>>>>>>> >
>>>>>>>> > We may still be over-allocating the size of NeighborArray,
>>>>>>>> leading to
>>>>>>>> > excessive segmentation? I wonder if we could avoid dynamic
>>>>>>>> > re-allocation there, and simply initialize every neighbor array to
>>>>>>>> > 2*M+1.
>>>>>>>> >
>>>>>>>> > While I don't think these are necessarily blockers, given that we
>>>>>>>> are
>>>>>>>> > releasing HNSW improvements, it seems like we should address
>>>>>>>> these,
>>>>>>>> > especially as the build-graph-on-index is one of the things we are
>>>>>>>> > releasing, and it is (may be?) impacted. I will see if I can put
>>>>>>>> up a
>>>>>>>> > patch or two.
>>>>>>>> >
>>>>>>>> > It would be great if you all are able to test again with
>>>>>>>> > https://github.com/apache/lucene/pull/11781/ applied
>>>>>>>> >
>>>>>>>> > -Mike
>>>>>>>> >
>>>>>>>> > On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpountz@gmail.com>
>>>>>>>> wrote:
>>>>>>>> > >
>>>>>>>> > > Thank you Mike, I just backported the change.
>>>>>>>> > >
>>>>>>>> > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <
>>>>>>>> msokolov@gmail.com> wrote:
>>>>>>>> > >>
>>>>>>>> > >> it looks like a small bug fix we have had on main (and 9.x?)
>>>>>>>> for a
>>>>>>>> > >> while now and no test failures showed up, I guess. Should be
>>>>>>>> OK to
>>>>>>>> > >> port. I plan to cut artifacts this weekend, or Monday at the
>>>>>>>> latest,
>>>>>>>> > >> but if you can do the backport today or tomorrow, that's fine
>>>>>>>> by me.
>>>>>>>> > >>
>>>>>>>> > >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <
>>>>>>>> jpountz@gmail.com> wrote:
>>>>>>>> > >> >
>>>>>>>> > >> > Mike, I'm tempted to backport
>>>>>>>> https://github.com/apache/lucene/pull/1068 to branch_9_4, which is
>>>>>>>> a bugfix that looks pretty safe to me. What do you think?
>>>>>>>> > >> >
>>>>>>>> > >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <
>>>>>>>> mayya.sharipova@elastic.co.invalid> wrote:
>>>>>>>> > >> >>
>>>>>>>> > >> >> Thanks for running more tests, Michael.
>>>>>>>> > >> >> It is encouraging that you saw a similar performance
>>>>>>>> between 9.3 and 9.4. I will also run more tests with different parameters.
>>>>>>>> > >> >>
>>>>>>>> > >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <
>>>>>>>> msokolov@gmail.com> wrote:
>>>>>>>> > >> >>>
>>>>>>>> > >> >>> As a follow-up, I ran a test using the same parameters as
>>>>>>>> above, only
>>>>>>>> > >> >>> changing M=200 to M=16. This did result in a single
>>>>>>>> segment in both
>>>>>>>> > >> >>> cases (9.3, 9.4) and the performance was pretty similar;
>>>>>>>> within noise
>>>>>>>> > >> >>> I think. The main difference I saw was that the 9.3 index
>>>>>>>> was written
>>>>>>>> > >> >>> using CFS:
>>>>>>>> > >> >>>
>>>>>>>> > >> >>> 9.4:
>>>>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth
>>>>>>>> visited index ms
>>>>>>>> > >> >>> 0.755 1.36 1000000 100 16 100 200
>>>>>>>> 891402 1.00
>>>>>>>> > >> >>> post-filter
>>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>>>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vec
>>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>>>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vem
>>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>>>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vex
>>>>>>>> > >> >>>
>>>>>>>> > >> >>> 9.3:
>>>>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth
>>>>>>>> visited index ms
>>>>>>>> > >> >>> 0.775 1.34 1000000 100 16 100 4033
>>>>>>>> 977043
>>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
>>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>>>>>>>> > >> >>>
>>>>>>>> > >> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <
>>>>>>>> msokolov@gmail.com> wrote:
>>>>>>>> > >> >>> >
>>>>>>>> > >> >>> > I ran another test. I thought I had increased the RAM
>>>>>>>> buffer size to
>>>>>>>> > >> >>> > 8G and heap to 16G. However I still see two segments in
>>>>>>>> the index that
>>>>>>>> > >> >>> > was created. And looking at the infostream I see:
>>>>>>>> > >> >>> >
>>>>>>>> > >> >>> > dir=MMapDirectory@
>>>>>>>> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>>>>>>>> > >> >>> > lockFactory=org\
>>>>>>>> > >> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20
>>>>>>>> > >> >>> > index=
>>>>>>>> > >> >>> > version=9.4.0
>>>>>>>> > >> >>> >
>>>>>>>> analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>>>>>>>> > >> >>> > ramBufferSizeMB=8000.0
>>>>>>>> > >> >>> > maxBufferedDocs=-1
>>>>>>>> > >> >>> > ...
>>>>>>>> > >> >>> > perThreadHardLimitMB=1945
>>>>>>>> > >> >>> > ...
>>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush
>>>>>>>> postings as
>>>>>>>> > >> >>> > segment _6 numDocs=555373
>>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to
>>>>>>>> write norms
>>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to
>>>>>>>> write docValues
>>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to
>>>>>>>> write points
>>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec
>>>>>>>> to write vectors
>>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to
>>>>>>>> finish stored fields
>>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to
>>>>>>>> write postings
>>>>>>>> > >> >>> > and finish vectors
>>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to
>>>>>>>> write fieldInfos
>>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new
>>>>>>>> segment has 0 deleted docs
>>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new
>>>>>>>> segment has 0
>>>>>>>> > >> >>> > soft-deleted docs
>>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new
>>>>>>>> segment has no
>>>>>>>> > >> >>> > vectors; no norms; no docValues; no prox; freqs
>>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
>>>>>>>> > >> >>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec,
>>>>>>>> _6.fdm, _6.fdt, _6_\
>>>>>>>> > >> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
>>>>>>>> > >> >>> > _6_Lucene94HnswVectorsFormat_0.vex]
>>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed
>>>>>>>> codec=Lucene94
>>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed:
>>>>>>>> segment=_6
>>>>>>>> > >> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
>>>>>>>> > >> >>> > docs/MB=521.134
>>>>>>>> > >> >>> >
>>>>>>>> > >> >>> > so I think it's this perThreadHardLimit that is
>>>>>>>> triggering the
>>>>>>>> > >> >>> > flushes? TBH this isn't something I had seen before; but
>>>>>>>> the docs say:
>>>>>>>> > >> >>> >
>>>>>>>> > >> >>> > /**
>>>>>>>> > >> >>> > * Expert: Sets the maximum memory consumption per
>>>>>>>> thread triggering
>>>>>>>> > >> >>> > a forced flush if exceeded. A
>>>>>>>> > >> >>> > * {@link DocumentsWriterPerThread} is forcefully
>>>>>>>> flushed once it
>>>>>>>> > >> >>> > exceeds this limit even if the
>>>>>>>> > >> >>> > * {@link #getRAMBufferSizeMB()} has not been
>>>>>>>> exceeded. This is a
>>>>>>>> > >> >>> > safety limit to prevent a {@link
>>>>>>>> > >> >>> > * DocumentsWriterPerThread} from address space
>>>>>>>> exhaustion due to
>>>>>>>> > >> >>> > its internal 32 bit signed
>>>>>>>> > >> >>> > * integer based memory addressing. The given value
>>>>>>>> must be less
>>>>>>>> > >> >>> >     * than 2GB (2048MB)
>>>>>>>> > >> >>> > *
>>>>>>>> > >> >>> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>>>>>>>> > >> >>> > */
>>>>>>>> > >> >>> >
>>>>>>>> > >> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <
>>>>>>>> msokolov@gmail.com> wrote:
>>>>>>>> > >> >>> > >
>>>>>>>> > >> >>> > > Hi Mayya, thanks for persisting - I think we need to
>>>>>>>> wrestle this to
>>>>>>>> > >> >>> > > the ground for sure. In the test I ran, RAM buffer was
>>>>>>>> the default
>>>>>>>> > >> >>> > > checked in, which is weirdly: 1994MB. I did not
>>>>>>>> specifically set heap
>>>>>>>> > >> >>> > > size. I used maxConn/M=200. I'll try with larger
>>>>>>>> buffer to see if I
>>>>>>>> > >> >>> > > can get 9.4 to produce a single segment for the same
>>>>>>>> test settings. I
>>>>>>>> > >> >>> > > see you used a much smaller M (16), which should have
>>>>>>>> produced quite
>>>>>>>> > >> >>> > > small graphs, and I agree, should have been a single
>>>>>>>> segment. Were you
>>>>>>>> > >> >>> > > able to verify the number of segments?
>>>>>>>> > >> >>> > >
>>>>>>>> > >> >>> > > Agree that decrease in recall is not expected when
>>>>>>>> more segments are produced.
>>>>>>>> > >> >>> > >
>>>>>>>> > >> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>>>>>>>> > >> >>> > > <mayya.sharipova@elastic.co.invalid> wrote:
>>>>>>>> > >> >>> > > >
>>>>>>>> > >> >>> > > > Hello Michael,
>>>>>>>> > >> >>> > > > Thanks for checking.
>>>>>>>> > >> >>> > > > Sorry for bringing this up again.
>>>>>>>> > >> >>> > > > First of all, I am ok with proceeding with the
>>>>>>>> Lucene 9.4 release and leaving the performance investigations for later.
>>>>>>>> > >> >>> > > >
>>>>>>>> > >> >>> > > > I am interested in what's the maxConn/M value you
>>>>>>>> used for your tests? What was the heap memory and the size of the RAM
>>>>>>>> buffer for indexing?
>>>>>>>> > >> >>> > > > Usually, when we have multiple segments, recall
>>>>>>>> should increase, not decrease. But I agree that with multiple segments we
>>>>>>>> can see a big drop in QPS.
>>>>>>>> > >> >>> > > >
>>>>>>>> > >> >>> > > > Here is my investigation with detailed output of the
>>>>>>>> performance difference between 9.3 and 9.4 releases. In my tests I used a
>>>>>>>> large indexing buffer (2Gb) and large heap (5Gb) to end up with a single
>>>>>>>> segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>>>>>>>> > >> >>> > > >
>>>>>>>> > >> >>> > > > Thank you.
>>>>>>>> > >> >>> > > >
>>>>>>>> > >> >>> > > >
>>>>>>>> > >> >>> > > >
>>>>>>>> > >> >>> > > >
>>>>>>>> > >> >>> > > >
>>>>>>>> > >> >>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <
>>>>>>>> romseygeek@gmail.com> wrote:
>>>>>>>> > >> >>> > > >>
>>>>>>>> > >> >>> > > >> Done. Thanks!
>>>>>>>> > >> >>> > > >>
>>>>>>>> > >> >>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <
>>>>>>>> msokolov@gmail.com> wrote:
>>>>>>>> > >> >>> > > >> >
>>>>>>>> > >> >>> > > >> > Hi Alan - I checked out the interval queries
>>>>>>>> patch; seems pretty safe,
>>>>>>>> > >> >>> > > >> > please go ahead and port to 9.4. Thanks!
>>>>>>>> > >> >>> > > >> >
>>>>>>>> > >> >>> > > >> > Mike
>>>>>>>> > >> >>> > > >> >
>>>>>>>> > >> >>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <
>>>>>>>> romseygeek@gmail.com> wrote:
>>>>>>>> > >> >>> > > >> >>
>>>>>>>> > >> >>> > > >> >> Hi Mike,
>>>>>>>> > >> >>> > > >> >>
>>>>>>>> > >> >>> > > >> >> I’ve opened
>>>>>>>> https://github.com/apache/lucene/pull/11760 as a small bug fix PR
>>>>>>>> for a problem with interval queries. Am I OK to port this to the 9.4
>>>>>>>> branch?
>>>>>>>> > >> >>> > > >> >>
>>>>>>>> > >> >>> > > >> >> Thanks, Alan
>>>>>>>> > >> >>> > > >> >>
>>>>>>>> > >> >>> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <
>>>>>>>> msokolov@gmail.com> wrote:
>>>>>>>> > >> >>> > > >> >>
>>>>>>>> > >> >>> > > >> >> NOTICE:
>>>>>>>> > >> >>> > > >> >>
>>>>>>>> > >> >>> > > >> >> Branch branch_9_4 has been cut and versions
>>>>>>>> updated to 9.5 on stable branch.
>>>>>>>> > >> >>> > > >> >>
>>>>>>>> > >> >>> > > >> >> Please observe the normal rules:
>>>>>>>> > >> >>> > > >> >>
>>>>>>>> > >> >>> > > >> >> * No new features may be committed to the branch.
>>>>>>>> > >> >>> > > >> >> * Documentation patches, build patches and
>>>>>>>> serious bug fixes may be
>>>>>>>> > >> >>> > > >> >> committed to the branch. However, you should
>>>>>>>> submit all patches you
>>>>>>>> > >> >>> > > >> >> want to commit to Jira first to give others the
>>>>>>>> chance to review
>>>>>>>> > >> >>> > > >> >> and possibly vote against the patch. Keep in
>>>>>>>> mind that it is our
>>>>>>>> > >> >>> > > >> >> main intention to keep the branch as stable as
>>>>>>>> possible.
>>>>>>>> > >> >>> > > >> >> * All patches that are intended for the branch
>>>>>>>> should first be committed
>>>>>>>> > >> >>> > > >> >> to the unstable branch, merged into the stable
>>>>>>>> branch, and then into
>>>>>>>> > >> >>> > > >> >> the current release branch.
>>>>>>>> > >> >>> > > >> >> * Normal unstable and stable branch development
>>>>>>>> may continue as usual.
>>>>>>>> > >> >>> > > >> >> However, if you plan to commit a big change to
>>>>>>>> the unstable branch
>>>>>>>> > >> >>> > > >> >> while the branch feature freeze is in effect,
>>>>>>>> think twice: can't the
>>>>>>>> > >> >>> > > >> >> addition wait a couple more days? Merges of bug
>>>>>>>> fixes into the branch
>>>>>>>> > >> >>> > > >> >> may become more difficult.
>>>>>>>> > >> >>> > > >> >> * Only Jira issues with Fix version 9.4 and
>>>>>>>> priority "Blocker" will delay
>>>>>>>> > >> >>> > > >> >> a release candidate build.
>>>>>>>> > >> >>> > > >> >>
>>>>>>>> > >> >>> > > >> >>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> > >> >>> > > >> >> To unsubscribe, e-mail:
>>>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>>>> > >> >>> > > >> >> For additional commands, e-mail:
>>>>>>>> dev-help@lucene.apache.org
>>>>>>>> > >> >>> > > >> >>
>>>>>>>> > >> >>> > > >> >>
>>>>>>>> > >> >>> > > >> >
>>>>>>>> > >> >>> > > >> >
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> > >> >>> > > >> > To unsubscribe, e-mail:
>>>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>>>> > >> >>> > > >> > For additional commands, e-mail:
>>>>>>>> dev-help@lucene.apache.org
>>>>>>>> > >> >>> > > >> >
>>>>>>>> > >> >>> > > >>
>>>>>>>> > >> >>> > > >>
>>>>>>>> > >> >>> > > >>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> > >> >>> > > >> To unsubscribe, e-mail:
>>>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>>>> > >> >>> > > >> For additional commands, e-mail:
>>>>>>>> dev-help@lucene.apache.org
>>>>>>>> > >> >>> > > >>
>>>>>>>> > >> >>>
>>>>>>>> > >> >>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> > >> >>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>>> > >> >>> For additional commands, e-mail:
>>>>>>>> dev-help@lucene.apache.org
>>>>>>>> > >> >>>
>>>>>>>> > >> >
>>>>>>>> > >> >
>>>>>>>> > >> > --
>>>>>>>> > >> > Adrien
>>>>>>>> > >>
>>>>>>>> > >>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> > >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>>> > >> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>>> > >>
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > > --
>>>>>>>> > > Adrien
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>>>
>>>>>>>>
>>
>> --
>> Adrien
>>
>

--
Adrien
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Hi,

JDK 19 was released yesterday and I am still waiting for AdoptOpenJDK to
publish Gradle-Toolchain-compatible releases. To me the schedule is a bit
unfortunate: we started the release on the very day it became possible to
add (optional) support for JDK 19's Panama-powered MMAP.

I would really like to have a Lucene release that adds support for the
JDK 19 preview APIs (they can only be tested against exactly the JDK 19
series and won't work with JDK 20), so we should ship a release including
the new code between now and March (ideally before Christmas).
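
As a rough illustration of why this is pinned to a single feature release
(a hypothetical sketch, not the actual Lucene code): preview class files
are rejected by every other JDK version, so such support has to be gated
at runtime, roughly like this:

   public final class PreviewGate {
     private PreviewGate() {}

     // Runtime.version().feature() returns the JDK feature release,
     // e.g. 19 on any JDK 19 build.
     public static boolean panamaMmapUsable() {
       return Runtime.version().feature() == 19;
     }
   }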

The options we have:

 * Let the 9.4.0 release go out now and add a 9.5.0 in a month or so (I
   would be release manager). I do not want to make this a bugfix release,
   as there's a small API change (the MMapDirectory ctor changes from an
   int to a long parameter for the chunk size). We could call this release
   the "Java 19 release for early adopters".
 * Wait a few days and respin the release after
   https://github.com/apache/lucene/pull/912 goes in. The code has been
   thoroughly tested by Policeman Jenkins for several months; only the
   compilation does not work out of the box until there's a Temurin
   build of OpenJDK 19.

To repeat: the above PR does not change any production code, so it should
be bug-free. It only adds a few class files that are used only when you
pass "--enable-preview" to your JDK. This makes it easy for users to try
Solr or Elasticsearch with JDK 19. There is no risk; it only activates
when you enable it.
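
To make this concrete, here is a minimal usage sketch. It is a
hypothetical snippet, not code from the PR: it assumes the new long-based
chunk-size ctor described above, and the path and chunk size are made up.

   import java.nio.file.Path;
   import org.apache.lucene.store.MMapDirectory;

   public class PreviewMMapDemo {
     public static void main(String[] args) throws Exception {
       // With a long chunk size, chunks larger than 2 GiB become possible
       // (the old int parameter capped them below 2 GiB).
       long chunkSize = 16L * 1024 * 1024 * 1024; // 16 GiB per mapped chunk
       try (MMapDirectory dir =
           new MMapDirectory(Path.of("/tmp/demo-index"), chunkSize)) {
         System.out.println("opened " + dir);
       }
     }
   }

On a JDK 19 runtime, start the JVM with "--enable-preview" to activate the
Panama (MemorySegment) code path; without the flag nothing changes and the
existing implementation is used.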

Thoughts?

Uwe

On 02.09.2022 at 21:42, Michael Sokolov wrote:
> NOTICE:
>
> Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch.
>
> Please observe the normal rules:
>
> * No new features may be committed to the branch.
> * Documentation patches, build patches and serious bug fixes may be
> committed to the branch. However, you should submit all patches you
> want to commit to Jira first to give others the chance to review
> and possibly vote against the patch. Keep in mind that it is our
> main intention to keep the branch as stable as possible.
> * All patches that are intended for the branch should first be committed
> to the unstable branch, merged into the stable branch, and then into
> the current release branch.
> * Normal unstable and stable branch development may continue as usual.
> However, if you plan to commit a big change to the unstable branch
> while the branch feature freeze is in effect, think twice: can't the
> addition wait a couple more days? Merges of bug fixes into the branch
> may become more difficult.
> * Only Jira issues with Fix version 9.4 and priority "Blocker" will delay
> a release candidate build.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
I see; I would kind of like to get the release out before ApacheCon
NA, which starts Oct 3. Do you think it's likely AdoptOpenJDK will
release its JDK 19 build in the next week (say by Sep 26)?


Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Hi,

I will check later today how long it took last time in March. I would
expect that they just need to wait until the builds and tests are done
before it gets released.

I don't want to hold up the release. The vote is still ongoing, so we
have all options.

Uwe

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de


Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
FYI, here (https://github.com/adoptium/adoptium/issues/171) Eclipse says:

* Add website banner (automate* via github workflow in website
repository) - Announce that we target releases to be available within
48-72 hours of the GA tags being available

Uwe

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de


Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
OK, how does this sound: if there is a JDK 19 AdoptOpenJDK release
this week, as it seems there should be, and you are able to fast-follow
with the Lucene changes to use it, then I can re-spin RC2 on Monday or
Tuesday.


Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Looks like a fair deal. I will check daily for the release to appear.
Once everything looks fine, I will update the PR and take it out of
draft status.

Uwe

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de


