Mailing List Archive

Subject: New branch and feature freeze for Lucene 9.4.0
NOTICE:

Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch.

Please observe the normal rules:

* No new features may be committed to the branch.
* Documentation patches, build patches and serious bug fixes may be
committed to the branch. However, you should first submit any patch you
want to commit to Jira, to give others the chance to review it and
possibly vote against it. Keep in mind that our main intention is to
keep the branch as stable as possible.
* All patches that are intended for the branch should first be committed
to the unstable branch, merged into the stable branch, and then into
the current release branch.
* Normal unstable and stable branch development may continue as usual.
However, if you plan to commit a big change to the unstable branch
while the branch feature freeze is in effect, think twice: can't the
addition wait a couple more days? Merges of bug fixes into the branch
may become more difficult.
* Only Jira issues with Fix version 9.4 and priority "Blocker" will delay
a release candidate build.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: New branch and feature freeze for Lucene 9.4.0
> Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch.

Then the GitHub Milestone for 9.5 also needs to be created.

This time, I created Milestone 9.5.0. We should add this step to the
release process.
https://github.com/apache/lucene/milestone/4


On Sat, Sep 3, 2022 at 4:42 Michael Sokolov <msokolov@gmail.com> wrote:

Re: New branch and feature freeze for Lucene 9.4.0
Hi Mike, I've been working on follow-up refactors to the vector encoding
work we just added in 9.4 (https://github.com/apache/lucene/pull/1054) and
had a couple of things to check with you.

First, I opened a PR to remove LeafReader#searchNearestVectorsExhaustively (
https://github.com/apache/lucene/pull/11756). If you're happy with the
change, I'd like to backport it to 9.4 to minimize changes to the
LeafReader API. This seemed okay to me, since it's just a refactor and has
no new functionality.

I'm also looking into simplifying KnnVectorsWriter to remove the generics
we added. I may not complete this in time for 9.4, but I'm not too worried
about pushing it out, since this is a codec-level API and changing it
won't be very disruptive to users.

Julie

On Fri, Sep 2, 2022 at 10:57 PM Tomoko Uchida <tomoko.uchida.1111@gmail.com>
wrote:

Re: New branch and feature freeze for Lucene 9.4.0
Thanks Julie, I looked and left some minor comments. Let's target that
searchNearestVectors refactor for 9.4.0.

As for removing the generics, it would be great if we can further
simplify, but I agree it doesn't seem critical to target this release.
If we can get it pushed by next week that would be fine too; it doesn't
sound risky. But there is also the potential to touch a lot of code, so
let's not rush it either :)

We also need to wrap up https://github.com/apache/lucene/pull/11743. I
believe that looks ready to push and we have decided to maintain the
current indexing strategy. Mayya - are you ready to push, or do you
anticipate any further work there?

On Wed, Sep 7, 2022 at 12:32 PM Julie Tibshirani <julietibs@gmail.com> wrote:
Re: New branch and feature freeze for Lucene 9.4.0
Hello Michael,
Thanks for following up on this.
I will try to merge https://github.com/apache/lucene/pull/11743 today or
tomorrow (and will backport it to 9.4 as well).

----
Meanwhile, I've identified a regression in vector search ANN benchmark
performance between the 9.3 and 9.4 releases.
I am not sure if this regression is a blocker, as the recall drop is very
slight.

For the glove dataset, recall dropped by 3-8% and QPS dropped by 30-50%.

[image: image.png]

For the sift dataset, recall dropped by 1% and QPS dropped by 27-35%.

[image: image.png]

I have not yet found a specific commit that introduced this regression.
One thing I did verify is that "LUCENE-10592 Build HNSW Graph on indexing"
did not introduce it, as the recall and QPS with that commit are
comparable to the 9.3 branch.

Could it be a regression in the way we test (there were changes in
KnnGraphTester)?
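For context, recall figures like the ones above are typically computed by
comparing the approximate top-k results against exact (brute-force) nearest
neighbors. This is a generic sketch of that comparison, not the actual
KnnGraphTester or ann-benchmarks code; all names here are hypothetical:

```python
# Generic sketch of recall@k for ANN benchmark results.
# `approx` and `exact` are lists of result-ID lists, one per query;
# `exact` holds the true nearest neighbors from a brute-force scan.

def recall_at_k(approx, exact, k):
    """Fraction of true top-k neighbors found, averaged over queries."""
    total = 0.0
    for found, truth in zip(approx, exact):
        truth_set = set(truth[:k])
        total += len(truth_set.intersection(found[:k])) / k
    return total / len(approx)

# Example: two queries with k=3. The first query finds 2 of the 3 true
# neighbors, the second finds all 3.
approx = [[1, 2, 9], [4, 5, 6]]
exact = [[1, 2, 3], [4, 5, 6]]
print(recall_at_k(approx, exact, 3))  # ~0.833
```

A QPS drop with a simultaneous recall drop is what makes the bisection
tricky: either the graph construction or the search-time behavior could
have changed.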



On Thu, Sep 8, 2022 at 9:17 AM Michael Sokolov <msokolov@gmail.com> wrote:

Re: New branch and feature freeze for Lucene 9.4.0
Hmm, those ann-benchmarks regressions are concerning. I think we should
treat this as a blocker for the release. I'll see if I can reproduce it
with a simpler test using KnnGraphTester alone. I will note that the
nightly benchmarks haven't registered any QPS regression:
https://home.apache.org/~mikemccand/lucenebench/VectorSearch.html

On Thu, Sep 8, 2022 at 11:07 AM Mayya Sharipova
<mayya.sharipova@elastic.co.invalid> wrote:

Re: New branch and feature freeze for Lucene 9.4.0
A change in recall and QPS feels like something that could be caused by a
different organization of segments. Mayya, do you know if your Lucene
indexes had the same distribution of segments (number and size) in both
cases?

On Thu, Sep 8, 2022 at 7:02 PM Michael Sokolov <msokolov@gmail.com> wrote:


--
Adrien
Re: New branch and feature freeze for Lucene 9.4.0
Thanks for the suggestion, Adrien. I have not checked the sizes of the
segments (I can do that later), but in both cases, 9.3 and 9.4, we end up
with a single segment, because KnnGraphTester
<https://github.com/apache/lucene/blob/branch_9_4/lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java#L704>,
which is used for these tests, sets a high indexing memory buffer of 2 GB.

On Fri, Sep 9, 2022 at 4:46 AM Adrien Grand <jpountz@gmail.com> wrote:

Re: New branch and feature freeze for Lucene 9.4.0
Hi Mike,

I’ve opened https://github.com/apache/lucene/pull/11760 as a small bug fix PR for a problem with interval queries. Am I OK to port this to the 9.4 branch?

Thanks, Alan

> On 2 Sep 2022, at 20:42, Michael Sokolov <msokolov@gmail.com> wrote:
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Hi Alan - I checked out the interval queries patch; seems pretty safe,
please go ahead and port to 9.4. Thanks!

Mike

On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <romseygeek@gmail.com> wrote:
>
> Hi Mike,
>
> I’ve opened https://github.com/apache/lucene/pull/11760 as a small bug fix PR for a problem with interval queries. Am I OK to port this to the 9.4 branch?
>
> Thanks, Alan
>
> On 2 Sep 2022, at 20:42, Michael Sokolov <msokolov@gmail.com> wrote:
>
> NOTICE:
>
> Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch.
>
> Please observe the normal rules:
>
> * No new features may be committed to the branch.
> * Documentation patches, build patches and serious bug fixes may be
> committed to the branch. However, you should submit all patches you
> want to commit to Jira first to give others the chance to review
> and possibly vote against the patch. Keep in mind that it is our
> main intention to keep the branch as stable as possible.
> * All patches that are intended for the branch should first be committed
> to the unstable branch, merged into the stable branch, and then into
> the current release branch.
> * Normal unstable and stable branch development may continue as usual.
> However, if you plan to commit a big change to the unstable branch
> while the branch feature freeze is in effect, think twice: can't the
> addition wait a couple more days? Merges of bug fixes into the branch
> may become more difficult.
> * Only Jira issues with Fix version 9.4 and priority "Blocker" will delay
> a release candidate build.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
I think this is the likely explanation. KnnGraphTester does not guarantee a
single segment; it merely sets a large buffer that usually results in a
single segment in our tests. But in one test I ran, using 1M GloVe 100-d
vectors I did note that we end up with two segments (using 9.4). I think
it's due to the fact that we now include the size of the graph in the
budget used to determine when to flush, where previously we only included
the size of the vector data. In my test I saw this index:

-rw-r--r-- 1 sokolovm amazon 212M Sep 9 15:44
_0_Lucene94HnswVectorsFormat_0.vec
-rw-r--r-- 1 sokolovm amazon 12K Sep 9 15:44
_0_Lucene94HnswVectorsFormat_0.vem
-rw-r--r-- 1 sokolovm amazon 852M Sep 9 15:44
_0_Lucene94HnswVectorsFormat_0.vex
...
-rw-r--r-- 1 sokolovm amazon 170M Sep 9 16:08
_1_Lucene94HnswVectorsFormat_0.vec
-rw-r--r-- 1 sokolovm amazon 9.0K Sep 9 16:08
_1_Lucene94HnswVectorsFormat_0.vem
-rw-r--r-- 1 sokolovm amazon 682M Sep 9 16:08
_1_Lucene94HnswVectorsFormat_0.vex

making it clear that the graph (vex files) is dominating the index size

and indeed this leads to a big difference in indexing time, search latency,
and recall

On Fri, Sep 9, 2022 at 8:45 AM Mayya Sharipova
<mayya.sharipova@elastic.co.invalid> wrote:

> Thanks Adrien,
> Thanks for the suggestion. I have not checked the size of segments (I can
> do that later), but in both cases: in 9.3 and 9.4 we end up with a single
> segment, because KnnGraphTester
> <https://github.com/apache/lucene/blob/branch_9_4/lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java#L704>
> that is used for these tests, sets up a high indexing memory buffer – 2Gb.
>
> On Fri, Sep 9, 2022 at 4:46 AM Adrien Grand <jpountz@gmail.com> wrote:
>
>> A change in recall and QPS feels like something that could be caused by a
>> different organization of segments. Mayya, do you know if your Lucene
>> indexes had the same distribution of segments (number and size) in both
>> cases?
>>
>> On Thu, Sep 8, 2022 at 7:02 PM Michael Sokolov <msokolov@gmail.com>
>> wrote:
>>
>>> Hmm those ann-benchmarks regressions are concerning. I think we should
>>> treat as a blocker for this release. I'll see if I can reproduce using
>>> simpler test with KnnGraphTester alone. I will note that nightly benchmarks
>>> haven't registered any QPS regression
>>> https://home.apache.org/~mikemccand/lucenebench/VectorSearch.html
>>>
>>> On Thu, Sep 8, 2022 at 11:07 AM Mayya Sharipova
>>> <mayya.sharipova@elastic.co.invalid> wrote:
>>>
>>>> Hello Michael,
>>>> Thanks for following up on this.
>>>> I will try to merge https://github.com/apache/lucene/pull/11743 today
>>>> or tomorrow (will back port to 9.4 as well).
>>>>
>>>> ----
>>>> Meanwhile, I've identified a slight regression in the performance for
>>>> the vector search ann benchmarks between 9.3 and 9.4 releases.
>>>> I am not sure if this regression is a blocker, as the recall drop is
>>>> very slight.
>>>>
>>>> For the glove database, recall dropped by 3-8%; QPS dropped by 30-50%
>>>>
>>>> [image: image.png]
>>>>
>>>> For the sift database, recall dropped by 1%; QPS dropped by up to 27-35%
>>>>
>>>> [image: image.png]
>>>>
>>>> I could not find yet any specific commit that introduced this
>>>> regression.
>>>> One thing I tested was that "LUCENE-10592 Build HNSW Graph on indexing"
>>>> did not introduce this regression, as the recall and QPS with this commit
>>>> are comparable with the 9.3 branch.
>>>>
>>>> Could it be regression in the way we test (as there were changes in
>>>> KnnGraphTester)?
>>>>
>>>>
>>>>
>>>> On Thu, Sep 8, 2022 at 9:17 AM Michael Sokolov <msokolov@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Julie, I looked and left some minor comments. Let's target that
>>>>> searchNearestVectors refactor for 9.4.0.
>>>>>
>>>>> As for removing the generics, it would be great if we can further
>>>>> simplify, but agree it doesn't seem critical to target this release,
>>>>> although if we can get it pushed by next week that would be fine too;
>>>>> it doesn't sound risky? But there is also a potential to touch a lot
>>>>> of code, so let's not rush it either :)
>>>>>
>>>>> We also need to wrap up https://github.com/apache/lucene/pull/11743. I
>>>>> believe that looks ready to push and we have decided to maintain the
>>>>> current indexing strategy. Mayya - are you ready to push, or do you
>>>>> anticipate any further work there?
>>>>>
>>>>> On Wed, Sep 7, 2022 at 12:32 PM Julie Tibshirani <julietibs@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > Hi Mike, I've been working on follow-up refactors to the vector
>>>>> encoding work we just added in 9.4 (
>>>>> https://github.com/apache/lucene/pull/1054) and had a couple things
>>>>> to check with you.
>>>>> >
>>>>> > First, I opened a PR to remove
>>>>> LeafReader#searchNearestVectorsExhaustively (
>>>>> https://github.com/apache/lucene/pull/11756). If you're happy with
>>>>> the change, I'd like to backport it to 9.4 to minimize changes to the
>>>>> LeafReader API. This seemed okay to me, since it's just a refactor and has
>>>>> no new functionality.
>>>>> >
>>>>> > I'm also looking into simplifying KnnVectorsWriter to remove the
>>>>> generics we added. I may not complete this in time for 9.4. But I'm not too
>>>>> worried about pushing this out, since this is a codec-level API, and
>>>>> changing it won't be very disruptive to users.
>>>>> >
>>>>> > Julie
>>>>> >
>>>>> > On Fri, Sep 2, 2022 at 10:57 PM Tomoko Uchida <
>>>>> tomoko.uchida.1111@gmail.com> wrote:
>>>>> >>
>>>>> >> > Branch branch_9_4 has been cut and versions updated to 9.5 on
>>>>> stable branch.
>>>>> >>
>>>>> >> Then the GitHub Milestone for 9.5 also needs to be created.
>>>>> >>
>>>>> >> This time, I created Milestone 9.5.0. We should include it in the
>>>>> release process.
>>>>> >> https://github.com/apache/lucene/milestone/4
>>>>> >>
>>>>> >>
>>>>> >> 2022?9?3?(?) 4:42 Michael Sokolov <msokolov@gmail.com>:
>>>>> >>>
>>>>> >>> NOTICE:
>>>>> >>>
>>>>> >>> Branch branch_9_4 has been cut and versions updated to 9.5 on
>>>>> stable branch.
>>>>> >>>
>>>>> >>> Please observe the normal rules:
>>>>> >>>
>>>>> >>> * No new features may be committed to the branch.
>>>>> >>> * Documentation patches, build patches and serious bug fixes may be
>>>>> >>> committed to the branch. However, you should submit all patches
>>>>> you
>>>>> >>> want to commit to Jira first to give others the chance to review
>>>>> >>> and possibly vote against the patch. Keep in mind that it is our
>>>>> >>> main intention to keep the branch as stable as possible.
>>>>> >>> * All patches that are intended for the branch should first be
>>>>> committed
>>>>> >>> to the unstable branch, merged into the stable branch, and then
>>>>> into
>>>>> >>> the current release branch.
>>>>> >>> * Normal unstable and stable branch development may continue as
>>>>> usual.
>>>>> >>> However, if you plan to commit a big change to the unstable
>>>>> branch
>>>>> >>> while the branch feature freeze is in effect, think twice: can't
>>>>> the
>>>>> >>> addition wait a couple more days? Merges of bug fixes into the
>>>>> branch
>>>>> >>> may become more difficult.
>>>>> >>> * Only Jira issues with Fix version 9.4 and priority "Blocker"
>>>>> will delay
>>>>> >>> a release candidate build.
>>>>> >>>
>>>>> >>>
>>>>> ---------------------------------------------------------------------
>>>>> >>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>> >>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>> >>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>
>>>>>
>>
>> --
>> Adrien
>>
>
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Done. Thanks!

> On 9 Sep 2022, at 16:32, Michael Sokolov <msokolov@gmail.com> wrote:
>
> Hi Alan - I checked out the interval queries patch; seems pretty safe,
> please go ahead and port to 9.4. Thanks!
>
> Mike
>
> On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <romseygeek@gmail.com> wrote:
>>
>> Hi Mike,
>>
>> I’ve opened https://github.com/apache/lucene/pull/11760 as a small bug fix PR for a problem with interval queries. Am I OK to port this to the 9.4 branch?
>>
>> Thanks, Alan
>>
>> On 2 Sep 2022, at 20:42, Michael Sokolov <msokolov@gmail.com> wrote:
>>
>> NOTICE:
>>
>> Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch.
>>
>> Please observe the normal rules:
>>
>> * No new features may be committed to the branch.
>> * Documentation patches, build patches and serious bug fixes may be
>> committed to the branch. However, you should submit all patches you
>> want to commit to Jira first to give others the chance to review
>> and possibly vote against the patch. Keep in mind that it is our
>> main intention to keep the branch as stable as possible.
>> * All patches that are intended for the branch should first be committed
>> to the unstable branch, merged into the stable branch, and then into
>> the current release branch.
>> * Normal unstable and stable branch development may continue as usual.
>> However, if you plan to commit a big change to the unstable branch
>> while the branch feature freeze is in effect, think twice: can't the
>> addition wait a couple more days? Merges of bug fixes into the branch
>> may become more difficult.
>> * Only Jira issues with Fix version 9.4 and priority "Blocker" will delay
>> a release candidate build.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Hello Michael,
Thanks for checking.
Sorry for bringing this up again.
First of all, I am ok with proceeding with the Lucene 9.4 release and
leaving the performance investigations for later.

I am interested in what's the maxConn/M value you used for your tests? What
was the heap memory and the size of the RAM buffer for indexing?
Usually, when we have multiple segments, recall should increase, not
decrease. But I agree that with multiple segments we can see a big drop in
QPS.

Here
<https://gist.github.com/mayya-sharipova/e4d66af49a5b67e8a2338d0c3c522a9e>
is my investigation with detailed output of the performance difference
between 9.3 and 9.4 releases. In my tests I used a large indexing buffer
(2Gb) and large heap (5Gb) to end up with a single segment for both 9.3 and
9.4 tests, but still see a big drop in QPS in 9.4.

Thank you.





On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <romseygeek@gmail.com> wrote:

> Done. Thanks!
>
> > On 9 Sep 2022, at 16:32, Michael Sokolov <msokolov@gmail.com> wrote:
> >
> > Hi Alan - I checked out the interval queries patch; seems pretty safe,
> > please go ahead and port to 9.4. Thanks!
> >
> > Mike
> >
> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <romseygeek@gmail.com>
> wrote:
> >>
> >> Hi Mike,
> >>
> >> I’ve opened https://github.com/apache/lucene/pull/11760 as a small bug
> fix PR for a problem with interval queries. Am I OK to port this to the
> 9.4 branch?
> >>
> >> Thanks, Alan
> >>
> >> On 2 Sep 2022, at 20:42, Michael Sokolov <msokolov@gmail.com> wrote:
> >>
> >> NOTICE:
> >>
> >> Branch branch_9_4 has been cut and versions updated to 9.5 on stable
> branch.
> >>
> >> Please observe the normal rules:
> >>
> >> * No new features may be committed to the branch.
> >> * Documentation patches, build patches and serious bug fixes may be
> >> committed to the branch. However, you should submit all patches you
> >> want to commit to Jira first to give others the chance to review
> >> and possibly vote against the patch. Keep in mind that it is our
> >> main intention to keep the branch as stable as possible.
> >> * All patches that are intended for the branch should first be committed
> >> to the unstable branch, merged into the stable branch, and then into
> >> the current release branch.
> >> * Normal unstable and stable branch development may continue as usual.
> >> However, if you plan to commit a big change to the unstable branch
> >> while the branch feature freeze is in effect, think twice: can't the
> >> addition wait a couple more days? Merges of bug fixes into the branch
> >> may become more difficult.
> >> * Only Jira issues with Fix version 9.4 and priority "Blocker" will
> delay
> >> a release candidate build.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: dev-help@lucene.apache.org
> >>
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: dev-help@lucene.apache.org
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Hi Mayya, thanks for persisting - I think we need to wrestle this to
the ground for sure. In the test I ran, RAM buffer was the default
checked in, which is weirdly: 1994MB. I did not specifically set heap
size. I used maxConn/M=200. I'll try with larger buffer to see if I
can get 9.4 to produce a single segment for the same test settings. I
see you used a much smaller M (16), which should have produced quite
small graphs, and I agree, should have been a single segment. Were you
able to verify the number of segments?

Agree that decrease in recall is not expected when more segments are produced.

On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
<mayya.sharipova@elastic.co.invalid> wrote:
>
> Hello Michael,
> Thanks for checking.
> Sorry for bringing this up again.
> First of all, I am ok with proceeding with the Lucene 9.4 release and leaving the performance investigations for later.
>
> I am interested in what's the maxConn/M value you used for your tests? What was the heap memory and the size of the RAM buffer for indexing?
> Usually, when we have multiple segments, recall should increase, not decrease. But I agree that with multiple segments we can see a big drop in QPS.
>
> Here is my investigation with detailed output of the performance difference between 9.3 and 9.4 releases. In my tests I used a large indexing buffer (2Gb) and large heap (5Gb) to end up with a single segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>
> Thank you.
>
>
>
>
>
> On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <romseygeek@gmail.com> wrote:
>>
>> Done. Thanks!
>>
>> > On 9 Sep 2022, at 16:32, Michael Sokolov <msokolov@gmail.com> wrote:
>> >
>> > Hi Alan - I checked out the interval queries patch; seems pretty safe,
>> > please go ahead and port to 9.4. Thanks!
>> >
>> > Mike
>> >
>> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <romseygeek@gmail.com> wrote:
>> >>
>> >> Hi Mike,
>> >>
>> >> I’ve opened https://github.com/apache/lucene/pull/11760 as a small bug fix PR for a problem with interval queries. Am I OK to port this to the 9.4 branch?
>> >>
>> >> Thanks, Alan
>> >>
>> >> On 2 Sep 2022, at 20:42, Michael Sokolov <msokolov@gmail.com> wrote:
>> >>
>> >> NOTICE:
>> >>
>> >> Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch.
>> >>
>> >> Please observe the normal rules:
>> >>
>> >> * No new features may be committed to the branch.
>> >> * Documentation patches, build patches and serious bug fixes may be
>> >> committed to the branch. However, you should submit all patches you
>> >> want to commit to Jira first to give others the chance to review
>> >> and possibly vote against the patch. Keep in mind that it is our
>> >> main intention to keep the branch as stable as possible.
>> >> * All patches that are intended for the branch should first be committed
>> >> to the unstable branch, merged into the stable branch, and then into
>> >> the current release branch.
>> >> * Normal unstable and stable branch development may continue as usual.
>> >> However, if you plan to commit a big change to the unstable branch
>> >> while the branch feature freeze is in effect, think twice: can't the
>> >> addition wait a couple more days? Merges of bug fixes into the branch
>> >> may become more difficult.
>> >> * Only Jira issues with Fix version 9.4 and priority "Blocker" will delay
>> >> a release candidate build.
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: dev-help@lucene.apache.org
>> >>
>> >>
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: dev-help@lucene.apache.org
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
I ran another test. I thought I had increased the RAM buffer size to
8G and heap to 16G. However I still see two segments in the index that
was created. And looking at the infostream I see:

dir=MMapDirectory@/local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
lockFactory=org\
.apache.lucene.store.NativeFSLockFactory@4466af20
index=
version=9.4.0
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
ramBufferSizeMB=8000.0
maxBufferedDocs=-1
...
perThreadHardLimitMB=1945
...
DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush postings as
segment _6 numDocs=555373
IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write norms
IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write docValues
IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write points
IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to write vectors
IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to finish stored fields
IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write postings
and finish vectors
IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write fieldInfos
DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment has 0 deleted docs
DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment has 0
soft-deleted docs
DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment has no
vectors; no norms; no docValues; no prox; freqs
DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm, _6.fdt, _6_\
Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
_6_Lucene94HnswVectorsFormat_0.vex]
DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed codec=Lucene94
DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed: segment=_6
ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
docs/MB=521.134

so I think it's this perThreadHardLimit that is triggering the
flushes? TBH this isn't something I had seen before; but the docs say:

/**
* Expert: Sets the maximum memory consumption per thread triggering
a forced flush if exceeded. A
* {@link DocumentsWriterPerThread} is forcefully flushed once it
exceeds this limit even if the
* {@link #getRAMBufferSizeMB()} has not been exceeded. This is a
safety limit to prevent a {@link
* DocumentsWriterPerThread} from address space exhaustion due to
its internal 32 bit signed
* integer based memory addressing. The given value must be less
that 2GB (2048MB)
*
* @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
*/

On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <msokolov@gmail.com> wrote:
>
> Hi Mayya, thanks for persisting - I think we need to wrestle this to
> the ground for sure. In the test I ran, RAM buffer was the default
> checked in, which is weirdly: 1994MB. I did not specifically set heap
> size. I used maxConn/M=200. I'll try with larger buffer to see if I
> can get 9.4 to produce a single segment for the same test settings. I
> see you used a much smaller M (16), which should have produced quite
> small graphs, and I agree, should have been a single segment. Were you
> able to verify the number of segments?
>
> Agree that decrease in recall is not expected when more segments are produced.
>
> On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
> <mayya.sharipova@elastic.co.invalid> wrote:
> >
> > Hello Michael,
> > Thanks for checking.
> > Sorry for bringing this up again.
> > First of all, I am ok with proceeding with the Lucene 9.4 release and leaving the performance investigations for later.
> >
> > I am interested in what's the maxConn/M value you used for your tests? What was the heap memory and the size of the RAM buffer for indexing?
> > Usually, when we have multiple segments, recall should increase, not decrease. But I agree that with multiple segments we can see a big drop in QPS.
> >
> > Here is my investigation with detailed output of the performance difference between 9.3 and 9.4 releases. In my tests I used a large indexing buffer (2Gb) and large heap (5Gb) to end up with a single segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
> >
> > Thank you.
> >
> >
> >
> >
> >
> > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <romseygeek@gmail.com> wrote:
> >>
> >> Done. Thanks!
> >>
> >> > On 9 Sep 2022, at 16:32, Michael Sokolov <msokolov@gmail.com> wrote:
> >> >
> >> > Hi Alan - I checked out the interval queries patch; seems pretty safe,
> >> > please go ahead and port to 9.4. Thanks!
> >> >
> >> > Mike
> >> >
> >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <romseygeek@gmail.com> wrote:
> >> >>
> >> >> Hi Mike,
> >> >>
> >> >> I’ve opened https://github.com/apache/lucene/pull/11760 as a small bug fix PR for a problem with interval queries. Am I OK to port this to the 9.4 branch?
> >> >>
> >> >> Thanks, Alan
> >> >>
> >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <msokolov@gmail.com> wrote:
> >> >>
> >> >> NOTICE:
> >> >>
> >> >> Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch.
> >> >>
> >> >> Please observe the normal rules:
> >> >>
> >> >> * No new features may be committed to the branch.
> >> >> * Documentation patches, build patches and serious bug fixes may be
> >> >> committed to the branch. However, you should submit all patches you
> >> >> want to commit to Jira first to give others the chance to review
> >> >> and possibly vote against the patch. Keep in mind that it is our
> >> >> main intention to keep the branch as stable as possible.
> >> >> * All patches that are intended for the branch should first be committed
> >> >> to the unstable branch, merged into the stable branch, and then into
> >> >> the current release branch.
> >> >> * Normal unstable and stable branch development may continue as usual.
> >> >> However, if you plan to commit a big change to the unstable branch
> >> >> while the branch feature freeze is in effect, think twice: can't the
> >> >> addition wait a couple more days? Merges of bug fixes into the branch
> >> >> may become more difficult.
> >> >> * Only Jira issues with Fix version 9.4 and priority "Blocker" will delay
> >> >> a release candidate build.
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> >> For additional commands, e-mail: dev-help@lucene.apache.org
> >> >>
> >> >>
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> > For additional commands, e-mail: dev-help@lucene.apache.org
> >> >
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: dev-help@lucene.apache.org
> >>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
As a follow-up, I ran a test using the same parameters as above, only
changing M=200 to M=16. This did result in a single segment in both
cases (9.3, 9.4) and the performance was pretty similar; within noise
I think. The main difference I saw was that the 9.3 index was written
using CFS:

9.4:
recall latency nDoc fanout maxConn beamWidth visited index ms
0.755 1.36 1000000 100 16 100 200 891402 1.00
post-filter
-rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
_0_Lucene94HnswVectorsFormat_0.vec
-rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
_0_Lucene94HnswVectorsFormat_0.vem
-rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
_0_Lucene94HnswVectorsFormat_0.vex

9.3:
recall latency nDoc fanout maxConn beamWidth visited index ms
0.775 1.34 1000000 100 16 100 4033 977043
rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
-rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
-rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si

On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <msokolov@gmail.com> wrote:
>
> I ran another test. I thought I had increased the RAM buffer size to
> 8G and heap to 16G. However I still see two segments in the index that
> was created. And looking at the infostream I see:
>
> dir=MMapDirectory@/local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
> lockFactory=org\
> .apache.lucene.store.NativeFSLockFactory@4466af20
> index=
> version=9.4.0
> analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
> ramBufferSizeMB=8000.0
> maxBufferedDocs=-1
> ...
> perThreadHardLimitMB=1945
> ...
> DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush postings as
> segment _6 numDocs=555373
> IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write norms
> IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write docValues
> IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write points
> IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to write vectors
> IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to finish stored fields
> IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write postings
> and finish vectors
> IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write fieldInfos
> DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment has 0 deleted docs
> DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment has 0
> soft-deleted docs
> DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment has no
> vectors; no norms; no docValues; no prox; freqs
> DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
> flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm, _6.fdt, _6_\
> Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
> _6_Lucene94HnswVectorsFormat_0.vex]
> DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed codec=Lucene94
> DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed: segment=_6
> ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
> docs/MB=521.134
>
> so I think it's this perThreadHardLimit that is triggering the
> flushes? TBH this isn't something I had seen before; but the docs say:
>
> /**
> * Expert: Sets the maximum memory consumption per thread triggering
> a forced flush if exceeded. A
> * {@link DocumentsWriterPerThread} is forcefully flushed once it
> exceeds this limit even if the
> * {@link #getRAMBufferSizeMB()} has not been exceeded. This is a
> safety limit to prevent a {@link
> * DocumentsWriterPerThread} from address space exhaustion due to
> its internal 32 bit signed
> * integer based memory addressing. The given value must be less
> that 2GB (2048MB)
> *
> * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
> */
>
> On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <msokolov@gmail.com> wrote:
> >
> > Hi Mayya, thanks for persisting - I think we need to wrestle this to
> > the ground for sure. In the test I ran, RAM buffer was the default
> > checked in, which is weirdly: 1994MB. I did not specifically set heap
> > size. I used maxConn/M=200. I'll try with larger buffer to see if I
> > can get 9.4 to produce a single segment for the same test settings. I
> > see you used a much smaller M (16), which should have produced quite
> > small graphs, and I agree, should have been a single segment. Were you
> > able to verify the number of segments?
> >
> > Agree that decrease in recall is not expected when more segments are produced.
> >
> > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
> > <mayya.sharipova@elastic.co.invalid> wrote:
> > >
> > > Hello Michael,
> > > Thanks for checking.
> > > Sorry for bringing this up again.
> > > First of all, I am ok with proceeding with the Lucene 9.4 release and leaving the performance investigations for later.
> > >
> > > I am interested in what's the maxConn/M value you used for your tests? What was the heap memory and the size of the RAM buffer for indexing?
> > > Usually, when we have multiple segments, recall should increase, not decrease. But I agree that with multiple segments we can see a big drop in QPS.
> > >
> > > Here is my investigation with detailed output of the performance difference between 9.3 and 9.4 releases. In my tests I used a large indexing buffer (2Gb) and large heap (5Gb) to end up with a single segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
> > >
> > > Thank you.
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <romseygeek@gmail.com> wrote:
> > >>
> > >> Done. Thanks!
> > >>
> > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <msokolov@gmail.com> wrote:
> > >> >
> > >> > Hi Alan - I checked out the interval queries patch; seems pretty safe,
> > >> > please go ahead and port to 9.4. Thanks!
> > >> >
> > >> > Mike
> > >> >
> > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <romseygeek@gmail.com> wrote:
> > >> >>
> > >> >> Hi Mike,
> > >> >>
> > >> >> I’ve opened https://github.com/apache/lucene/pull/11760 as a small bug fix PR for a problem with interval queries. Am I OK to port this to the 9.4 branch?
> > >> >>
> > >> >> Thanks, Alan
> > >> >>
> > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <msokolov@gmail.com> wrote:
> > >> >>
> > >> >> NOTICE:
> > >> >>
> > >> >> Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch.
> > >> >>
> > >> >> Please observe the normal rules:
> > >> >>
> > >> >> * No new features may be committed to the branch.
> > >> >> * Documentation patches, build patches and serious bug fixes may be
> > >> >> committed to the branch. However, you should submit all patches you
> > >> >> want to commit to Jira first to give others the chance to review
> > >> >> and possibly vote against the patch. Keep in mind that it is our
> > >> >> main intention to keep the branch as stable as possible.
> > >> >> * All patches that are intended for the branch should first be committed
> > >> >> to the unstable branch, merged into the stable branch, and then into
> > >> >> the current release branch.
> > >> >> * Normal unstable and stable branch development may continue as usual.
> > >> >> However, if you plan to commit a big change to the unstable branch
> > >> >> while the branch feature freeze is in effect, think twice: can't the
> > >> >> addition wait a couple more days? Merges of bug fixes into the branch
> > >> >> may become more difficult.
> > >> >> * Only Jira issues with Fix version 9.4 and priority "Blocker" will delay
> > >> >> a release candidate build.
> > >> >>
> > >> >> ---------------------------------------------------------------------
> > >> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > >> >> For additional commands, e-mail: dev-help@lucene.apache.org
> > >> >>
> > >> >>
> > >> >
> > >> > ---------------------------------------------------------------------
> > >> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > >> > For additional commands, e-mail: dev-help@lucene.apache.org
> > >> >
> > >>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > >> For additional commands, e-mail: dev-help@lucene.apache.org
> > >>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Thanks for running more tests, Michael.
It is encouraging that you saw a similar performance between 9.3 and 9.4. I
will also run more tests with different parameters.

On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <msokolov@gmail.com> wrote:

> As a follow-up, I ran a test using the same parameters as above, only
> changing M=200 to M=16. This did result in a single segment in both
> cases (9.3, 9.4) and the performance was pretty similar; within noise
> I think. The main difference I saw was that the 9.3 index was written
> using CFS:
>
> 9.4:
> recall latency nDoc fanout maxConn beamWidth visited index ms
> 0.755 1.36 1000000 100 16 100 200 891402 1.00
> post-filter
> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
> _0_Lucene94HnswVectorsFormat_0.vec
> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
> _0_Lucene94HnswVectorsFormat_0.vem
> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
> _0_Lucene94HnswVectorsFormat_0.vex
>
> 9.3:
> recall latency nDoc fanout maxConn beamWidth visited index ms
> 0.775 1.34 1000000 100 16 100 4033 977043
> -rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>
> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <msokolov@gmail.com>
> wrote:
> >
> > I ran another test. I thought I had increased the RAM buffer size to
> > 8G and heap to 16G. However I still see two segments in the index that
> > was created. And looking at the infostream I see:
> >
> > dir=MMapDirectory@
> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
> > lockFactory=org\
> > .apache.lucene.store.NativeFSLockFactory@4466af20
> > index=
> > version=9.4.0
> > analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
> > ramBufferSizeMB=8000.0
> > maxBufferedDocs=-1
> > ...
> > perThreadHardLimitMB=1945
> > ...
> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush postings as
> > segment _6 numDocs=555373
> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write norms
> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write docValues
> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write points
> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to write vectors
> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to finish stored
> fields
> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write postings
> > and finish vectors
> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write fieldInfos
> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment has 0 deleted
> docs
> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment has 0
> > soft-deleted docs
> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment has no
> > vectors; no norms; no docValues; no prox; freqs
> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm, _6.fdt, _6_\
> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
> > _6_Lucene94HnswVectorsFormat_0.vex]
> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed codec=Lucene94
> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed: segment=_6
> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
> > docs/MB=521.134
> >
> > so I think it's this perThreadHardLimit that is triggering the
> > flushes? TBH this isn't something I had seen before; but the docs say:
> >
> > /**
> > * Expert: Sets the maximum memory consumption per thread triggering
> > a forced flush if exceeded. A
> > * {@link DocumentsWriterPerThread} is forcefully flushed once it
> > exceeds this limit even if the
> > * {@link #getRAMBufferSizeMB()} has not been exceeded. This is a
> > safety limit to prevent a {@link
> > * DocumentsWriterPerThread} from address space exhaustion due to
> > its internal 32 bit signed
> > * integer based memory addressing. The given value must be less
> > than 2GB (2048MB)
> > *
> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
> > */
> >
> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <msokolov@gmail.com>
> wrote:
> > >
> > > Hi Mayya, thanks for persisting - I think we need to wrestle this to
> > > the ground for sure. In the test I ran, RAM buffer was the default
> > > checked in, which is weirdly: 1994MB. I did not specifically set heap
> > > size. I used maxConn/M=200. I'll try with larger buffer to see if I
> > > can get 9.4 to produce a single segment for the same test settings. I
> > > see you used a much smaller M (16), which should have produced quite
> > > small graphs, and I agree, should have been a single segment. Were you
> > > able to verify the number of segments?
> > >
> > > Agree that decrease in recall is not expected when more segments are
> produced.
> > >
> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
> > > <mayya.sharipova@elastic.co.invalid> wrote:
> > > >
> > > > Hello Michael,
> > > > Thanks for checking.
> > > > Sorry for bringing this up again.
> > > > First of all, I am ok with proceeding with the Lucene 9.4 release
> and leaving the performance investigations for later.
> > > >
> > > > I am interested in what's the maxConn/M value you used for your
> tests? What was the heap memory and the size of the RAM buffer for indexing?
> > > > Usually, when we have multiple segments, recall should increase, not
> decrease. But I agree that with multiple segments we can see a big drop in
> QPS.
> > > >
> > > > Here is my investigation with detailed output of the performance
> difference between 9.3 and 9.4 releases. In my tests I used a large
> indexing buffer (2Gb) and large heap (5Gb) to end up with a single segment
> for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
> > > >
> > > > Thank you.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <romseygeek@gmail.com>
> wrote:
> > > >>
> > > >> Done. Thanks!
> > > >>
> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <msokolov@gmail.com>
> wrote:
> > > >> >
> > > >> > Hi Alan - I checked out the interval queries patch; seems pretty
> safe,
> > > >> > please go ahead and port to 9.4. Thanks!
> > > >> >
> > > >> > Mike
> > > >> >
> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <
> romseygeek@gmail.com> wrote:
> > > >> >>
> > > >> >> Hi Mike,
> > > >> >>
> > > >> >> I’ve opened https://github.com/apache/lucene/pull/11760 as a
> small bug fix PR for a problem with interval queries. Am I OK to port this
> to the 9.4 branch?
> > > >> >>
> > > >> >> Thanks, Alan
> > > >> >>
> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <msokolov@gmail.com>
> wrote:
> > > >> >>
> > > >> >> NOTICE:
> > > >> >>
> > > >> >> Branch branch_9_4 has been cut and versions updated to 9.5 on
> stable branch.
> > > >> >>
> > > >> >> Please observe the normal rules:
> > > >> >>
> > > >> >> * No new features may be committed to the branch.
> > > >> >> * Documentation patches, build patches and serious bug fixes may
> be
> > > >> >> committed to the branch. However, you should submit all patches
> you
> > > >> >> want to commit to Jira first to give others the chance to review
> > > >> >> and possibly vote against the patch. Keep in mind that it is our
> > > >> >> main intention to keep the branch as stable as possible.
> > > >> >> * All patches that are intended for the branch should first be
> committed
> > > >> >> to the unstable branch, merged into the stable branch, and then
> into
> > > >> >> the current release branch.
> > > >> >> * Normal unstable and stable branch development may continue as
> usual.
> > > >> >> However, if you plan to commit a big change to the unstable
> branch
> > > >> >> while the branch feature freeze is in effect, think twice: can't
> the
> > > >> >> addition wait a couple more days? Merges of bug fixes into the
> branch
> > > >> >> may become more difficult.
> > > >> >> * Only Jira issues with Fix version 9.4 and priority "Blocker"
> will delay
> > > >> >> a release candidate build.
> > > >> >>
> > > >> >>
> ---------------------------------------------------------------------
> > > >> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > > >> >> For additional commands, e-mail: dev-help@lucene.apache.org
> > > >> >>
> > > >> >>
> > > >> >
> > > >> >
> ---------------------------------------------------------------------
> > > >> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > > >> > For additional commands, e-mail: dev-help@lucene.apache.org
> > > >> >
> > > >>
> > > >>
> > > >>
> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > > >> For additional commands, e-mail: dev-help@lucene.apache.org
> > > >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Mike, I'm tempted to backport https://github.com/apache/lucene/pull/1068 to
branch_9_4, which is a bugfix that looks pretty safe to me. What do you
think?

On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova
<mayya.sharipova@elastic.co.invalid> wrote:

> Thanks for running more tests, Michael.
> It is encouraging that you saw a similar performance between 9.3 and 9.4.
> I will also run more tests with different parameters.
>
> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <msokolov@gmail.com>
> wrote:
>
>> As a follow-up, I ran a test using the same parameters as above, only
>> changing M=200 to M=16. This did result in a single segment in both
>> cases (9.3, 9.4) and the performance was pretty similar; within noise
>> I think. The main difference I saw was that the 9.3 index was written
>> using CFS:
>>
>> 9.4:
>> recall latency nDoc fanout maxConn beamWidth visited index ms
>> 0.755 1.36 1000000 100 16 100 200 891402 1.00
>> post-filter
>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>> _0_Lucene94HnswVectorsFormat_0.vec
>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>> _0_Lucene94HnswVectorsFormat_0.vem
>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>> _0_Lucene94HnswVectorsFormat_0.vex
>>
>> 9.3:
>> recall latency nDoc fanout maxConn beamWidth visited index ms
>> 0.775 1.34 1000000 100 16 100 4033 977043
>> -rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>>
>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <msokolov@gmail.com>
>> wrote:
>> >
>> > I ran another test. I thought I had increased the RAM buffer size to
>> > 8G and heap to 16G. However I still see two segments in the index that
>> > was created. And looking at the infostream I see:
>> >
>> > dir=MMapDirectory@
>> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>> > lockFactory=org\
>> > .apache.lucene.store.NativeFSLockFactory@4466af20
>> > index=
>> > version=9.4.0
>> > analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>> > ramBufferSizeMB=8000.0
>> > maxBufferedDocs=-1
>> > ...
>> > perThreadHardLimitMB=1945
>> > ...
>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush postings as
>> > segment _6 numDocs=555373
>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write norms
>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write docValues
>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write points
>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to write vectors
>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to finish stored
>> fields
>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write postings
>> > and finish vectors
>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write fieldInfos
>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment has 0
>> deleted docs
>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment has 0
>> > soft-deleted docs
>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment has no
>> > vectors; no norms; no docValues; no prox; freqs
>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm, _6.fdt, _6_\
>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
>> > _6_Lucene94HnswVectorsFormat_0.vex]
>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed codec=Lucene94
>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed: segment=_6
>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
>> > docs/MB=521.134
>> >
>> > so I think it's this perThreadHardLimit that is triggering the
>> > flushes? TBH this isn't something I had seen before; but the docs say:
>> >
>> > /**
>> > * Expert: Sets the maximum memory consumption per thread triggering
>> > a forced flush if exceeded. A
>> > * {@link DocumentsWriterPerThread} is forcefully flushed once it
>> > exceeds this limit even if the
>> > * {@link #getRAMBufferSizeMB()} has not been exceeded. This is a
>> > safety limit to prevent a {@link
>> > * DocumentsWriterPerThread} from address space exhaustion due to
>> > its internal 32 bit signed
>> > * integer based memory addressing. The given value must be less
>> > than 2GB (2048MB)
>> > *
>> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>> > */
>> >
>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <msokolov@gmail.com>
>> wrote:
>> > >
>> > > Hi Mayya, thanks for persisting - I think we need to wrestle this to
>> > > the ground for sure. In the test I ran, RAM buffer was the default
>> > > checked in, which is weirdly: 1994MB. I did not specifically set heap
>> > > size. I used maxConn/M=200. I'll try with larger buffer to see if I
>> > > can get 9.4 to produce a single segment for the same test settings. I
>> > > see you used a much smaller M (16), which should have produced quite
>> > > small graphs, and I agree, should have been a single segment. Were you
>> > > able to verify the number of segments?
>> > >
>> > > Agree that decrease in recall is not expected when more segments are
>> produced.
>> > >
>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>> > > <mayya.sharipova@elastic.co.invalid> wrote:
>> > > >
>> > > > Hello Michael,
>> > > > Thanks for checking.
>> > > > Sorry for bringing this up again.
>> > > > First of all, I am ok with proceeding with the Lucene 9.4 release
>> and leaving the performance investigations for later.
>> > > >
>> > > > I am interested in what's the maxConn/M value you used for your
>> tests? What was the heap memory and the size of the RAM buffer for indexing?
>> > > > Usually, when we have multiple segments, recall should increase,
>> not decrease. But I agree that with multiple segments we can see a big drop
>> in QPS.
>> > > >
>> > > > Here is my investigation with detailed output of the performance
>> difference between 9.3 and 9.4 releases. In my tests I used a large
>> indexing buffer (2Gb) and large heap (5Gb) to end up with a single segment
>> for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>> > > >
>> > > > Thank you.
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <romseygeek@gmail.com>
>> wrote:
>> > > >>
>> > > >> Done. Thanks!
>> > > >>
>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <msokolov@gmail.com>
>> wrote:
>> > > >> >
>> > > >> > Hi Alan - I checked out the interval queries patch; seems pretty
>> safe,
>> > > >> > please go ahead and port to 9.4. Thanks!
>> > > >> >
>> > > >> > Mike
>> > > >> >
>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <
>> romseygeek@gmail.com> wrote:
>> > > >> >>
>> > > >> >> Hi Mike,
>> > > >> >>
>> > > >> >> I’ve opened https://github.com/apache/lucene/pull/11760 as a
>> small bug fix PR for a problem with interval queries. Am I OK to port this
>> to the 9.4 branch?
>> > > >> >>
>> > > >> >> Thanks, Alan
>> > > >> >>
>> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <msokolov@gmail.com>
>> wrote:
>> > > >> >>
>> > > >> >> NOTICE:
>> > > >> >>
>> > > >> >> Branch branch_9_4 has been cut and versions updated to 9.5 on
>> stable branch.
>> > > >> >>
>> > > >> >> Please observe the normal rules:
>> > > >> >>
>> > > >> >> * No new features may be committed to the branch.
>> > > >> >> * Documentation patches, build patches and serious bug fixes
>> may be
>> > > >> >> committed to the branch. However, you should submit all patches
>> you
>> > > >> >> want to commit to Jira first to give others the chance to review
>> > > >> >> and possibly vote against the patch. Keep in mind that it is our
>> > > >> >> main intention to keep the branch as stable as possible.
>> > > >> >> * All patches that are intended for the branch should first be
>> committed
>> > > >> >> to the unstable branch, merged into the stable branch, and then
>> into
>> > > >> >> the current release branch.
>> > > >> >> * Normal unstable and stable branch development may continue as
>> usual.
>> > > >> >> However, if you plan to commit a big change to the unstable
>> branch
>> > > >> >> while the branch feature freeze is in effect, think twice:
>> can't the
>> > > >> >> addition wait a couple more days? Merges of bug fixes into the
>> branch
>> > > >> >> may become more difficult.
>> > > >> >> * Only Jira issues with Fix version 9.4 and priority "Blocker"
>> will delay
>> > > >> >> a release candidate build.
>> > > >> >>
>> > > >> >>
>> ---------------------------------------------------------------------
>> > > >> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> > > >> >> For additional commands, e-mail: dev-help@lucene.apache.org
>> > > >> >>
>> > > >> >>
>> > > >> >
>> > > >> >
>> ---------------------------------------------------------------------
>> > > >> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> > > >> > For additional commands, e-mail: dev-help@lucene.apache.org
>> > > >> >
>> > > >>
>> > > >>
>> > > >>
>> ---------------------------------------------------------------------
>> > > >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> > > >> For additional commands, e-mail: dev-help@lucene.apache.org
>> > > >>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>

--
Adrien
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
it looks like a small bug fix that we have had on main (and 9.x?) for a
while now, and no test failures showed up. Should be OK to
port. I plan to cut artifacts this weekend, or Monday at the latest,
but if you can do the backport today or tomorrow, that's fine by me.

On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <jpountz@gmail.com> wrote:
>
> Mike, I'm tempted to backport https://github.com/apache/lucene/pull/1068 to branch_9_4, which is a bugfix that looks pretty safe to me. What do you think?
>
> On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <mayya.sharipova@elastic.co.invalid> wrote:
>>
>> Thanks for running more tests, Michael.
>> It is encouraging that you saw a similar performance between 9.3 and 9.4. I will also run more tests with different parameters.
>>
>> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <msokolov@gmail.com> wrote:
>>>
>>> As a follow-up, I ran a test using the same parameters as above, only
>>> changing M=200 to M=16. This did result in a single segment in both
>>> cases (9.3, 9.4) and the performance was pretty similar; within noise
>>> I think. The main difference I saw was that the 9.3 index was written
>>> using CFS:
>>>
>>> 9.4:
>>> recall latency nDoc fanout maxConn beamWidth visited index ms
>>> 0.755 1.36 1000000 100 16 100 200 891402 1.00
>>> post-filter
>>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>>> _0_Lucene94HnswVectorsFormat_0.vec
>>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>>> _0_Lucene94HnswVectorsFormat_0.vem
>>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>>> _0_Lucene94HnswVectorsFormat_0.vex
>>>
>>> 9.3:
>>> recall latency nDoc fanout maxConn beamWidth visited index ms
>>> 0.775 1.34 1000000 100 16 100 4033 977043
>>> -rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
>>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>>>
>>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <msokolov@gmail.com> wrote:
>>> >
>>> > I ran another test. I thought I had increased the RAM buffer size to
>>> > 8G and heap to 16G. However I still see two segments in the index that
>>> > was created. And looking at the infostream I see:
>>> >
>>> > dir=MMapDirectory@/local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>>> > lockFactory=org\
>>> > .apache.lucene.store.NativeFSLockFactory@4466af20
>>> > index=
>>> > version=9.4.0
>>> > analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>>> > ramBufferSizeMB=8000.0
>>> > maxBufferedDocs=-1
>>> > ...
>>> > perThreadHardLimitMB=1945
>>> > ...
>>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush postings as
>>> > segment _6 numDocs=555373
>>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write norms
>>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write docValues
>>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write points
>>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to write vectors
>>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to finish stored fields
>>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write postings
>>> > and finish vectors
>>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write fieldInfos
>>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment has 0 deleted docs
>>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment has 0
>>> > soft-deleted docs
>>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment has no
>>> > vectors; no norms; no docValues; no prox; freqs
>>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
>>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm, _6.fdt, _6_\
>>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
>>> > _6_Lucene94HnswVectorsFormat_0.vex]
>>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed codec=Lucene94
>>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed: segment=_6
>>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
>>> > docs/MB=521.134
>>> >
>>> > so I think it's this perThreadHardLimit that is triggering the
>>> > flushes? TBH this isn't something I had seen before; but the docs say:
>>> >
>>> > /**
>>> > * Expert: Sets the maximum memory consumption per thread triggering
>>> > a forced flush if exceeded. A
>>> > * {@link DocumentsWriterPerThread} is forcefully flushed once it
>>> > exceeds this limit even if the
>>> > * {@link #getRAMBufferSizeMB()} has not been exceeded. This is a
>>> > safety limit to prevent a {@link
>>> > * DocumentsWriterPerThread} from address space exhaustion due to
>>> > its internal 32 bit signed
>>> > * integer based memory addressing. The given value must be less
>>> > than 2GB (2048MB)
>>> > *
>>> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>>> > */
>>> >
>>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <msokolov@gmail.com> wrote:
>>> > >
>>> > > Hi Mayya, thanks for persisting - I think we need to wrestle this to
>>> > > the ground for sure. In the test I ran, RAM buffer was the default
>>> > > checked in, which is weirdly: 1994MB. I did not specifically set heap
>>> > > size. I used maxConn/M=200. I'll try with larger buffer to see if I
>>> > > can get 9.4 to produce a single segment for the same test settings. I
>>> > > see you used a much smaller M (16), which should have produced quite
>>> > > small graphs, and I agree, should have been a single segment. Were you
>>> > > able to verify the number of segments?
>>> > >
>>> > > Agree that decrease in recall is not expected when more segments are produced.
>>> > >
>>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>>> > > <mayya.sharipova@elastic.co.invalid> wrote:
>>> > > >
>>> > > > Hello Michael,
>>> > > > Thanks for checking.
>>> > > > Sorry for bringing this up again.
>>> > > > First of all, I am ok with proceeding with the Lucene 9.4 release and leaving the performance investigations for later.
>>> > > >
>>> > > > I am interested in what's the maxConn/M value you used for your tests? What was the heap memory and the size of the RAM buffer for indexing?
>>> > > > Usually, when we have multiple segments, recall should increase, not decrease. But I agree that with multiple segments we can see a big drop in QPS.
>>> > > >
>>> > > > Here is my investigation with detailed output of the performance difference between 9.3 and 9.4 releases. In my tests I used a large indexing buffer (2Gb) and large heap (5Gb) to end up with a single segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>>> > > >
>>> > > > Thank you.
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <romseygeek@gmail.com> wrote:
>>> > > >>
>>> > > >> Done. Thanks!
>>> > > >>
>>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <msokolov@gmail.com> wrote:
>>> > > >> >
>>> > > >> > Hi Alan - I checked out the interval queries patch; seems pretty safe,
>>> > > >> > please go ahead and port to 9.4. Thanks!
>>> > > >> >
>>> > > >> > Mike
>>> > > >> >
>>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <romseygeek@gmail.com> wrote:
>>> > > >> >>
>>> > > >> >> Hi Mike,
>>> > > >> >>
>>> > > >> >> I’ve opened https://github.com/apache/lucene/pull/11760 as a small bug fix PR for a problem with interval queries. Am I OK to port this to the 9.4 branch?
>>> > > >> >>
>>> > > >> >> Thanks, Alan
>>> > > >> >>
>>> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <msokolov@gmail.com> wrote:
>>> > > >> >>
>>> > > >> >> NOTICE:
>>> > > >> >>
>>> > > >> >> Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch.
>>> > > >> >>
>>> > > >> >> Please observe the normal rules:
>>> > > >> >>
>>> > > >> >> * No new features may be committed to the branch.
>>> > > >> >> * Documentation patches, build patches and serious bug fixes may be
>>> > > >> >> committed to the branch. However, you should submit all patches you
>>> > > >> >> want to commit to Jira first to give others the chance to review
>>> > > >> >> and possibly vote against the patch. Keep in mind that it is our
>>> > > >> >> main intention to keep the branch as stable as possible.
>>> > > >> >> * All patches that are intended for the branch should first be committed
>>> > > >> >> to the unstable branch, merged into the stable branch, and then into
>>> > > >> >> the current release branch.
>>> > > >> >> * Normal unstable and stable branch development may continue as usual.
>>> > > >> >> However, if you plan to commit a big change to the unstable branch
>>> > > >> >> while the branch feature freeze is in effect, think twice: can't the
>>> > > >> >> addition wait a couple more days? Merges of bug fixes into the branch
>>> > > >> >> may become more difficult.
>>> > > >> >> * Only Jira issues with Fix version 9.4 and priority "Blocker" will delay
>>> > > >> >> a release candidate build.
>>> > > >> >>
>>> > > >> >> ---------------------------------------------------------------------
>>> > > >> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> > > >> >> For additional commands, e-mail: dev-help@lucene.apache.org
>>> > > >> >>
>
>
> --
> Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Hello! I also ran some local vector search benchmarks on branch_9_4. I
found that given the same parameters, there is a significant change in
recall/QPS before and after the initial "enable quantization to 8-bit"
backport (https://github.com/apache/lucene/pull/1054). Here's an example
with M=16, efConst=100, plus fanout=100:

*BEFORE (commit 45d06772cb62a518dab43ca9d7000a2dc707d345)*
Algorithm                                            Recall   QPS
luceneknn dim=100 {'M': 16, 'efConstruction': 100}   0.828    788.259

*AFTER (commit 87a8f7d48f3d25ed008b5930cf83afc01d7dc588, with compile fixes for Java 11)*
Algorithm                                            Recall   QPS
luceneknn dim=100 {'M': 16, 'efConstruction': 100}   0.792    813.534

This is really unexpected, because the commit didn't intend to change
anything about the core ANN algorithm. Looking at Mike's results, the
recall changes from 0.775 -> 0.755, which is a significant difference -- if
we didn't change the algorithm, the recall shouldn't change between runs.
It seems important to look into this before the release!

Looking at the nightly benchmarks, there wasn't a drop in vector search
performance. I'm guessing this is because we only report QPS, and after
this commit the QPS actually improved a little (for some given parameters).
I don't think we have any checks for whether the recall has dropped.
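Recall here is just the overlap between the approximate top-k and an exact
brute-force top-k, so a check would be cheap to add. Something like this
sketch (a hypothetical helper with made-up names, not actual luceneutil code):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of a recall@k check a nightly benchmark could assert on.
 *  Hypothetical code; names are invented, not luceneutil's. */
public class RecallCheck {
  /** Average fraction of the exact top-k that the approximate search found. */
  static double recall(List<int[]> exactTopK, List<int[]> approxTopK) {
    double total = 0;
    for (int q = 0; q < exactTopK.size(); q++) {
      Set<Integer> truth = new HashSet<>();
      for (int id : exactTopK.get(q)) truth.add(id);
      int hits = 0;
      for (int id : approxTopK.get(q)) if (truth.contains(id)) hits++;
      total += (double) hits / exactTopK.get(q).length;
    }
    return total / exactTopK.size();
  }

  public static void main(String[] args) {
    List<int[]> exact = List.of(new int[] {1, 2, 3, 4});
    List<int[]> approx = List.of(new int[] {1, 2, 9, 4});
    double r = recall(exact, approx); // 3 of 4 ground-truth neighbors found
    System.out.println(r);            // 0.75
    // A regression guard would then compare r against a stored baseline,
    // e.g. fail the nightly run if r < baseline - 0.01.
  }
}
```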

Julie

On Thu, Sep 15, 2022 at 9:32 AM Michael Sokolov <msokolov@gmail.com> wrote:

> it looks like a small bug fix, we have had on main (and 9.x?) for a
> while now and no test failures showed up, I guess. Should be OK to
> port. I plan to cut artifacts this weekend, or Monday at the latest,
> but if you can do the backport today or tomorrow, that's fine by me.
>
> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <jpountz@gmail.com> wrote:
> >
> > Mike, I'm tempted to backport https://github.com/apache/lucene/pull/1068
> to branch_9_4, which is a bugfix that looks pretty safe to me. What do you
> think?
> >
> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <
> mayya.sharipova@elastic.co.invalid> wrote:
> >>
> >> Thanks for running more tests, Michael.
> >> It is encouraging that you saw a similar performance between 9.3 and
> 9.4. I will also run more tests with different parameters.
> >>
> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <msokolov@gmail.com>
> wrote:
> >>>
> >>> As a follow-up, I ran a test using the same parameters as above, only
> >>> changing M=200 to M=16. This did result in a single segment in both
> >>> cases (9.3, 9.4) and the performance was pretty similar; within noise
> >>> I think. The main difference I saw was that the 9.3 index was written
> >>> using CFS:
> >>>
> >>> 9.4:
> >>> recall latency nDoc fanout maxConn beamWidth visited index
> ms
> >>> 0.755 1.36 1000000 100 16 100 200 891402 1.00
> >>> post-filter
> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
> >>> _0_Lucene94HnswVectorsFormat_0.vec
> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
> >>> _0_Lucene94HnswVectorsFormat_0.vem
> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
> >>> _0_Lucene94HnswVectorsFormat_0.vex
> >>>
> >>> 9.3:
> >>> recall latency nDoc fanout maxConn beamWidth visited index
> ms
> >>> 0.775 1.34 1000000 100 16 100 4033 977043
> >>> rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
> >>>
> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <msokolov@gmail.com>
> wrote:
> >>> >
> >>> > I ran another test. I thought I had increased the RAM buffer size to
> >>> > 8G and heap to 16G. However I still see two segments in the index
> that
> >>> > was created. And looking at the infostream I see:
> >>> >
> >>> > dir=MMapDirectory@
> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
> >>> > lockFactory=org\
> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20
> >>> > index=
> >>> > version=9.4.0
> >>> > analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
> >>> > ramBufferSizeMB=8000.0
> >>> > maxBufferedDocs=-1
> >>> > ...
> >>> > perThreadHardLimitMB=1945
> >>> > ...
> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush postings as
> >>> > segment _6 numDocs=555373
> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write norms
> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write
> docValues
> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write points
> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to write
> vectors
> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to finish stored
> fields
> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write postings
> >>> > and finish vectors
> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write
> fieldInfos
> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment has 0
> deleted docs
> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment has 0
> >>> > soft-deleted docs
> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment has no
> >>> > vectors; no norms; no docValues; no prox; freqs
> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
> >>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm, _6.fdt,
> _6_\
> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
> >>> > _6_Lucene94HnswVectorsFormat_0.vex]
> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed codec=Lucene94
> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed: segment=_6
> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
> >>> > docs/MB=521.134
> >>> >
> >>> > so I think it's this perThreadHardLimit that is triggering the
> >>> > flushes? TBH this isn't something I had seen before; but the docs
> say:
> >>> >
> >>> > /**
> >>> > * Expert: Sets the maximum memory consumption per thread
> triggering
> >>> > a forced flush if exceeded. A
> >>> > * {@link DocumentsWriterPerThread} is forcefully flushed once it
> >>> > exceeds this limit even if the
> >>> > * {@link #getRAMBufferSizeMB()} has not been exceeded. This is a
> >>> > safety limit to prevent a {@link
> >>> > * DocumentsWriterPerThread} from address space exhaustion due to
> >>> > its internal 32 bit signed
> >>> > * integer based memory addressing. The given value must be less
> >>> > that 2GB (2048MB)
> >>> > *
> >>> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
> >>> > */
> >>> >
> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <msokolov@gmail.com>
> wrote:
> >>> > >
> >>> > > Hi Mayya, thanks for persisting - I think we need to wrestle this
> to
> >>> > > the ground for sure. In the test I ran, RAM buffer was the default
> >>> > > checked in, which is weirdly: 1994MB. I did not specifically set
> heap
> >>> > > size. I used maxConn/M=200. I'll try with larger buffer to see if
> I
> >>> > > can get 9.4 to produce a single segment for the same test
> settings. I
> >>> > > see you used a much smaller M (16), which should have produced
> quite
> >>> > > small graphs, and I agree, should have been a single segment. Were
> you
> >>> > > able to verify the number of segments?
> >>> > >
> >>> > > Agree that decrease in recall is not expected when more segments
> are produced.
> >>> > >
> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
> >>> > > <mayya.sharipova@elastic.co.invalid> wrote:
> >>> > > >
> >>> > > > Hello Michael,
> >>> > > > Thanks for checking.
> >>> > > > Sorry for bringing this up again.
> >>> > > > First of all, I am ok with proceeding with the Lucene 9.4
> release and leaving the performance investigations for later.
> >>> > > >
> >>> > > > I am interested in what's the maxConn/M value you used for your
> tests? What was the heap memory and the size of the RAM buffer for indexing?
> >>> > > > Usually, when we have multiple segments, recall should increase,
> not decrease. But I agree that with multiple segments we can see a big drop
> in QPS.
> >>> > > >
> >>> > > > Here is my investigation with detailed output of the performance
> difference between 9.3 and 9.4 releases. In my tests I used a large
> indexing buffer (2Gb) and large heap (5Gb) to end up with a single segment
> for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
> >>> > > >
> >>> > > > Thank you.
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <
> romseygeek@gmail.com> wrote:
> >>> > > >>
> >>> > > >> Done. Thanks!
> >>> > > >>
> >
> >
> > --
> > Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Thank you Mike, I just backported the change.

On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <msokolov@gmail.com> wrote:

> it looks like a small bug fix, we have had on main (and 9.x?) for a
> while now and no test failures showed up, I guess. Should be OK to
> port. I plan to cut artifacts this weekend, or Monday at the latest,
> but if you can do the backport today or tomorrow, that's fine by me.
>
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Thanks for the deep-dive, Julie. I was able to reproduce the changing
recall. I had introduced some bugs in the diversity checks (which may
have partially canceled each other out? it's hard to understand what
was happening in the buggy case) and posted a fix today:
https://github.com/apache/lucene/pull/11781
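For anyone following along: the diversity check prunes a candidate neighbor
when it is closer to an already-selected neighbor than to the node being
linked. Roughly this (a sketch with an invented similarity convention, not
the patched Lucene code):

```java
/** Sketch of the HNSW neighbor-diversity criterion: keep a candidate only
 *  if it is more similar to the node being linked than to every neighbor
 *  already selected. Hypothetical code, not the actual Lucene class. */
public class DiversityCheck {
  /** Scores are similarities: higher means closer. */
  static boolean diverse(float candidateToNode, float[] candidateToSelected) {
    for (float s : candidateToSelected) {
      if (s > candidateToNode) {
        // candidate is closer to an existing neighbor than to the node,
        // so linking it adds little new information: prune it
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(diverse(0.9f, new float[] {0.5f, 0.7f})); // true
    System.out.println(diverse(0.4f, new float[] {0.6f}));       // false
  }
}
```

A bug in either direction of that comparison changes which edges survive
pruning, which is consistent with recall moving while QPS barely changes.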

There are a couple of other outstanding issues I found while doing a
bunch of git bisecting:

I think we might have introduced a (test-only) performance regression
in KnnGraphTester.

We may still be over-allocating the size of NeighborArray, leading to
excessive segmentation? I wonder if we could avoid dynamic
re-allocation there, and simply initialize every neighbor array to
2*M+1.
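For illustration, the idea would be something like this (a sketch with
invented names, not the actual org.apache.lucene.util.hnsw.NeighborArray):

```java
/** Sketch of a fixed-capacity neighbor array sized up front to 2*M+1:
 *  2*M covers the level-0 maximum connection count, plus one scratch slot
 *  for a candidate before pruning. Hypothetical, not the Lucene class. */
public class FixedNeighborArray {
  private final int[] nodes;
  private final float[] scores;
  private int size;

  public FixedNeighborArray(int maxConn) {
    int capacity = 2 * maxConn + 1; // allocate once; no dynamic growth
    this.nodes = new int[capacity];
    this.scores = new float[capacity];
  }

  public void add(int node, float score) {
    if (size == nodes.length) {
      throw new IllegalStateException("full: diversity pruning should run first");
    }
    nodes[size] = node;
    scores[size] = score;
    size++;
  }

  public int size() {
    return size;
  }
}
```

Allocating exactly once per node would also make the per-graph RAM estimate
predictable, which matters given the flush behavior discussed above.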

While I don't think these are necessarily blockers, given that we are
releasing HNSW improvements it seems like we should address these,
especially as build-graph-on-index is one of the things we are
releasing, and it may be impacted. I will see if I can put up a
patch or two.

It would be great if you all could test again with
https://github.com/apache/lucene/pull/11781 applied.

-Mike

On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpountz@gmail.com> wrote:
>
> Thank you Mike, I just backported the change.
>
> On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <msokolov@gmail.com> wrote:
>>
>> it looks like a small bug fix, we have had on main (and 9.x?) for a
>> while now and no test failures showed up, I guess. Should be OK to
>> port. I plan to cut artifacts this weekend, or Monday at the latest,
>> but if you can do the backport today or tomorrow, that's fine by me.
>>
>> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <jpountz@gmail.com> wrote:
>> >
>> > Mike, I'm tempted to backport https://github.com/apache/lucene/pull/1068 to branch_9_4, which is a bugfix that looks pretty safe to me. What do you think?
>> >
>> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <mayya.sharipova@elastic.co.invalid> wrote:
>> >>
>> >> Thanks for running more tests, Michael.
>> >> It is encouraging that you saw a similar performance between 9.3 and 9.4. I will also run more tests with different parameters.
>> >>
>> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <msokolov@gmail.com> wrote:
>> >>>
>> >>> As a follow-up, I ran a test using the same parameters as above, only
>> >>> changing M=200 to M=16. This did result in a single segment in both
>> >>> cases (9.3, 9.4) and the performance was pretty similar; within noise
>> >>> I think. The main difference I saw was that the 9.3 index was written
>> >>> using CFS:
>> >>>
>> >>> 9.4:
>> >>> recall latency nDoc fanout maxConn beamWidth visited index ms
>> >>> 0.755 1.36 1000000 100 16 100 200 891402 1.00
>> >>> post-filter
>> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>> >>> _0_Lucene94HnswVectorsFormat_0.vec
>> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>> >>> _0_Lucene94HnswVectorsFormat_0.vem
>> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>> >>> _0_Lucene94HnswVectorsFormat_0.vex
>> >>>
>> >>> 9.3:
>> >>> recall latency nDoc fanout maxConn beamWidth visited index ms
>> >>> 0.775 1.34 1000000 100 16 100 4033 977043
>> >>> rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
>> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>> >>>
>> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <msokolov@gmail.com> wrote:
>> >>> >
>> >>> > I ran another test. I thought I had increased the RAM buffer size to
>> >>> > 8G and heap to 16G. However I still see two segments in the index that
>> >>> > was created. And looking at the infostream I see:
>> >>> >
>> >>> > dir=MMapDirectory@/local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>> >>> > lockFactory=org\
>> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20
>> >>> > index=
>> >>> > version=9.4.0
>> >>> > analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>> >>> > ramBufferSizeMB=8000.0
>> >>> > maxBufferedDocs=-1
>> >>> > ...
>> >>> > perThreadHardLimitMB=1945
>> >>> > ...
>> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush postings as
>> >>> > segment _6 numDocs=555373
>> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write norms
>> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write docValues
>> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write points
>> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to write vectors
>> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to finish stored fields
>> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write postings
>> >>> > and finish vectors
>> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write fieldInfos
>> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment has 0 deleted docs
>> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment has 0
>> >>> > soft-deleted docs
>> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment has no
>> >>> > vectors; no norms; no docValues; no prox; freqs
>> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
>> >>> > flushedFiles=[._6_Lucene94HnswVectorsFormat_0.vec, _6.fdm, _6.fdt, _6_\
>> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
>> >>> > _6_Lucene94HnswVectorsFormat_0.vex]
>> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed codec=Lucene94
>> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed: segment=_6
>> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
>> >>> > docs/MB=521.134
>> >>> >
>> >>> > so I think it's this perThreadHardLimit that is triggering the
>> >>> > flushes? TBH this isn't something I had seen before; but the docs say:
>> >>> >
>> >>> > /**
>> >>> > * Expert: Sets the maximum memory consumption per thread triggering
>> >>> > a forced flush if exceeded. A
>> >>> > * {@link DocumentsWriterPerThread} is forcefully flushed once it
>> >>> > exceeds this limit even if the
>> >>> > * {@link #getRAMBufferSizeMB()} has not been exceeded. This is a
>> >>> > safety limit to prevent a {@link
>> >>> > * DocumentsWriterPerThread} from address space exhaustion due to
>> >>> > its internal 32 bit signed
>> >>> > * integer based memory addressing. The given value must be less
>> >>> > that 2GB (2048MB)
>> >>> > *
>> >>> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>> >>> > */
>> >>> >
>> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <msokolov@gmail.com> wrote:
>> >>> > >
>> >>> > > Hi Mayya, thanks for persisting - I think we need to wrestle this to
>> >>> > > the ground for sure. In the test I ran, RAM buffer was the default
>> >>> > > checked in, which is weirdly: 1994MB. I did not specifically set heap
>> >>> > > size. I used maxConn/M=200. I'll try with larger buffer to see if I
>> >>> > > can get 9.4 to produce a single segment for the same test settings. I
>> >>> > > see you used a much smaller M (16), which should have produced quite
>> >>> > > small graphs, and I agree, should have been a single segment. Were you
>> >>> > > able to verify the number of segments?
>> >>> > >
>> >>> > > Agree that decrease in recall is not expected when more segments are produced.
>> >>> > >
>> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>> >>> > > <mayya.sharipova@elastic.co.invalid> wrote:
>> >>> > > >
>> >>> > > > Hello Michael,
>> >>> > > > Thanks for checking.
>> >>> > > > Sorry for bringing this up again.
>> >>> > > > First of all, I am ok with proceeding with the Lucene 9.4 release and leaving the performance investigations for later.
>> >>> > > >
>> >>> > > > I am interested in what's the maxConn/M value you used for your tests? What was the heap memory and the size of the RAM buffer for indexing?
>> >>> > > > Usually, when we have multiple segments, recall should increase, not decrease. But I agree that with multiple segments we can see a big drop in QPS.
>> >>> > > >
>> >>> > > > Here is my investigation with detailed output of the performance difference between 9.3 and 9.4 releases. In my tests I used a large indexing buffer (2Gb) and large heap (5Gb) to end up with a single segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>> >>> > > >
>> >>> > > > Thank you.
>> >>> > > >
>> >>> > > >
>> >>> > > >
>> >>> > > >
>> >>> > > >
>> >>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <romseygeek@gmail.com> wrote:
>> >>> > > >>
>> >>> > > >> Done. Thanks!
>> >>> > > >>
>> >>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <msokolov@gmail.com> wrote:
>> >>> > > >> >
>> >>> > > >> > Hi Alan - I checked out the interval queries patch; seems pretty safe,
>> >>> > > >> > please go ahead and port to 9.4. Thanks!
>> >>> > > >> >
>> >>> > > >> > Mike
>> >>> > > >> >
>> >>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <romseygeek@gmail.com> wrote:
>> >>> > > >> >>
>> >>> > > >> >> Hi Mike,
>> >>> > > >> >>
>> >>> > > >> >> I’ve opened https://github.com/apache/lucene/pull/11760 as a small bug fix PR for a problem with interval queries. Am I OK to port this to the 9.4 branch?
>> >>> > > >> >>
>> >>> > > >> >> Thanks, Alan
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
OK, I think I was wrong about latency having increased due to a change
in KnnGraphTester -- I did some testing there and couldn't reproduce.
There does seem to be a slight vector search latency increase,
possibly noise, but maybe due to the branching introduced to check
whether to do byte vs float operations? It would be a little
surprising if that were the case given the small number of branchings
compared to the number of multiplies in dot-product though.
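The byte-vs-float branching being discussed can be sketched roughly like this (an illustrative sketch only, not Lucene's actual scorer code; the class and method names are invented): the encoding check costs one branch per dot-product call, while the multiply loop stays branch-free, which is why a measurable slowdown from the branch alone would be surprising.

```java
// Illustrative sketch of byte-vs-float dispatch around a dot product.
// NOT Lucene's actual code; names are invented for illustration.
public class DotProductDispatch {

    // Branch-free inner loop over floats.
    public static float dotFloat(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Branch-free inner loop over bytes (widened to int arithmetic).
    public static int dotByte(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // The encoding check is one branch per call, not one per multiply.
    public static float dot(Object a, Object b) {
        if (a instanceof byte[]) {
            return dotByte((byte[]) a, (byte[]) b);
        }
        return dotFloat((float[]) a, (float[]) b);
    }

    public static void main(String[] args) {
        System.out.println(dot(new float[] {1f, 2f, 3f}, new float[] {4f, 5f, 6f})); // 32.0
        System.out.println(dot(new byte[] {1, 2}, new byte[] {3, 4})); // 11.0
    }
}
```

For a 100-dimensional vector that is one branch against 100 multiply-adds per comparison, consistent with the point above.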

On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msokolov@gmail.com> wrote:
>
> Thanks for the deep-dive Julie. I was able to reproduce the changing
> recall. I had introduced some bugs in the diversity checks (that may
> have partially canceled each other out? it's hard to understand what
> was happening in the buggy case) and posted a fix today
> https://github.com/apache/lucene/pull/11781.
>
> There are a couple of other outstanding issues I found while doing a
> bunch of git bisecting;
>
> I think we might have introduced a (test-only) performance regression
> in KnnGraphTester
>
> We may still be over-allocating the size of NeighborArray, leading to
> excessive segmentation? I wonder if we could avoid dynamic
> re-allocation there, and simply initialize every neighbor array to
> 2*M+1.
>
> While I don't think these are necessarily blockers, given that we are
> releasing HNSW improvements, it seems like we should address these,
> especially as the build-graph-on-index is one of the things we are
> releasing, and it is (may be?) impacted. I will see if I can put up a
> patch or two.
>
> It would be great if you all are able to test again with
> https://github.com/apache/lucene/pull/11781/ applied
>
> -Mike
>
> On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpountz@gmail.com> wrote:
> >
> > Thank you Mike, I just backported the change.
> >
> > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <msokolov@gmail.com> wrote:
> >>
> >> it looks like a small bug fix, we have had on main (and 9.x?) for a
> >> while now and no test failures showed up, I guess. Should be OK to
> >> port. I plan to cut artifacts this weekend, or Monday at the latest,
> >> but if you can do the backport today or tomorrow, that's fine by me.
> >>
> >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <jpountz@gmail.com> wrote:
> >> >
> >> > Mike, I'm tempted to backport https://github.com/apache/lucene/pull/1068 to branch_9_4, which is a bugfix that looks pretty safe to me. What do you think?
> >> >
> >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <mayya.sharipova@elastic.co.invalid> wrote:
> >> >>
> >> >> Thanks for running more tests, Michael.
> >> >> It is encouraging that you saw a similar performance between 9.3 and 9.4. I will also run more tests with different parameters.
> >> >>
> >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <msokolov@gmail.com> wrote:
> >> >>>
> >> >>> As a follow-up, I ran a test using the same parameters as above, only
> >> >>> changing M=200 to M=16. This did result in a single segment in both
> >> >>> cases (9.3, 9.4) and the performance was pretty similar; within noise
> >> >>> I think. The main difference I saw was that the 9.3 index was written
> >> >>> using CFS:
> >> >>>
> >> >>> 9.4:
> >> >>> recall latency nDoc fanout maxConn beamWidth visited index ms
> >> >>> 0.755 1.36 1000000 100 16 100 200 891402 1.00
> >> >>> post-filter
> >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
> >> >>> _0_Lucene94HnswVectorsFormat_0.vec
> >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
> >> >>> _0_Lucene94HnswVectorsFormat_0.vem
> >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
> >> >>> _0_Lucene94HnswVectorsFormat_0.vex
> >> >>>
> >> >>> 9.3:
> >> >>> recall latency nDoc fanout maxConn beamWidth visited index ms
> >> >>> 0.775 1.34 1000000 100 16 100 4033 977043
> >> >>> rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
> >> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
> >> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
> >> >>>
> >> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <msokolov@gmail.com> wrote:
> >> >>> >
> >> >>> > I ran another test. I thought I had increased the RAM buffer size to
> >> >>> > 8G and heap to 16G. However I still see two segments in the index that
> >> >>> > was created. And looking at the infostream I see:
> >> >>> >
> >> >>> > dir=MMapDirectory@/local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
> >> >>> > lockFactory=org\
> >> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20
> >> >>> > index=
> >> >>> > version=9.4.0
> >> >>> > analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
> >> >>> > ramBufferSizeMB=8000.0
> >> >>> > maxBufferedDocs=-1
> >> >>> > ...
> >> >>> > perThreadHardLimitMB=1945
> >> >>> > ...
> >> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush postings as
> >> >>> > segment _6 numDocs=555373
> >> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write norms
> >> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write docValues
> >> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write points
> >> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to write vectors
> >> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to finish stored fields
> >> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write postings
> >> >>> > and finish vectors
> >> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write fieldInfos
> >> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment has 0 deleted docs
> >> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment has 0
> >> >>> > soft-deleted docs
> >> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment has no
> >> >>> > vectors; no norms; no docValues; no prox; freqs
> >> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
> >> >>> > flushedFiles=[._6_Lucene94HnswVectorsFormat_0.vec, _6.fdm, _6.fdt, _6_\
> >> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
> >> >>> > _6_Lucene94HnswVectorsFormat_0.vex]
> >> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed codec=Lucene94
> >> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed: segment=_6
> >> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
> >> >>> > docs/MB=521.134
> >> >>> >
> >> >>> > so I think it's this perThreadHardLimit that is triggering the
> >> >>> > flushes? TBH this isn't something I had seen before; but the docs say:
> >> >>> >
> >> >>> > /**
> >> >>> > * Expert: Sets the maximum memory consumption per thread triggering
> >> >>> > a forced flush if exceeded. A
> >> >>> > * {@link DocumentsWriterPerThread} is forcefully flushed once it
> >> >>> > exceeds this limit even if the
> >> >>> > * {@link #getRAMBufferSizeMB()} has not been exceeded. This is a
> >> >>> > safety limit to prevent a {@link
> >> >>> > * DocumentsWriterPerThread} from address space exhaustion due to
> >> >>> > its internal 32 bit signed
> >> >>> > * integer based memory addressing. The given value must be less
> >> >>> > than 2GB (2048MB)
> >> >>> > *
> >> >>> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
> >> >>> > */
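The interaction described in that javadoc can be sketched minimally (illustrative only -- this is not Lucene's DocumentsWriter code, just the two config values from the infostream above): a per-thread writer must flush once it crosses either its hard limit or the global RAM buffer, and with ramBufferSizeMB=8000 the 1945 MB hard limit always fires first.

```java
// Illustrative sketch of the flush trigger described in the javadoc above.
// Not Lucene's actual code; the constants mirror the infostream values.
public class FlushTriggerSketch {
    static final double RAM_BUFFER_MB = 8000.0;            // ramBufferSizeMB
    static final double PER_THREAD_HARD_LIMIT_MB = 1945.0; // perThreadHardLimitMB

    // A DocumentsWriterPerThread is forcefully flushed once it exceeds the
    // per-thread hard limit, even if the global RAM buffer is not yet full.
    public static boolean mustFlush(double perThreadRamUsedMb) {
        return perThreadRamUsedMb >= PER_THREAD_HARD_LIMIT_MB
            || perThreadRamUsedMb >= RAM_BUFFER_MB;
    }

    public static void main(String[] args) {
        System.out.println(mustFlush(1000.0));   // false
        System.out.println(mustFlush(1945.002)); // true: the hard limit, not the buffer
    }
}
```

This matches the infostream above: the segment flushed at ramUsed=1,945.002 MB, well below the 8000 MB buffer.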
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
>
> It would be great if you all are able to test again with
> https://github.com/apache/lucene/pull/11781/ applied



I ran the ann-benchmarks with this change and was happy to confirm that
in my test, recall with this PR is the same as on the 9.3 branch. QPS
is lower, but we can investigate QPS later.

glove-100-angular, M:16, efConstruction:100

              9.3 recall   9.3 QPS    this PR recall   this PR QPS
n_cands=10    0.620        2745.933   0.620            1675.500
n_cands=20    0.680        2288.665   0.680            1512.744
n_cands=40    0.746        1770.243   0.746            1040.240
n_cands=80    0.809        1226.738   0.809            695.236
n_cands=120   0.843        948.908    0.843            525.914
n_cands=200   0.878        671.781    0.878            351.529
n_cands=400   0.918        392.265    0.918            207.854
n_cands=600   0.937        282.403    0.937            144.311
n_cands=800   0.949        214.620    0.949            116.875
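For reference, the recall column is the usual ann-benchmarks notion of recall@k: the fraction of the true nearest neighbors that the approximate (HNSW) search returns. A minimal sketch of that computation (illustrative only, not ann-benchmarks' actual code):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative recall@k computation: |approx intersect exact| / |exact|.
public class RecallSketch {
    public static double recall(int[] approxIds, int[] exactIds) {
        Set<Integer> truth = new HashSet<>();
        for (int id : exactIds) {
            truth.add(id);
        }
        int hits = 0;
        for (int id : approxIds) {
            if (truth.contains(id)) {
                hits++;
            }
        }
        return (double) hits / exactIds.length;
    }

    public static void main(String[] args) {
        // 2 of the 3 true neighbors were found.
        System.out.println(recall(new int[] {1, 2, 3}, new int[] {1, 2, 4}));
    }
}
```

Raising n_cands widens the candidate queue during search, which is why both recall and latency rise together in the table.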

Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Thanks for your speedy testing! I am observing comparable latencies *when
the index geometry (i.e. number of segments)* is unchanged. Agree we can
leave this for a later day. I'll proceed to cut the 9.4 artifacts.

On Mon, Sep 19, 2022 at 11:02 AM Mayya Sharipova
<mayya.sharipova@elastic.co.invalid> wrote:

> It would be great if you all are able to test again with
>> https://github.com/apache/lucene/pull/11781/ applied
>
>
>
> I ran the ann benchmarks with this change and was happy to confirm that
> in my test recall with this PR is the same as on the 9.3 branch. QPS is
> lower, but we can investigate the QPS drop later.
>
> glove-100-angular M:16 efConstruction:100
>              9.3 recall   9.3 QPS    this PR recall   this PR QPS
> n_cands=10   0.620        2745.933   0.620            1675.500
> n_cands=20   0.680        2288.665   0.680            1512.744
> n_cands=40   0.746        1770.243   0.746            1040.240
> n_cands=80   0.809        1226.738   0.809            695.236
> n_cands=120  0.843        948.908    0.843            525.914
> n_cands=200  0.878        671.781    0.878            351.529
> n_cands=400  0.918        392.265    0.918            207.854
> n_cands=600  0.937        282.403    0.937            144.311
> n_cands=800  0.949        214.620    0.949            116.875
>
> On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov <msokolov@gmail.com>
> wrote:
>
>> OK, I think I was wrong about latency having increased due to a change
>> in KnnGraphTester -- I did some testing there and couldn't reproduce.
>> There does seem to be a slight vector search latency increase,
>> possibly noise, but maybe due to the branching introduced to check
>> whether to do byte vs float operations? It would be a little
>> surprising if that were the case given the small number of branchings
>> compared to the number of multiplies in dot-product though.
>>
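The branching concern above can be illustrated with a minimal pure-Java sketch (not Lucene's actual code): dispatching byte-vs-float once per vector comparison, rather than inside the multiply loop, keeps the per-element cost identical to a float-only path.

```java
public class DotProduct {
    // Float path: tight loop, no per-element type branching.
    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Byte path: same loop shape in a separate method, so the
    // byte-vs-float decision is made once at the call site.
    static int dotProduct(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

If the dispatch instead happened inside the loop body, the branch count would scale with vector dimension, which is the scenario Mike is ruling out here.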
>> On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msokolov@gmail.com>
>> wrote:
>> >
>> > Thanks for the deep-dive Julie. I was able to reproduce the changing
>> > recall. I had introduced some bugs in the diversity checks (that may
>> > have partially canceled each other out? it's hard to understand what
>> > was happening in the buggy case) and posted a fix today
>> > https://github.com/apache/lucene/pull/11781.
>> >
>> > There are a couple of other outstanding issues I found while doing a
>> > bunch of git bisecting:
>> >
>> > I think we might have introduced a (test-only) performance regression
>> > in KnnGraphTester
>> >
>> > We may still be over-allocating the size of NeighborArray, leading to
>> > excessive segmentation? I wonder if we could avoid dynamic
>> > re-allocation there, and simply initialize every neighbor array to
>> > 2*M+1.
>> >
>> > While I don't think these are necessarily blockers, given that we are
>> > releasing HNSW improvements, it seems like we should address these,
>> > especially as the build-graph-on-index is one of the things we are
>> > releasing, and it is (may be?) impacted. I will see if I can put up a
>> > patch or two.
>> >
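The fixed-allocation idea for NeighborArray could be sketched roughly as follows (hypothetical class and field names, not the actual Lucene implementation): allocate 2*M+1 slots up front, the worst case a node holds before diversity pruning, instead of growing the backing arrays dynamically.

```java
public class FixedNeighborArray {
    final int[] nodes;     // neighbor node ids
    final float[] scores;  // similarity score per neighbor
    int size;

    FixedNeighborArray(int m) {
        int capacity = 2 * m + 1; // fixed up front, never reallocated
        nodes = new int[capacity];
        scores = new float[capacity];
    }

    // Returns false when full; the caller must prune (apply the
    // diversity check) before adding more neighbors.
    boolean add(int node, float score) {
        if (size == nodes.length) {
            return false;
        }
        nodes[size] = node;
        scores[size] = score;
        size++;
        return true;
    }
}
```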
>> > It would be great if you all are able to test again with
>> > https://github.com/apache/lucene/pull/11781/ applied
>> >
>> > -Mike
>> >
>> > On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpountz@gmail.com>
>> wrote:
>> > >
>> > > Thank you Mike, I just backported the change.
>> > >
>> > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <msokolov@gmail.com>
>> wrote:
>> > >>
>> > >> it looks like a small bug fix, we have had on main (and 9.x?) for a
>> > >> while now and no test failures showed up, I guess. Should be OK to
>> > >> port. I plan to cut artifacts this weekend, or Monday at the latest,
>> > >> but if you can do the backport today or tomorrow, that's fine by me.
>> > >>
>> > >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <jpountz@gmail.com>
>> wrote:
>> > >> >
>> > >> > Mike, I'm tempted to backport
>> https://github.com/apache/lucene/pull/1068 to branch_9_4, which is a
>> bugfix that looks pretty safe to me. What do you think?
>> > >> >
>> > >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <
>> mayya.sharipova@elastic.co.invalid> wrote:
>> > >> >>
>> > >> >> Thanks for running more tests, Michael.
>> > >> >> It is encouraging that you saw a similar performance between 9.3
>> and 9.4. I will also run more tests with different parameters.
>> > >> >>
>> > >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <
>> msokolov@gmail.com> wrote:
>> > >> >>>
>> > >> >>> As a follow-up, I ran a test using the same parameters as above,
>> only
>> > >> >>> changing M=200 to M=16. This did result in a single segment in
>> both
>> > >> >>> cases (9.3, 9.4) and the performance was pretty similar; within
>> noise
>> > >> >>> I think. The main difference I saw was that the 9.3 index was
>> written
>> > >> >>> using CFS:
>> > >> >>>
>> > >> >>> 9.4:
>> > >> >>> recall latency nDoc fanout maxConn beamWidth visited
>> index ms
>> > >> >>> 0.755 1.36 1000000 100 16 100 200 891402
>> 1.00
>> > >> >>> post-filter
>> > >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vec
>> > >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vem
>> > >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vex
>> > >> >>>
>> > >> >>> 9.3:
>> > >> >>> recall latency nDoc fanout maxConn beamWidth visited
>> index ms
>> > >> >>> 0.775 1.34 1000000 100 16 100 4033 977043
>> > >> >>> rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
>> > >> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>> > >> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>> > >> >>>
>> > >> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <
>> msokolov@gmail.com> wrote:
>> > >> >>> >
>> > >> >>> > I ran another test. I thought I had increased the RAM buffer
>> size to
>> > >> >>> > 8G and heap to 16G. However I still see two segments in the
>> index that
>> > >> >>> > was created. And looking at the infostream I see:
>> > >> >>> >
>> > >> >>> > dir=MMapDirectory@
>> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>> > >> >>> > lockFactory=org\
>> > >> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20
>> > >> >>> > index=
>> > >> >>> > version=9.4.0
>> > >> >>> > analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>> > >> >>> > ramBufferSizeMB=8000.0
>> > >> >>> > maxBufferedDocs=-1
>> > >> >>> > ...
>> > >> >>> > perThreadHardLimitMB=1945
>> > >> >>> > ...
>> > >> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush postings
>> as
>> > >> >>> > segment _6 numDocs=555373
>> > >> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write
>> norms
>> > >> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write
>> docValues
>> > >> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write
>> points
>> > >> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to
>> write vectors
>> > >> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to finish
>> stored fields
>> > >> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write
>> postings
>> > >> >>> > and finish vectors
>> > >> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write
>> fieldInfos
>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment has
>> 0 deleted docs
>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment has
>> 0
>> > >> >>> > soft-deleted docs
>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment has
>> no
>> > >> >>> > vectors; no norms; no docValues; no prox; freqs
>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
>> > >> >>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm,
>> _6.fdt, _6_\
>> > >> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
>> > >> >>> > _6_Lucene94HnswVectorsFormat_0.vex]
>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed
>> codec=Lucene94
>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed:
>> segment=_6
>> > >> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
>> > >> >>> > docs/MB=521.134
>> > >> >>> >
>> > >> >>> > so I think it's this perThreadHardLimit that is triggering the
>> > >> >>> > flushes? TBH this isn't something I had seen before; but the
>> docs say:
>> > >> >>> >
>> > >> >>> > /**
>> > >> >>> > * Expert: Sets the maximum memory consumption per thread
>> triggering
>> > >> >>> > a forced flush if exceeded. A
>> > >> >>> > * {@link DocumentsWriterPerThread} is forcefully flushed
>> once it
>> > >> >>> > exceeds this limit even if the
>> > >> >>> > * {@link #getRAMBufferSizeMB()} has not been exceeded. This
>> is a
>> > >> >>> > safety limit to prevent a {@link
>> > >> >>> > * DocumentsWriterPerThread} from address space exhaustion
>> due to
>> > >> >>> > its internal 32 bit signed
>> > >> >>> > * integer based memory addressing. The given value must be
>> less
>> > >> >>> > that 2GB (2048MB)
>> > >> >>> > *
>> > >> >>> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>> > >> >>> > */
>> > >> >>> >
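For reference, both limits discussed above are set on the writer config; a minimal configuration fragment (assuming the Lucene 9.x API and an illustrative 2047 MB hard limit, which is the maximum the javadoc allows):

```java
import java.nio.file.Path;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class WriterSetup {
    static IndexWriter openWriter(Path indexPath) throws Exception {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        // Global soft trigger: 8 GB across all DWPTs.
        config.setRAMBufferSizeMB(8000.0);
        // Per-thread hard cap: each DocumentsWriterPerThread still
        // flushes here regardless of the global buffer (must be < 2048).
        config.setRAMPerThreadHardLimitMB(2047);
        return new IndexWriter(FSDirectory.open(indexPath), config);
    }
}
```

This matches the infostream above: with a single indexing thread, the effective flush trigger is the per-thread hard limit, not the 8 GB buffer.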
>> > >> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <
>> msokolov@gmail.com> wrote:
>> > >> >>> > >
>> > >> >>> > > Hi Mayya, thanks for persisting - I think we need to wrestle
>> this to
>> > >> >>> > > the ground for sure. In the test I ran, RAM buffer was the
>> default
>> > >> >>> > > checked in, which is weirdly: 1994MB. I did not specifically
>> set heap
>> > >> >>> > > size. I used maxConn/M=200. I'll try with larger buffer to
>> see if I
>> > >> >>> > > can get 9.4 to produce a single segment for the same test
>> settings. I
>> > >> >>> > > see you used a much smaller M (16), which should have
>> produced quite
>> > >> >>> > > small graphs, and I agree, should have been a single
>> segment. Were you
>> > >> >>> > > able to verify the number of segments?
>> > >> >>> > >
>> > >> >>> > > Agree that decrease in recall is not expected when more
>> segments are produced.
>> > >> >>> > >
>> > >> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>> > >> >>> > > <mayya.sharipova@elastic.co.invalid> wrote:
>> > >> >>> > > >
>> > >> >>> > > > Hello Michael,
>> > >> >>> > > > Thanks for checking.
>> > >> >>> > > > Sorry for bringing this up again.
>> > >> >>> > > > First of all, I am ok with proceeding with the Lucene 9.4
>> release and leaving the performance investigations for later.
>> > >> >>> > > >
>> > >> >>> > > > I am interested in what's the maxConn/M value you used for
>> your tests? What was the heap memory and the size of the RAM buffer for
>> indexing?
>> > >> >>> > > > Usually, when we have multiple segments, recall should
>> increase, not decrease. But I agree that with multiple segments we can see
>> a big drop in QPS.
>> > >> >>> > > >
>> > >> >>> > > > Here is my investigation with detailed output of the
>> performance difference between 9.3 and 9.4 releases. In my tests I used a
>> large indexing buffer (2Gb) and large heap (5Gb) to end up with a single
>> segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>> > >> >>> > > >
>> > >> >>> > > > Thank you.
>> > >> >>> > > >
>> > >> >>> > > >
>> > >> >>> > > >
>> > >> >>> > > >
>> > >> >>> > > >
>> > >> >>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <
>> romseygeek@gmail.com> wrote:
>> > >> >>> > > >>
>> > >> >>> > > >> Done. Thanks!
>> > >> >>> > > >>
>> > >> >>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <
>> msokolov@gmail.com> wrote:
>> > >> >>> > > >> >
>> > >> >>> > > >> > Hi Alan - I checked out the interval queries patch;
>> seems pretty safe,
>> > >> >>> > > >> > please go ahead and port to 9.4. Thanks!
>> > >> >>> > > >> >
>> > >> >>> > > >> > Mike
>> > >> >>> > > >> >
>> > >> >>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <
>> romseygeek@gmail.com> wrote:
>> > >> >>> > > >> >>
>> > >> >>> > > >> >> Hi Mike,
>> > >> >>> > > >> >>
>> > >> >>> > > >> >> I’ve opened
>> https://github.com/apache/lucene/pull/11760 as a small bug fix PR for a
>> problem with interval queries. Am I OK to port this to the 9.4 branch?
>> > >> >>> > > >> >>
>> > >> >>> > > >> >> Thanks, Alan
>> > >> >>> > > >> >>
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Using the ann-benchmarks framework, I still saw a similar regression as
Mayya between 9.3 and 9.4. I investigated and found it was due to
"KnnGraphTester to use KnnVectorQuery" (
https://github.com/apache/lucene/pull/796), specifically the change to the
warm-up strategy. If I revert it, the results look exactly as expected.

I guess we can keep an eye on the nightly benchmarks tomorrow to
double-check there's no drop. It would also be nice to formalize the
ann-benchmarks set-up and run it regularly (like we've discussed in
https://github.com/apache/lucene/issues/10665).

Julie

On Mon, Sep 19, 2022 at 10:33 AM Michael Sokolov <msokolov@gmail.com> wrote:

> Thanks for your speedy testing! I am observing comparable latencies *when
> the index geometry (ie number of segments)* is unchanged. Agree we can
> leave this for a later day. I'll proceed to cut 9.4 artifacts
>
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
I'm confused, since warming should not be counted in the timings. Are you
saying that the recall was affected??

On Mon, Sep 19, 2022 at 6:12 PM Julie Tibshirani <julietibs@gmail.com>
wrote:

> Using the ann-benchmarks framework, I still saw a similar regression as
> Mayya between 9.3 and 9.4. I investigated and found it was due to
> "KnnGraphTester to use KnnVectorQuery" (
> https://github.com/apache/lucene/pull/796), specifically the change to
> the warm-up strategy. If I revert it, the results look exactly as expected.
>
> I guess we can keep an eye on the nightly benchmarks tomorrow to
> double-check there's no drop. It would also be nice to formalize the
> ann-benchmarks set-up and run it regularly (like we've discussed in
> https://github.com/apache/lucene/issues/10665).
>
> Julie
>
> On Mon, Sep 19, 2022 at 10:33 AM Michael Sokolov <msokolov@gmail.com>
> wrote:
>
>> Thanks for your speedy testing! I am observing comparable latencies *when
>> the index geometry (ie number of segments)* is unchanged. Agree we can
>> leave this for a later day. I'll proceed to cut 9.4 artifacts
>>
>> On Mon, Sep 19, 2022 at 11:02 AM Mayya Sharipova
>> <mayya.sharipova@elastic.co.invalid> wrote:
>>
>>> It would be great if you all are able to test again with
>>>> https://github.com/apache/lucene/pull/11781/ applied
>>>
>>>
>>>
>>> I ran the ann benchmarks with this change, and was happy to confirm
>>> that in my test recall with this PR is the same as on the 9.3 branch.
>>> QPS is lower, but we can investigate that later.
>>>
>>> glove-100-angular M:16 efConstruction:100
>>>               9.3 recall   9.3 QPS    this PR recall   this PR QPS
>>> n_cands=10    0.620        2745.933   0.620            1675.500
>>> n_cands=20    0.680        2288.665   0.680            1512.744
>>> n_cands=40    0.746        1770.243   0.746            1040.240
>>> n_cands=80    0.809        1226.738   0.809            695.236
>>> n_cands=120   0.843        948.908    0.843            525.914
>>> n_cands=200   0.878        671.781    0.878            351.529
>>> n_cands=400   0.918        392.265    0.918            207.854
>>> n_cands=600   0.937        282.403    0.937            144.311
>>> n_cands=800   0.949        214.620    0.949            116.875
>>>
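[Editor's note: for reference, recall figures like those in the table above are conventionally computed by comparing approximate results against exact nearest neighbors. A generic sketch of recall@k, not the ann-benchmarks code; all names are illustrative:]

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Average fraction of the true k nearest neighbors recovered per query."""
    hits = sum(len(set(a[:k]) & set(e[:k]))
               for a, e in zip(approx_ids, exact_ids))
    return hits / (k * len(exact_ids))

# One query where the ANN search finds 2 of the true top-3 neighbors.
r = recall_at_k([[1, 2, 5]], [[1, 2, 3]], k=3)
```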
>>> On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov <msokolov@gmail.com>
>>> wrote:
>>>
>>>> OK, I think I was wrong about latency having increased due to a change
>>>> in KnnGraphTester -- I did some testing there and couldn't reproduce.
>>>> There does seem to be a slight vector search latency increase,
>>>> possibly noise, but maybe due to the branching introduced to check
>>>> whether to do byte vs float operations? It would be a little
>>>> surprising if that were the case given the small number of branchings
>>>> compared to the number of multiplies in dot-product though.
>>>>
>>>> On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msokolov@gmail.com>
>>>> wrote:
>>>> >
>>>> > Thanks for the deep-dive Julie. I was able to reproduce the changing
>>>> > recall. I had introduced some bugs in the diversity checks (that may
>>>> > have partially canceled each other out? it's hard to understand what
>>>> > was happening in the buggy case) and posted a fix today
>>>> > https://github.com/apache/lucene/pull/11781.
>>>> >
>>>> > There are a couple of other outstanding issues I found while doing a
>>>> > bunch of git bisecting:
>>>> >
>>>> > I think we might have introduced a (test-only) performance regression
>>>> > in KnnGraphTester
>>>> >
>>>> > We may still be over-allocating the size of NeighborArray, leading to
>>>> > excessive segmentation? I wonder if we could avoid dynamic
>>>> > re-allocation there, and simply initialize every neighbor array to
>>>> > 2*M+1.
>>>> >
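[Editor's note: the fixed pre-allocation idea above can be sketched as follows. This is a toy model, not Lucene's actual NeighborArray:]

```python
class FixedNeighborArray:
    """Toy neighbor list pre-sized to 2*M+1, avoiding dynamic re-allocation."""

    def __init__(self, m):
        self.capacity = 2 * m + 1  # fixed upper bound on graph fan-out
        self.nodes = []

    def add(self, node):
        # Diversity pruning is expected to keep the list under capacity.
        if len(self.nodes) >= self.capacity:
            raise ValueError("neighbor array full; prune before adding")
        self.nodes.append(node)

arr = FixedNeighborArray(16)  # M=16 as in the benchmark runs above
```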
>>>> > While I don't think these are necessarily blockers, given that we are
>>>> > releasing HNSW improvements, it seems like we should address these,
>>>> > especially as the build-graph-on-index is one of the things we are
>>>> > releasing, and it is (may be?) impacted. I will see if I can put up a
>>>> > patch or two.
>>>> >
>>>> > It would be great if you all are able to test again with
>>>> > https://github.com/apache/lucene/pull/11781/ applied
>>>> >
>>>> > -Mike
>>>> >
>>>> > On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpountz@gmail.com>
>>>> wrote:
>>>> > >
>>>> > > Thank you Mike, I just backported the change.
>>>> > >
>>>> > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <msokolov@gmail.com>
>>>> wrote:
>>>> > >>
>>>> > >> It looks like a small bug fix that we have had on main (and 9.x?) for a
>>>> > >> while now with no test failures showing up, I guess. Should be OK to
>>>> > >> port. I plan to cut artifacts this weekend, or Monday at the
>>>> latest,
>>>> > >> but if you can do the backport today or tomorrow, that's fine by
>>>> me.
>>>> > >>
>>>> > >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <jpountz@gmail.com>
>>>> wrote:
>>>> > >> >
>>>> > >> > Mike, I'm tempted to backport
>>>> https://github.com/apache/lucene/pull/1068 to branch_9_4, which is a
>>>> bugfix that looks pretty safe to me. What do you think?
>>>> > >> >
>>>> > >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <
>>>> mayya.sharipova@elastic.co.invalid> wrote:
>>>> > >> >>
>>>> > >> >> Thanks for running more tests, Michael.
>>>> > >> >> It is encouraging that you saw a similar performance between
>>>> 9.3 and 9.4. I will also run more tests with different parameters.
>>>> > >> >>
>>>> > >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <
>>>> msokolov@gmail.com> wrote:
>>>> > >> >>>
>>>> > >> >>> As a follow-up, I ran a test using the same parameters as
>>>> above, only
>>>> > >> >>> changing M=200 to M=16. This did result in a single segment in
>>>> both
>>>> > >> >>> cases (9.3, 9.4) and the performance was pretty similar;
>>>> within noise
>>>> > >> >>> I think. The main difference I saw was that the 9.3 index was
>>>> written
>>>> > >> >>> using CFS:
>>>> > >> >>>
>>>> > >> >>> 9.4:
>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth visited index ms
>>>> > >> >>> 0.755 1.36 1000000 100 16 100 200 891402 1.00
>>>> > >> >>> post-filter
>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vec
>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vem
>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vex
>>>> > >> >>>
>>>> > >> >>> 9.3:
>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth visited index ms
>>>> > >> >>> 0.775 1.34 1000000 100 16 100 4033 977043
>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>>>> > >> >>>
>>>> > >> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <
>>>> msokolov@gmail.com> wrote:
>>>> > >> >>> >
>>>> > >> >>> > I ran another test. I thought I had increased the RAM buffer
>>>> size to
>>>> > >> >>> > 8G and heap to 16G. However I still see two segments in the
>>>> index that
>>>> > >> >>> > was created. And looking at the infostream I see:
>>>> > >> >>> >
>>>> > >> >>> > dir=MMapDirectory@
>>>> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>>>> > >> >>> > lockFactory=org\
>>>> > >> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20
>>>> > >> >>> > index=
>>>> > >> >>> > version=9.4.0
>>>> > >> >>> > analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>>>> > >> >>> > ramBufferSizeMB=8000.0
>>>> > >> >>> > maxBufferedDocs=-1
>>>> > >> >>> > ...
>>>> > >> >>> > perThreadHardLimitMB=1945
>>>> > >> >>> > ...
>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush
>>>> postings as
>>>> > >> >>> > segment _6 numDocs=555373
>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to write
>>>> norms
>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to write
>>>> docValues
>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to write
>>>> points
>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to
>>>> write vectors
>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to
>>>> finish stored fields
>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to write
>>>> postings
>>>> > >> >>> > and finish vectors
>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to write
>>>> fieldInfos
>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment
>>>> has 0 deleted docs
>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment
>>>> has 0
>>>> > >> >>> > soft-deleted docs
>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment
>>>> has no
>>>> > >> >>> > vectors; no norms; no docValues; no prox; freqs
>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
>>>> > >> >>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm,
>>>> _6.fdt, _6_\
>>>> > >> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
>>>> > >> >>> > _6_Lucene94HnswVectorsFormat_0.vex]
>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed
>>>> codec=Lucene94
>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed:
>>>> segment=_6
>>>> > >> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
>>>> > >> >>> > docs/MB=521.134
>>>> > >> >>> >
>>>> > >> >>> > so I think it's this perThreadHardLimit that is triggering
>>>> the
>>>> > >> >>> > flushes? TBH this isn't something I had seen before; but the
>>>> docs say:
>>>> > >> >>> >
>>>> > >> >>> > /**
>>>> > >> >>> > * Expert: Sets the maximum memory consumption per thread
>>>> triggering
>>>> > >> >>> > a forced flush if exceeded. A
>>>> > >> >>> > * {@link DocumentsWriterPerThread} is forcefully flushed
>>>> once it
>>>> > >> >>> > exceeds this limit even if the
>>>> > >> >>> > * {@link #getRAMBufferSizeMB()} has not been exceeded.
>>>> This is a
>>>> > >> >>> > safety limit to prevent a {@link
>>>> > >> >>> > * DocumentsWriterPerThread} from address space exhaustion
>>>> due to
>>>> > >> >>> > its internal 32 bit signed
>>>> > >> >>> > * integer based memory addressing. The given value must be less
>>>> > >> >>> > than 2GB (2048MB)
>>>> > >> >>> > *
>>>> > >> >>> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>>>> > >> >>> > */
>>>> > >> >>> >
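[Editor's note: in other words, a DWPT flush can be triggered by the per-thread hard limit even when the global RAM buffer still has room. A toy model of that decision, not Lucene's actual code; the 1945 MB and 8000 MB values are the ones reported in the infostream above:]

```python
PER_THREAD_HARD_LIMIT_MB = 1945   # perThreadHardLimitMB from the infostream
RAM_BUFFER_SIZE_MB = 8000.0       # ramBufferSizeMB from the infostream

def should_flush(thread_ram_mb, total_ram_mb):
    # A single writer thread is force-flushed once it hits the hard limit,
    # even if the overall ramBufferSizeMB budget is not exceeded.
    return (thread_ram_mb >= PER_THREAD_HARD_LIMIT_MB
            or total_ram_mb >= RAM_BUFFER_SIZE_MB)
```

This explains the two-segment index above: each DWPT flushed at ~1945 MB long before the 8 GB buffer filled up.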
>>>> > >> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <
>>>> msokolov@gmail.com> wrote:
>>>> > >> >>> > >
>>>> > >> >>> > > Hi Mayya, thanks for persisting - I think we need to
>>>> wrestle this to
>>>> > >> >>> > > the ground for sure. In the test I ran, RAM buffer was the
>>>> default
>>>> > >> >>> > > checked in, which is weirdly: 1994MB. I did not
>>>> specifically set heap
>>>> > >> >>> > > size. I used maxConn/M=200. I'll try with larger buffer
>>>> to see if I
>>>> > >> >>> > > can get 9.4 to produce a single segment for the same test
>>>> settings. I
>>>> > >> >>> > > see you used a much smaller M (16), which should have
>>>> produced quite
>>>> > >> >>> > > small graphs, and I agree, should have been a single
>>>> segment. Were you
>>>> > >> >>> > > able to verify the number of segments?
>>>> > >> >>> > >
>>>> > >> >>> > > Agree that decrease in recall is not expected when more
>>>> segments are produced.
>>>> > >> >>> > >
>>>> > >> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>>>> > >> >>> > > <mayya.sharipova@elastic.co.invalid> wrote:
>>>> > >> >>> > > >
>>>> > >> >>> > > > Hello Michael,
>>>> > >> >>> > > > Thanks for checking.
>>>> > >> >>> > > > Sorry for bringing this up again.
>>>> > >> >>> > > > First of all, I am ok with proceeding with the Lucene
>>>> 9.4 release and leaving the performance investigations for later.
>>>> > >> >>> > > >
>>>> > >> >>> > > > I am interested in what's the maxConn/M value you used
>>>> for your tests? What was the heap memory and the size of the RAM buffer for
>>>> indexing?
>>>> > >> >>> > > > Usually, when we have multiple segments, recall should
>>>> increase, not decrease. But I agree that with multiple segments we can see
>>>> a big drop in QPS.
>>>> > >> >>> > > >
>>>> > >> >>> > > > Here is my investigation with detailed output of the
>>>> performance difference between 9.3 and 9.4 releases. In my tests I used a
>>>> large indexing buffer (2Gb) and large heap (5Gb) to end up with a single
>>>> segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>>>> > >> >>> > > >
>>>> > >> >>> > > > Thank you.
>>>> > >> >>> > > >
>>>> > >> >>> > > >
>>>> > >> >>> > > >
>>>> > >> >>> > > >
>>>> > >> >>> > > >
>>>> > >> >>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <
>>>> romseygeek@gmail.com> wrote:
>>>> > >> >>> > > >>
>>>> > >> >>> > > >> Done. Thanks!
>>>> > >> >>> > > >>
>>>> > >> >>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <
>>>> msokolov@gmail.com> wrote:
>>>> > >> >>> > > >> >
>>>> > >> >>> > > >> > Hi Alan - I checked out the interval queries patch;
>>>> seems pretty safe,
>>>> > >> >>> > > >> > please go ahead and port to 9.4. Thanks!
>>>> > >> >>> > > >> >
>>>> > >> >>> > > >> > Mike
>>>> > >> >>> > > >> >
>>>> > >> >>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <
>>>> romseygeek@gmail.com> wrote:
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> Hi Mike,
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> I’ve opened
>>>> https://github.com/apache/lucene/pull/11760 as a small bug fix PR for
>>>> a problem with interval queries. Am I OK to port this to the 9.4 branch?
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> Thanks, Alan
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <
>>>> msokolov@gmail.com> wrote:
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> NOTICE:
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> Branch branch_9_4 has been cut and versions updated
>>>> to 9.5 on stable branch.
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> Please observe the normal rules:
>>>> > >> >>> > > >> >>
>>>> > >> >>> > > >> >> * No new features may be committed to the branch.
>>>> > >> >>> > > >> >> * Documentation patches, build patches and serious
>>>> bug fixes may be
>>>> > >> >>> > > >> >> committed to the branch. However, you should submit
>>>> all patches you
>>>> > >> >>> > > >> >> want to commit to Jira first to give others the
>>>> chance to review
>>>> > >> >>> > > >> >> and possibly vote against the patch. Keep in mind
>>>> that it is our
>>>> > >> >>> > > >> >> main intention to keep the branch as stable as
>>>> possible.
>>>> > >> >>> > > >> >> * All patches that are intended for the branch
>>>> should first be committed
>>>> > >> >>> > > >> >> to the unstable branch, merged into the stable
>>>> branch, and then into
>>>> > >> >>> > > >> >> the current release branch.
>>>> > >> >>> > > >> >> * Normal unstable and stable branch development may
>>>> continue as usual.
>>>> > >> >>> > > >> >> However, if you plan to commit a big change to the
>>>> unstable branch
>>>> > >> >>> > > >> >> while the branch feature freeze is in effect, think
>>>> twice: can't the
>>>> > >> >>> > > >> >> addition wait a couple more days? Merges of bug
>>>> fixes into the branch
>>>> > >> >>> > > >> >> may become more difficult.
>>>> > >> >>> > > >> >> * Only Jira issues with Fix version 9.4 and priority
>>>> "Blocker" will delay
>>>> > >> >>> > > >> >> a release candidate build.
>>>> > >> >
>>>> > >> >
>>>> > >> > --
>>>> > >> > Adrien
>>>> > >>
>>>> > >
>>>> > >
>>>> > > --
>>>> > > Adrien
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>
>>>>
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Sorry for the confusion. To explain, I use a local ann-benchmarks set-up
that makes use of KnnGraphTester. It is a bit hacky and I accidentally
included the warm-ups in the final timings. So the change to warm-up
explains why we saw different results in our tests. This is great
motivation to solidify and publish my local ann-benchmarks set-up so that
it's not so fragile!
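
[Editor's note: the fix is simply to keep warm-up iterations out of the measured window. A minimal sketch of that pattern, illustrative rather than the actual harness:]

```python
import time

def measure_qps(query_fn, warmups=100, iters=1000):
    # Warm-up queries populate caches and trigger JIT compilation,
    # but their timings are discarded; only the second loop is measured.
    for _ in range(warmups):
        query_fn()
    start = time.perf_counter()
    for _ in range(iters):
        query_fn()
    return iters / (time.perf_counter() - start)
```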

In summary, with your latest fix the recall and QPS look good to me -- I
don't detect any regression between 9.3 and 9.4.

Julie

On Mon, Sep 19, 2022 at 3:45 PM Michael Sokolov <msokolov@gmail.com> wrote:

> I'm confused, since warming should not be counted in the timings. Are you
> saying that the recall was affected??
>
> On Mon, Sep 19, 2022 at 6:12 PM Julie Tibshirani <julietibs@gmail.com>
> wrote:
>
>> Using the ann-benchmarks framework, I still saw a similar regression as
>> Mayya between 9.3 and 9.4. I investigated and found it was due to
>> "KnnGraphTester to use KnnVectorQuery" (
>> https://github.com/apache/lucene/pull/796), specifically the change to
>> the warm-up strategy. If I revert it, the results look exactly as expected.
>>
>> I guess we can keep an eye on the nightly benchmarks tomorrow to
>> double-check there's no drop. It would also be nice to formalize the
>> ann-benchmarks set-up and run it regularly (like we've discussed in
>> https://github.com/apache/lucene/issues/10665).
>>
>> Julie
>>
>> On Mon, Sep 19, 2022 at 10:33 AM Michael Sokolov <msokolov@gmail.com>
>> wrote:
>>
>>> Thanks for your speedy testing! I am observing comparable latencies
>>> *when the index geometry (ie number of segments)* is unchanged. Agree we
>>> can leave this for a later day. I'll proceed to cut 9.4 artifacts
>>>
>>> On Mon, Sep 19, 2022 at 11:02 AM Mayya Sharipova
>>> <mayya.sharipova@elastic.co.invalid> wrote:
>>>
>>>> It would be great if you all are able to test again with
>>>>> https://github.com/apache/lucene/pull/11781/ applied
>>>>
>>>>
>>>>
>>>> I ran the ann benchmarks with this change, and was happy to confirm
>>>> that in my test recall with this PR is the same as in 9.3 branch, although
>>>> QPS is lower, but we can investigate QPSs later.
>>>>
>>>> glove-100-angular M:16 efConstruction:100
>>>> 9.3 recall9.3 QPSthis PR recallthis PR QPS
>>>> n_cands=10 0.620 2745.933 0.620 1675.500
>>>> n_cands=20 0.680 2288.665 0.680 1512.744
>>>> n_cands=40 0.746 1770.243 0.746 1040.240
>>>> n_cands=80 0.809 1226.738 0.809 695.236
>>>> n_cands=120 0.843 948.908 0.843 525.914
>>>> n_cands=200 0.878 671.781 0.878 351.529
>>>> n_cands=400 0.918 392.265 0.918 207.854
>>>> n_cands=600 0.937 282.403 0.937 144.311
>>>> n_cands=800 0.949 214.620 0.949 116.875
>>>>
>>>> On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov <msokolov@gmail.com>
>>>> wrote:
>>>>
>>>>> OK, I think I was wrong about latency having increased due to a change
>>>>> in KnnGraphTester -- I did some testing there and couldn't reproduce.
>>>>> There does seem to be a slight vector search latency increase,
>>>>> possibly noise, but maybe due to the branching introduced to check
>>>>> whether to do byte vs float operations? It would be a little
>>>>> surprising if that were the case given the small number of branchings
>>>>> compared to the number of multiplies in dot-product though.
>>>>>
>>>>> On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msokolov@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > Thanks for the deep-dive Julie. I was able to reproduce the changing
>>>>> > recall. I had introduced some bugs in the diversity checks (that may
>>>>> > have partially canceled each other out? it's hard to understand what
>>>>> > was happening in the buggy case) and posted a fix today
>>>>> > https://github.com/apache/lucene/pull/11781.
>>>>> >
>>>>> > There are a couple of other outstanding issues I found while doing a
>>>>> > bunch of git bisecting;
>>>>> >
>>>>> > I think we might have introduced a (test-only) performance regression
>>>>> > in KnnGraphTester
>>>>> >
>>>>> > We may still be over-allocating the size of NeighborArray, leading to
>>>>> > excessive segmentation? I wonder if we could avoid dynamic
>>>>> > re-allocation there, and simply initialize every neighbor array to
>>>>> > 2*M+1.
>>>>> >
>>>>> > While I don't think these are necessarily blockers, given that we are
>>>>> > releasing HNSW improvements, it seems like we should address these,
>>>>> > especially as the build-graph-on-index is one of the things we are
>>>>> > releasing, and it is (may be?) impacted. I will see if I can put up a
>>>>> > patch or two.
>>>>> >
>>>>> > It would be great if you all are able to test again with
>>>>> > https://github.com/apache/lucene/pull/11781/ applied
>>>>> >
>>>>> > -Mike
>>>>> >
>>>>> > On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpountz@gmail.com>
>>>>> wrote:
>>>>> > >
>>>>> > > Thank you Mike, I just backported the change.
>>>>> > >
>>>>> > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <
>>>>> msokolov@gmail.com> wrote:
>>>>> > >>
>>>>> > >> it looks like a small bug fix, we have had on main (and 9.x?) for
>>>>> a
>>>>> > >> while now and no test failures showed up, I guess. Should be OK to
>>>>> > >> port. I plan to cut artifacts this weekend, or Monday at the
>>>>> latest,
>>>>> > >> but if you can do the backport today or tomorrow, that's fine by
>>>>> me.
>>>>> > >>
>>>>> > >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <jpountz@gmail.com>
>>>>> wrote:
>>>>> > >> >
>>>>> > >> > Mike, I'm tempted to backport
>>>>> https://github.com/apache/lucene/pull/1068 to branch_9_4, which is a
>>>>> bugfix that looks pretty safe to me. What do you think?
>>>>> > >> >
>>>>> > >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <
>>>>> mayya.sharipova@elastic.co.invalid> wrote:
>>>>> > >> >>
>>>>> > >> >> Thanks for running more tests, Michael.
>>>>> > >> >> It is encouraging that you saw a similar performance between
>>>>> 9.3 and 9.4. I will also run more tests with different parameters.
>>>>> > >> >>
>>>>> > >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <
>>>>> msokolov@gmail.com> wrote:
>>>>> > >> >>>
>>>>> > >> >>> As a follow-up, I ran a test using the same parameters as
>>>>> above, only
>>>>> > >> >>> changing M=200 to M=16. This did result in a single segment
>>>>> in both
>>>>> > >> >>> cases (9.3, 9.4) and the performance was pretty similar;
>>>>> within noise
>>>>> > >> >>> I think. The main difference I saw was that the 9.3 index was
>>>>> written
>>>>> > >> >>> using CFS:
>>>>> > >> >>>
>>>>> > >> >>> 9.4:
>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth
>>>>> visited index ms
>>>>> > >> >>> 0.755 1.36 1000000 100 16 100 200
>>>>> 891402 1.00
>>>>> > >> >>> post-filter
>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vec
>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vem
>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vex
>>>>> > >> >>>
>>>>> > >> >>> 9.3:
>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth
>>>>> visited index ms
>>>>> > >> >>> 0.775 1.34 1000000 100 16 100 4033 977043
>>>>> > >> >>> rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>>>>> > >> >>>
>>>>> > >> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <
>>>>> msokolov@gmail.com> wrote:
>>>>> > >> >>> >
>>>>> > >> >>> > I ran another test. I thought I had increased the RAM
>>>>> buffer size to
>>>>> > >> >>> > 8G and heap to 16G. However I still see two segments in the
>>>>> index that
>>>>> > >> >>> > was created. And looking at the infostream I see:
>>>>> > >> >>> >
>>>>> > >> >>> > dir=MMapDirectory@
>>>>> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>>>>> > >> >>> > lockFactory=org\
>>>>> > >> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20
>>>>> > >> >>> > index=
>>>>> > >> >>> > version=9.4.0
>>>>> > >> >>> >
>>>>> analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>>>>> > >> >>> > ramBufferSizeMB=8000.0
>>>>> > >> >>> > maxBufferedDocs=-1
>>>>> > >> >>> > ...
>>>>> > >> >>> > perThreadHardLimitMB=1945
>>>>> > >> >>> > ...
>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush
>>>>> postings as
>>>>> > >> >>> > segment _6 numDocs=555373
>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to
>>>>> write norms
>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to
>>>>> write docValues
>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to
>>>>> write points
>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to
>>>>> write vectors
>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to
>>>>> finish stored fields
>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to
>>>>> write postings
>>>>> > >> >>> > and finish vectors
>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to
>>>>> write fieldInfos
>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment
>>>>> has 0 deleted docs
>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment
>>>>> has 0
>>>>> > >> >>> > soft-deleted docs
>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment
>>>>> has no
>>>>> > >> >>> > vectors; no norms; no docValues; no prox; freqs
>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
>>>>> > >> >>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm,
>>>>> _6.fdt, _6_\
>>>>> > >> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
>>>>> > >> >>> > _6_Lucene94HnswVectorsFormat_0.vex]
>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed
>>>>> codec=Lucene94
>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed:
>>>>> segment=_6
>>>>> > >> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
>>>>> > >> >>> > docs/MB=521.134
>>>>> > >> >>> >
>>>>> > >> >>> > so I think it's this perThreadHardLimit that is triggering
>>>>> the
>>>>> > >> >>> > flushes? TBH this isn't something I had seen before; but
>>>>> the docs say:
>>>>> > >> >>> >
>>>>> > >> >>> > /**
>>>>> > >> >>> > * Expert: Sets the maximum memory consumption per thread
>>>>> triggering
>>>>> > >> >>> > a forced flush if exceeded. A
>>>>> > >> >>> > * {@link DocumentsWriterPerThread} is forcefully flushed
>>>>> once it
>>>>> > >> >>> > exceeds this limit even if the
>>>>> > >> >>> > * {@link #getRAMBufferSizeMB()} has not been exceeded.
>>>>> This is a
>>>>> > >> >>> > safety limit to prevent a {@link
>>>>> > >> >>> > * DocumentsWriterPerThread} from address space
>>>>> exhaustion due to
>>>>> > >> >>> > its internal 32 bit signed
>>>>> > >> >>> > * integer based memory addressing. The given value must
>>>>> be less
>>>>> > >> >>> > that 2GB (2048MB)
>>>>> > >> >>> > *
>>>>> > >> >>> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>>>>> > >> >>> > */
>>>>> > >> >>> >
>>>>> > >> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <
>>>>> msokolov@gmail.com> wrote:
>>>>> > >> >>> > >
>>>>> > >> >>> > > Hi Mayya, thanks for persisting - I think we need to
>>>>> wrestle this to
>>>>> > >> >>> > > the ground for sure. In the test I ran, RAM buffer was
>>>>> the default
>>>>> > >> >>> > > checked in, which is weirdly: 1994MB. I did not
>>>>> specifically set heap
>>>>> > >> >>> > > size. I used maxConn/M=200. I'll try with larger buffer
>>>>> to see if I
>>>>> > >> >>> > > can get 9.4 to produce a single segment for the same test
>>>>> settings. I
>>>>> > >> >>> > > see you used a much smaller M (16), which should have
>>>>> produced quite
>>>>> > >> >>> > > small graphs, and I agree, should have been a single
>>>>> segment. Were you
>>>>> > >> >>> > > able to verify the number of segments?
>>>>> > >> >>> > >
>>>>> > >> >>> > > Agree that decrease in recall is not expected when more
>>>>> segments are produced.
>>>>> > >> >>> > >
>>>>> > >> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>>>>> > >> >>> > > <mayya.sharipova@elastic.co.invalid> wrote:
>>>>> > >> >>> > > >
>>>>> > >> >>> > > > Hello Michael,
>>>>> > >> >>> > > > Thanks for checking.
>>>>> > >> >>> > > > Sorry for bringing this up again.
>>>>> > >> >>> > > > First of all, I am ok with proceeding with the Lucene
>>>>> 9.4 release and leaving the performance investigations for later.
>>>>> > >> >>> > > >
>>>>> > >> >>> > > > I am interested in what's the maxConn/M value you used
>>>>> for your tests? What was the heap memory and the size of the RAM buffer for
>>>>> indexing?
>>>>> > >> >>> > > > Usually, when we have multiple segments, recall should
>>>>> increase, not decrease. But I agree that with multiple segments we can see
>>>>> a big drop in QPS.
>>>>> > >> >>> > > >
>>>>> > >> >>> > > > Here is my investigation with detailed output of the
>>>>> performance difference between 9.3 and 9.4 releases. In my tests I used a
>>>>> large indexing buffer (2Gb) and large heap (5Gb) to end up with a single
>>>>> segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>>>>> > >> >>> > > >
>>>>> > >> >>> > > > Thank you.
>>>>> > >> >>> > > >
>>>>> > >> >>> > > >
>>>>> > >> >>> > > >
>>>>> > >> >>> > > >
>>>>> > >> >>> > > >
>>>>> > >> >>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <
>>>>> romseygeek@gmail.com> wrote:
>>>>> > >> >>> > > >>
>>>>> > >> >>> > > >> Done. Thanks!
>>>>> > >> >>> > > >>
>>>>> > >> >>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <
>>>>> msokolov@gmail.com> wrote:
>>>>> > >> >>> > > >> >
>>>>> > >> >>> > > >> > Hi Alan - I checked out the interval queries patch;
>>>>> seems pretty safe,
>>>>> > >> >>> > > >> > please go ahead and port to 9.4. Thanks!
>>>>> > >> >>> > > >> >
>>>>> > >> >>> > > >> > Mike
>>>>> > >> >>> > > >> >
>>>>> > >> >>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <
>>>>> romseygeek@gmail.com> wrote:
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> Hi Mike,
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> I’ve opened
>>>>> https://github.com/apache/lucene/pull/11760 as a small bug fix PR for
>>>>> a problem with interval queries. Am I OK to port this to the 9.4 branch?
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> Thanks, Alan
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <
>>>>> msokolov@gmail.com> wrote:
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> NOTICE:
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> Branch branch_9_4 has been cut and versions updated
>>>>> to 9.5 on stable branch.
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> Please observe the normal rules:
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >> * No new features may be committed to the branch.
>>>>> > >> >>> > > >> >> * Documentation patches, build patches and serious
>>>>> bug fixes may be
>>>>> > >> >>> > > >> >> committed to the branch. However, you should submit
>>>>> all patches you
>>>>> > >> >>> > > >> >> want to commit to Jira first to give others the
>>>>> chance to review
>>>>> > >> >>> > > >> >> and possibly vote against the patch. Keep in mind
>>>>> that it is our
>>>>> > >> >>> > > >> >> main intention to keep the branch as stable as
>>>>> possible.
>>>>> > >> >>> > > >> >> * All patches that are intended for the branch
>>>>> should first be committed
>>>>> > >> >>> > > >> >> to the unstable branch, merged into the stable
>>>>> branch, and then into
>>>>> > >> >>> > > >> >> the current release branch.
>>>>> > >> >>> > > >> >> * Normal unstable and stable branch development may
>>>>> continue as usual.
>>>>> > >> >>> > > >> >> However, if you plan to commit a big change to the
>>>>> unstable branch
>>>>> > >> >>> > > >> >> while the branch feature freeze is in effect, think
>>>>> twice: can't the
>>>>> > >> >>> > > >> >> addition wait a couple more days? Merges of bug
>>>>> fixes into the branch
>>>>> > >> >>> > > >> >> may become more difficult.
>>>>> > >> >>> > > >> >> * Only Jira issues with Fix version 9.4 and
>>>>> priority "Blocker" will delay
>>>>> > >> >>> > > >> >> a release candidate build.
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >>
>>>>> ---------------------------------------------------------------------
>>>>> > >> >>> > > >> >> To unsubscribe, e-mail:
>>>>> dev-unsubscribe@lucene.apache.org
>>>>> > >> >>> > > >> >> For additional commands, e-mail:
>>>>> dev-help@lucene.apache.org
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >>
>>>>> > >> >>> > > >> >
>>>>> > >> >>> > > >> >
>>>>> > >> >>> > > >> >
>>>>> > >> >>> > > >>
>>>>> > >> >>> > > >>
>>>>> > >> >>> > > >>
>>>>> > >> >>> > > >>
>>>>> > >> >>>
>>>>> > >> >>>
>>>>> > >> >>>
>>>>> > >> >
>>>>> > >> >
>>>>> > >> > --
>>>>> > >> > Adrien
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >
>>>>> > >
>>>>> > > --
>>>>> > > Adrien
>>>>>
>>>>>
>>>>>
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Hi Mike,

If you have not started a RC yet, I'd like to include some small fixes for
bugs that were recently introduced in Lucene:
- https://github.com/apache/lucene/pull/11792
- https://github.com/apache/lucene/pull/11794

On Tue, Sep 20, 2022 at 1:26 AM Julie Tibshirani <julietibs@gmail.com>
wrote:

> Sorry for the confusion. To explain, I use a local ann-benchmarks set-up
> that makes use of KnnGraphTester. It is a bit hacky and I accidentally
> included the warm-ups in the final timings. So the change to warm-up
> explains why we saw different results in our tests. This is great
> motivation to solidify and publish my local ann-benchmarks set-up so that
> it's not so fragile!
>
> In summary, with your latest fix the recall and QPS look good to me -- I
> don't detect any regression between 9.3 and 9.4.
>
> Julie
>
> On Mon, Sep 19, 2022 at 3:45 PM Michael Sokolov <msokolov@gmail.com>
> wrote:
>
>> I'm confused, since warming should not be counted in the timings. Are you
>> saying that the recall was affected??
>>
>> On Mon, Sep 19, 2022 at 6:12 PM Julie Tibshirani <julietibs@gmail.com>
>> wrote:
>>
>>> Using the ann-benchmarks framework, I still saw a similar regression as
>>> Mayya between 9.3 and 9.4. I investigated and found it was due to
>>> "KnnGraphTester to use KnnVectorQuery" (
>>> https://github.com/apache/lucene/pull/796), specifically the change to
>>> the warm-up strategy. If I revert it, the results look exactly as expected.
>>>
>>> I guess we can keep an eye on the nightly benchmarks tomorrow to
>>> double-check there's no drop. It would also be nice to formalize the
>>> ann-benchmarks set-up and run it regularly (like we've discussed in
>>> https://github.com/apache/lucene/issues/10665).
>>>
>>> Julie
>>>
>>> On Mon, Sep 19, 2022 at 10:33 AM Michael Sokolov <msokolov@gmail.com>
>>> wrote:
>>>
>>>> Thanks for your speedy testing! I am observing comparable latencies
>>>> *when the index geometry (ie number of segments)* is unchanged. Agree we
>>>> can leave this for a later day. I'll proceed to cut 9.4 artifacts
>>>>
>>>> On Mon, Sep 19, 2022 at 11:02 AM Mayya Sharipova
>>>> <mayya.sharipova@elastic.co.invalid> wrote:
>>>>
>>>>> It would be great if you all are able to test again with
>>>>>> https://github.com/apache/lucene/pull/11781/ applied
>>>>>
>>>>>
>>>>>
>>>>> I ran the ann benchmarks with this change, and was happy to confirm
>>>>> that in my test the recall with this PR is the same as on the 9.3 branch,
>>>>> although QPS is lower; we can investigate QPS later.
>>>>>
>>>>> glove-100-angular M:16 efConstruction:100
>>>>>              9.3 recall    9.3 QPS   this PR recall   this PR QPS
>>>>> n_cands=10        0.620   2745.933            0.620      1675.500
>>>>> n_cands=20        0.680   2288.665            0.680      1512.744
>>>>> n_cands=40        0.746   1770.243            0.746      1040.240
>>>>> n_cands=80        0.809   1226.738            0.809       695.236
>>>>> n_cands=120       0.843    948.908            0.843       525.914
>>>>> n_cands=200       0.878    671.781            0.878       351.529
>>>>> n_cands=400       0.918    392.265            0.918       207.854
>>>>> n_cands=600       0.937    282.403            0.937       144.311
>>>>> n_cands=800       0.949    214.620            0.949       116.875
>>>>>
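[Editor's note: for a sense of scale, the table above works out to roughly a 1.6x to 1.8x QPS drop at identical recall. A quick check against a few of the rows, with values copied from the table:]

```python
# n_cands -> (QPS on 9.3, QPS with the PR), copied from the table above.
results = {
    10:  (2745.933, 1675.500),
    80:  (1226.738, 695.236),
    800: (214.620, 116.875),
}
for n_cands, (qps_93, qps_pr) in results.items():
    slowdown = qps_93 / qps_pr
    print(f"n_cands={n_cands}: {slowdown:.2f}x lower QPS")
```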
>>>>> On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov <msokolov@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> OK, I think I was wrong about latency having increased due to a change
>>>>>> in KnnGraphTester -- I did some testing there and couldn't reproduce.
>>>>>> There does seem to be a slight vector search latency increase,
>>>>>> possibly noise, but maybe due to the branching introduced to check
>>>>>> whether to do byte vs float operations? It would be a little
>>>>>> surprising if that were the case given the small number of branchings
>>>>>> compared to the number of multiplies in dot-product though.
>>>>>>
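[Editor's note: the branching question can be made concrete with a sketch. This is illustrative Python, not Lucene's actual Java code, and the function names are hypothetical: dispatching once to a type-specific similarity function keeps the byte-vs-float check out of the per-vector hot loop, leaving only the O(dimension) multiply-adds.]

```python
def dot_float(a, b):
    # plain dot product over float components
    return sum(x * y for x, y in zip(a, b))

def dot_byte(a, b):
    # same arithmetic; stands in for a byte-component variant
    return sum(x * y for x, y in zip(a, b))

def make_dot(is_byte: bool):
    # branch once up front instead of inside every similarity call
    return dot_byte if is_byte else dot_float

dot = make_dot(is_byte=False)
print(dot([1, 2, 3], [4, 5, 6]))  # prints 32
```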
>>>>>> On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msokolov@gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > Thanks for the deep-dive Julie. I was able to reproduce the changing
>>>>>> > recall. I had introduced some bugs in the diversity checks (that may
>>>>>> > have partially canceled each other out? it's hard to understand what
>>>>>> > was happening in the buggy case) and posted a fix today
>>>>>> > https://github.com/apache/lucene/pull/11781.
>>>>>> >
>>>>>> > There are a couple of other outstanding issues I found while doing a
>>>>>> > bunch of git bisecting;
>>>>>> >
>>>>>> > I think we might have introduced a (test-only) performance
>>>>>> regression
>>>>>> > in KnnGraphTester
>>>>>> >
>>>>>> > We may still be over-allocating the size of NeighborArray, leading
>>>>>> to
>>>>>> > excessive segmentation? I wonder if we could avoid dynamic
>>>>>> > re-allocation there, and simply initialize every neighbor array to
>>>>>> > 2*M+1.
>>>>>> >
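[Editor's note: the fixed-size allocation idea can be sketched as follows. Python for brevity; Lucene's NeighborArray is Java, and the level-0 capacity of 2*M follows the usual HNSW convention, so treat the exact sizing as an assumption.]

```python
M = 16  # maxConn; 16 in the benchmark runs discussed in this thread

def neighbor_capacity(level: int, m: int = M) -> int:
    # HNSW allows up to 2*m neighbors on level 0 and m on upper levels;
    # one spare slot holds a candidate before diversity pruning, so a
    # fixed allocation of max_conn + 1 never needs to grow dynamically.
    max_conn = 2 * m if level == 0 else m
    return max_conn + 1

print(neighbor_capacity(0))  # prints 33  (2*M + 1)
print(neighbor_capacity(3))  # prints 17  (M + 1)
```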
>>>>>> > While I don't think these are necessarily blockers, given that we
>>>>>> are
>>>>>> > releasing HNSW improvements, it seems like we should address these,
>>>>>> > especially as the build-graph-on-index is one of the things we are
>>>>>> > releasing, and it is (may be?) impacted. I will see if I can put up
>>>>>> a
>>>>>> > patch or two.
>>>>>> >
>>>>>> > It would be great if you all are able to test again with
>>>>>> > https://github.com/apache/lucene/pull/11781/ applied
>>>>>> >
>>>>>> > -Mike
>>>>>> >
>>>>>> > On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpountz@gmail.com>
>>>>>> wrote:
>>>>>> > >
>>>>>> > > Thank you Mike, I just backported the change.
>>>>>> > >
>>>>>> > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <
>>>>>> msokolov@gmail.com> wrote:
>>>>>> > >>
>>>>>> > >> it looks like a small bug fix, we have had on main (and 9.x?)
>>>>>> for a
>>>>>> > >> while now and no test failures showed up, I guess. Should be OK
>>>>>> to
>>>>>> > >> port. I plan to cut artifacts this weekend, or Monday at the
>>>>>> latest,
>>>>>> > >> but if you can do the backport today or tomorrow, that's fine by
>>>>>> me.
>>>>>> > >>
>>>>>> > >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <jpountz@gmail.com>
>>>>>> wrote:
>>>>>> > >> >
>>>>>> > >> > Mike, I'm tempted to backport
>>>>>> https://github.com/apache/lucene/pull/1068 to branch_9_4, which is a
>>>>>> bugfix that looks pretty safe to me. What do you think?
>>>>>> > >> >
>>>>>> > >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <
>>>>>> mayya.sharipova@elastic.co.invalid> wrote:
>>>>>> > >> >>
>>>>>> > >> >> Thanks for running more tests, Michael.
>>>>>> > >> >> It is encouraging that you saw a similar performance between
>>>>>> 9.3 and 9.4. I will also run more tests with different parameters.
>>>>>> > >> >>
>>>>>> > >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <
>>>>>> msokolov@gmail.com> wrote:
>>>>>> > >> >>>
>>>>>> > >> >>> As a follow-up, I ran a test using the same parameters as
>>>>>> above, only
>>>>>> > >> >>> changing M=200 to M=16. This did result in a single segment
>>>>>> in both
>>>>>> > >> >>> cases (9.3, 9.4) and the performance was pretty similar;
>>>>>> within noise
>>>>>> > >> >>> I think. The main difference I saw was that the 9.3 index
>>>>>> was written
>>>>>> > >> >>> using CFS:
>>>>>> > >> >>>
>>>>>> > >> >>> 9.4:
>>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth
>>>>>> visited index ms
>>>>>> > >> >>> 0.755 1.36 1000000 100 16 100 200
>>>>>> 891402 1.00
>>>>>> > >> >>> post-filter
>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vec
>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vem
>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vex
>>>>>> > >> >>>
>>>>>> > >> >>> 9.3:
>>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth
>>>>>> visited index ms
>>>>>> > >> >>> 0.775 1.34 1000000 100 16 100 4033
>>>>>> 977043
>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>>>>>> > >> >>>
>>>>>> > >> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <
>>>>>> msokolov@gmail.com> wrote:
>>>>>> > >> >>> >
>>>>>> > >> >>> > I ran another test. I thought I had increased the RAM
>>>>>> buffer size to
>>>>>> > >> >>> > 8G and heap to 16G. However I still see two segments in
>>>>>> the index that
>>>>>> > >> >>> > was created. And looking at the infostream I see:
>>>>>> > >> >>> >
>>>>>> > >> >>> > dir=MMapDirectory@
>>>>>> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>>>>>> > >> >>> > lockFactory=org\
>>>>>> > >> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20
>>>>>> > >> >>> > index=
>>>>>> > >> >>> > version=9.4.0
>>>>>> > >> >>> >
>>>>>> analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>>>>>> > >> >>> > ramBufferSizeMB=8000.0
>>>>>> > >> >>> > maxBufferedDocs=-1
>>>>>> > >> >>> > ...
>>>>>> > >> >>> > perThreadHardLimitMB=1945
>>>>>> > >> >>> > ...
>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush
>>>>>> postings as
>>>>>> > >> >>> > segment _6 numDocs=555373
>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to
>>>>>> write norms
>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to
>>>>>> write docValues
>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to
>>>>>> write points
>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to
>>>>>> write vectors
>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to
>>>>>> finish stored fields
>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to
>>>>>> write postings
>>>>>> > >> >>> > and finish vectors
>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to
>>>>>> write fieldInfos
>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new segment
>>>>>> has 0 deleted docs
>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new segment
>>>>>> has 0
>>>>>> > >> >>> > soft-deleted docs
>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new segment
>>>>>> has no
>>>>>> > >> >>> > vectors; no norms; no docValues; no prox; freqs
>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
>>>>>> > >> >>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm,
>>>>>> _6.fdt, _6_\
>>>>>> > >> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
>>>>>> > >> >>> > _6_Lucene94HnswVectorsFormat_0.vex]
>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed
>>>>>> codec=Lucene94
>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed:
>>>>>> segment=_6
>>>>>> > >> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
>>>>>> > >> >>> > docs/MB=521.134
>>>>>> > >> >>> >
>>>>>> > >> >>> > so I think it's this perThreadHardLimit that is triggering
>>>>>> the
>>>>>> > >> >>> > flushes? TBH this isn't something I had seen before; but
>>>>>> the docs say:
>>>>>> > >> >>> >
>>>>>> > >> >>> > /**
>>>>>> > >> >>> > * Expert: Sets the maximum memory consumption per
>>>>>> thread triggering
>>>>>> > >> >>> > a forced flush if exceeded. A
>>>>>> > >> >>> > * {@link DocumentsWriterPerThread} is forcefully
>>>>>> flushed once it
>>>>>> > >> >>> > exceeds this limit even if the
>>>>>> > >> >>> > * {@link #getRAMBufferSizeMB()} has not been exceeded.
>>>>>> This is a
>>>>>> > >> >>> > safety limit to prevent a {@link
>>>>>> > >> >>> > * DocumentsWriterPerThread} from address space
>>>>>> exhaustion due to
>>>>>> > >> >>> > its internal 32 bit signed
>>>>>> > >> >>> > * integer based memory addressing. The given value must
>>>>>> be less
>>>>>> > >> >>> > that 2GB (2048MB)
>>>>>> > >> >>> > *
>>>>>> > >> >>> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>>>>>> > >> >>> > */
>>>>>> > >> >>> >
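[Editor's note: the numbers in the infostream excerpt above bear this reading out: the flush fired at the per-thread hard limit, not at the 8000 MB RAM buffer, and the logged docs/MB is document count divided by the flushed (on-disk) segment size. A quick sanity check, with every figure copied from the log:]

```python
# Figures copied from the infostream excerpt above.
ram_buffer_mb = 8000.0           # requested ramBufferSizeMB (never reached)
per_thread_hard_limit_mb = 1945  # perThreadHardLimitMB from the log
ram_used_mb = 1945.002           # DWPT buffer size at flush time
flushed_size_mb = 1065.701       # on-disk size of flushed segment _6
num_docs = 555_373               # numDocs flushed as segment _6

# The DWPT hit the hard limit long before the configured RAM buffer,
# which is why multiple segments appear despite ramBufferSizeMB=8000.
assert ram_used_mb >= per_thread_hard_limit_mb < ram_buffer_mb

# The log's "docs/MB=521.134" is docs per *flushed* megabyte:
docs_per_mb = num_docs / flushed_size_mb
print(round(docs_per_mb, 3))  # prints 521.134, matching the log line
```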
>>>>>> > >> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <
>>>>>> msokolov@gmail.com> wrote:
>>>>>> > >> >>> > >
>>>>>> > >> >>> > > Hi Mayya, thanks for persisting - I think we need to
>>>>>> wrestle this to
>>>>>> > >> >>> > > the ground for sure. In the test I ran, RAM buffer was
>>>>>> the default
>>>>>> > >> >>> > > checked in, which is weirdly: 1994MB. I did not
>>>>>> specifically set heap
>>>>>> > >> >>> > > size. I used maxConn/M=200. I'll try with larger buffer
>>>>>> to see if I
>>>>>> > >> >>> > > can get 9.4 to produce a single segment for the same
>>>>>> test settings. I
>>>>>> > >> >>> > > see you used a much smaller M (16), which should have
>>>>>> produced quite
>>>>>> > >> >>> > > small graphs, and I agree, should have been a single
>>>>>> segment. Were you
>>>>>> > >> >>> > > able to verify the number of segments?
>>>>>> > >> >>> > >
>>>>>> > >> >>> > > Agree that decrease in recall is not expected when more
>>>>>> segments are produced.
>>>>>> > >> >>> > >
>>>>>> > >> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>>>>>> > >> >>> > > <mayya.sharipova@elastic.co.invalid> wrote:
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > > Hello Michael,
>>>>>> > >> >>> > > > Thanks for checking.
>>>>>> > >> >>> > > > Sorry for bringing this up again.
>>>>>> > >> >>> > > > First of all, I am ok with proceeding with the Lucene
>>>>>> 9.4 release and leaving the performance investigations for later.
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > > I am interested in what's the maxConn/M value you used
>>>>>> for your tests? What was the heap memory and the size of the RAM buffer for
>>>>>> indexing?
>>>>>> > >> >>> > > > Usually, when we have multiple segments, recall should
>>>>>> increase, not decrease. But I agree that with multiple segments we can see
>>>>>> a big drop in QPS.
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > > Here is my investigation with detailed output of the
>>>>>> performance difference between 9.3 and 9.4 releases. In my tests I used a
>>>>>> large indexing buffer (2Gb) and large heap (5Gb) to end up with a single
>>>>>> segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > > Thank you.
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > >
>>>>>> > >> >>> > > > [...]

--
Adrien
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Well, I did start, optimistically, but I think I need to re-spin to include
a fix for this test failure that has been popping up, so I will pull these
in too.

On Tue, Sep 20, 2022 at 6:24 AM Adrien Grand <jpountz@gmail.com> wrote:

> Hi Mike,
>
> If you have not started a RC yet, I'd like to include some small fixes for
> bugs that were recently introduced in Lucene:
> - https://github.com/apache/lucene/pull/11792
> - https://github.com/apache/lucene/pull/11794
>
> On Tue, Sep 20, 2022 at 1:26 AM Julie Tibshirani <julietibs@gmail.com>
> wrote:
>
>> Sorry for the confusion. To explain, I use a local ann-benchmarks set-up
>> that makes use of KnnGraphTester. It is a bit hacky and I accidentally
>> included the warm-ups in the final timings. So the change to warm-up
>> explains why we saw different results in our tests. This is great
>> motivation to solidify and publish my local ann-benchmarks set-up so that
>> it's not so fragile!
>>
>> In summary, with your latest fix the recall and QPS look good to me -- I
>> don't detect any regression between 9.3 and 9.4.
>>
>> Julie
>>
>> On Mon, Sep 19, 2022 at 3:45 PM Michael Sokolov <msokolov@gmail.com>
>> wrote:
>>
>>> I'm confused, since warming should not be counted in the timings. Are
>>> you saying that the recall was affected??
>>>
>>> On Mon, Sep 19, 2022 at 6:12 PM Julie Tibshirani <julietibs@gmail.com>
>>> wrote:
>>>
>>>> Using the ann-benchmarks framework, I still saw a similar regression as
>>>> Mayya between 9.3 and 9.4. I investigated and found it was due to
>>>> "KnnGraphTester to use KnnVectorQuery" (
>>>> https://github.com/apache/lucene/pull/796), specifically the change to
>>>> the warm-up strategy. If I revert it, the results look exactly as expected.
>>>>
>>>> I guess we can keep an eye on the nightly benchmarks tomorrow to
>>>> double-check there's no drop. It would also be nice to formalize the
>>>> ann-benchmarks set-up and run it regularly (like we've discussed in
>>>> https://github.com/apache/lucene/issues/10665).
>>>>
>>>> Julie
>>>>
>>>> On Mon, Sep 19, 2022 at 10:33 AM Michael Sokolov <msokolov@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for your speedy testing! I am observing comparable latencies
>>>>> *when the index geometry (ie number of segments)* is unchanged. Agree we
>>>>> can leave this for a later day. I'll proceed to cut 9.4 artifacts
>>>>>
>>>>> On Mon, Sep 19, 2022 at 11:02 AM Mayya Sharipova
>>>>> <mayya.sharipova@elastic.co.invalid> wrote:
>>>>>
>>>>>> It would be great if you all are able to test again with
>>>>>>> https://github.com/apache/lucene/pull/11781/ applied
>>>>>>
>>>>>>
>>>>>>
>>>>>> I ran the ann benchmarks with this change, and was happy to confirm
>>>>>> that in my test the recall with this PR is the same as on the 9.3 branch,
>>>>>> although QPS is lower; we can investigate QPS later.
>>>>>>
>>>>>> glove-100-angular M:16 efConstruction:100
>>>>>>              9.3 recall    9.3 QPS   this PR recall   this PR QPS
>>>>>> n_cands=10        0.620   2745.933            0.620      1675.500
>>>>>> n_cands=20        0.680   2288.665            0.680      1512.744
>>>>>> n_cands=40        0.746   1770.243            0.746      1040.240
>>>>>> n_cands=80        0.809   1226.738            0.809       695.236
>>>>>> n_cands=120       0.843    948.908            0.843       525.914
>>>>>> n_cands=200       0.878    671.781            0.878       351.529
>>>>>> n_cands=400       0.918    392.265            0.918       207.854
>>>>>> n_cands=600       0.937    282.403            0.937       144.311
>>>>>> n_cands=800       0.949    214.620            0.949       116.875
>>>>>>
>>>>>> On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov <msokolov@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> OK, I think I was wrong about latency having increased due to a
>>>>>>> change
>>>>>>> in KnnGraphTester -- I did some testing there and couldn't reproduce.
>>>>>>> There does seem to be a slight vector search latency increase,
>>>>>>> possibly noise, but maybe due to the branching introduced to check
>>>>>>> whether to do byte vs float operations? It would be a little
>>>>>>> surprising if that were the case given the small number of branchings
>>>>>>> compared to the number of multiplies in dot-product though.
>>>>>>>
>>>>>>> On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msokolov@gmail.com>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Thanks for the deep-dive Julie. I was able to reproduce the
>>>>>>> changing
>>>>>>> > recall. I had introduced some bugs in the diversity checks (that
>>>>>>> may
>>>>>>> > have partially canceled each other out? it's hard to understand
>>>>>>> what
>>>>>>> > was happening in the buggy case) and posted a fix today
>>>>>>> > https://github.com/apache/lucene/pull/11781.
>>>>>>> >
>>>>>>> > There are a couple of other outstanding issues I found while doing
>>>>>>> a
>>>>>>> > bunch of git bisecting;
>>>>>>> >
>>>>>>> > I think we might have introduced a (test-only) performance
>>>>>>> regression
>>>>>>> > in KnnGraphTester
>>>>>>> >
>>>>>>> > We may still be over-allocating the size of NeighborArray, leading
>>>>>>> to
>>>>>>> > excessive segmentation? I wonder if we could avoid dynamic
>>>>>>> > re-allocation there, and simply initialize every neighbor array to
>>>>>>> > 2*M+1.
>>>>>>> >
>>>>>>> > While I don't think these are necessarily blockers, given that we
>>>>>>> are
>>>>>>> > releasing HNSW improvements, it seems like we should address these,
>>>>>>> > especially as the build-graph-on-index is one of the things we are
>>>>>>> > releasing, and it is (may be?) impacted. I will see if I can put
>>>>>>> up a
>>>>>>> > patch or two.
>>>>>>> >
>>>>>>> > It would be great if you all are able to test again with
>>>>>>> > https://github.com/apache/lucene/pull/11781/ applied
>>>>>>> >
>>>>>>> > -Mike
>>>>>>> >
>>>>>>> > On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpountz@gmail.com>
>>>>>>> wrote:
>>>>>>> > >
>>>>>>> > > Thank you Mike, I just backported the change.
>>>>>>> > >
>>>>>>> > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <
>>>>>>> msokolov@gmail.com> wrote:
>>>>>>> > >>
>>>>>>> > >> it looks like a small bug fix, we have had on main (and 9.x?)
>>>>>>> for a
>>>>>>> > >> while now and no test failures showed up, I guess. Should be OK
>>>>>>> to
>>>>>>> > >> port. I plan to cut artifacts this weekend, or Monday at the
>>>>>>> latest,
>>>>>>> > >> but if you can do the backport today or tomorrow, that's fine
>>>>>>> by me.
>>>>>>> > >>
>>>>>>> > >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <
>>>>>>> jpountz@gmail.com> wrote:
>>>>>>> > >> >
>>>>>>> > >> > Mike, I'm tempted to backport
>>>>>>> https://github.com/apache/lucene/pull/1068 to branch_9_4, which is
>>>>>>> a bugfix that looks pretty safe to me. What do you think?
>>>>>>> > >> >
>>>>>>> > >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <
>>>>>>> mayya.sharipova@elastic.co.invalid> wrote:
>>>>>>> > >> >>
>>>>>>> > >> >> Thanks for running more tests, Michael.
>>>>>>> > >> >> It is encouraging that you saw a similar performance between
>>>>>>> 9.3 and 9.4. I will also run more tests with different parameters.
>>>>>>> > >> >>
>>>>>>> > >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <
>>>>>>> msokolov@gmail.com> wrote:
>>>>>>> > >> >>>
>>>>>>> > >> >>> As a follow-up, I ran a test using the same parameters as
>>>>>>> above, only
>>>>>>> > >> >>> changing M=200 to M=16. This did result in a single segment
>>>>>>> in both
>>>>>>> > >> >>> cases (9.3, 9.4) and the performance was pretty similar;
>>>>>>> within noise
>>>>>>> > >> >>> I think. The main difference I saw was that the 9.3 index
>>>>>>> was written
>>>>>>> > >> >>> using CFS:
>>>>>>> > >> >>>
>>>>>>> > >> >>> 9.4:
>>>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth
>>>>>>> visited index ms
>>>>>>> > >> >>> 0.755 1.36 1000000 100 16 100 200
>>>>>>> 891402 1.00
>>>>>>> > >> >>> post-filter
>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vec
>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vem
>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vex
>>>>>>> > >> >>>
>>>>>>> > >> >>> 9.3:
>>>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth
>>>>>>> visited index ms
>>>>>>> > >> >>> 0.775 1.34 1000000 100 16 100 4033
>>>>>>> 977043
>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 297 Sep 13 13:26 _0.cfe
>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 516M Sep 13 13:26 _0.cfs
>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 340 Sep 13 13:26 _0.si
>>>>>>> > >> >>>
>>>>>>> > >> >>> On Tue, Sep 13, 2022 at 8:50 AM Michael Sokolov <
>>>>>>> msokolov@gmail.com> wrote:
>>>>>>> > >> >>> >
>>>>>>> > >> >>> > I ran another test. I thought I had increased the RAM
>>>>>>> buffer size to
>>>>>>> > >> >>> > 8G and heap to 16G. However I still see two segments in
>>>>>>> the index that
>>>>>>> > >> >>> > was created. And looking at the infostream I see:
>>>>>>> > >> >>> >
>>>>>>> > >> >>> > dir=MMapDirectory@
>>>>>>> /local/home/sokolovm/workspace/knn-perf/glove-100-angular.hdf5-train-200-200.index
>>>>>>> > >> >>> > lockFactory=org\
>>>>>>> > >> >>> > .apache.lucene.store.NativeFSLockFactory@4466af20
>>>>>>> > >> >>> > index=
>>>>>>> > >> >>> > version=9.4.0
>>>>>>> > >> >>> >
>>>>>>> analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
>>>>>>> > >> >>> > ramBufferSizeMB=8000.0
>>>>>>> > >> >>> > maxBufferedDocs=-1
>>>>>>> > >> >>> > ...
>>>>>>> > >> >>> > perThreadHardLimitMB=1945
>>>>>>> > >> >>> > ...
>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:53.329404950Z; main]: flush
>>>>>>> postings as
>>>>>>> > >> >>> > segment _6 numDocs=555373
>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.330671171Z; main]: 0 msec to
>>>>>>> write norms
>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331113184Z; main]: 0 msec to
>>>>>>> write docValues
>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:53.331320146Z; main]: 0 msec to
>>>>>>> write points
>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.424195657Z; main]: 3092 msec to
>>>>>>> write vectors
>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429239944Z; main]: 4 msec to
>>>>>>> finish stored fields
>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.429593512Z; main]: 0 msec to
>>>>>>> write postings
>>>>>>> > >> >>> > and finish vectors
>>>>>>> > >> >>> > IW 0 [2022-09-13T02:42:56.430309031Z; main]: 0 msec to
>>>>>>> write fieldInfos
>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431721622Z; main]: new
>>>>>>> segment has 0 deleted docs
>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.431921144Z; main]: new
>>>>>>> segment has 0
>>>>>>> > >> >>> > soft-deleted docs
>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435738086Z; main]: new
>>>>>>> segment has no
>>>>>>> > >> >>> > vectors; no norms; no docValues; no prox; freqs
>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.435952356Z; main]:
>>>>>>> > >> >>> > flushedFiles=[_6_Lucene94HnswVectorsFormat_0.vec, _6.fdm,
>>>>>>> _6.fdt, _6_\
>>>>>>> > >> >>> > Lucene94HnswVectorsFormat_0.vem, _6.fnm, _6.fdx,
>>>>>>> > >> >>> > _6_Lucene94HnswVectorsFormat_0.vex]
>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.436121861Z; main]: flushed
>>>>>>> codec=Lucene94
>>>>>>> > >> >>> > DWPT 0 [2022-09-13T02:42:56.437691468Z; main]: flushed:
>>>>>>> segment=_6
>>>>>>> > >> >>> > ramUsed=1,945.002 MB newFlushedSize=1,065.701 MB \
>>>>>>> > >> >>> > docs/MB=521.134
>>>>>>> > >> >>> >
>>>>>>> > >> >>> > so I think it's this perThreadHardLimit that is
>>>>>>> triggering the
>>>>>>> > >> >>> > flushes? TBH this isn't something I had seen before; but
>>>>>>> the docs say:
>>>>>>> > >> >>> >
>>>>>>> > >> >>> > /**
>>>>>>> > >> >>> > * Expert: Sets the maximum memory consumption per
>>>>>>> thread triggering
>>>>>>> > >> >>> > a forced flush if exceeded. A
>>>>>>> > >> >>> > * {@link DocumentsWriterPerThread} is forcefully
>>>>>>> flushed once it
>>>>>>> > >> >>> > exceeds this limit even if the
>>>>>>> > >> >>> > * {@link #getRAMBufferSizeMB()} has not been exceeded.
>>>>>>> This is a
>>>>>>> > >> >>> > safety limit to prevent a {@link
>>>>>>> > >> >>> > * DocumentsWriterPerThread} from address space
>>>>>>> exhaustion due to
>>>>>>> > >> >>> > its internal 32 bit signed
>>>>>>> > >> >>> > * integer based memory addressing. The given value
>>>>>>> must be less
>>>>>>> > >> >>> > than 2GB (2048MB)
>>>>>>> > >> >>> > *
>>>>>>> > >> >>> > * @see #DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
>>>>>>> > >> >>> > */
>>>>>>> > >> >>> >
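The behavior that Javadoc describes can be modeled with a small standalone check. This is a simplified two-condition sketch in plain Java with hypothetical names, not Lucene's actual DocumentsWriterFlushControl logic: a per-thread buffer is flushed once it hits the hard limit, even when the configured ramBufferSizeMB is far larger, which is why an 8 GB buffer still produced ~1945 MB segments.

```java
// Simplified model of the flush decision discussed above -- an assumption
// for illustration, not Lucene's actual DocumentsWriterFlushControl code.
public class FlushTrigger {

    /** A DWPT flushes when its private buffer hits the per-thread hard limit,
     *  even if the configured global RAM buffer has not been exceeded. */
    static boolean shouldFlush(double threadRamMB, double globalRamMB,
                               double ramBufferSizeMB, int perThreadHardLimitMB) {
        return threadRamMB >= perThreadHardLimitMB   // safety limit (32-bit int addressing)
                || globalRamMB >= ramBufferSizeMB;   // ordinary buffer-full flush
    }

    public static void main(String[] args) {
        // The scenario from the infostream: ramBufferSizeMB=8000, hard limit 1945 MB.
        System.out.println(shouldFlush(1945.0, 1945.0, 8000.0, 1945)); // prints "true"
    }
}
```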
>>>>>>> > >> >>> > On Mon, Sep 12, 2022 at 6:28 PM Michael Sokolov <
>>>>>>> msokolov@gmail.com> wrote:
>>>>>>> > >> >>> > >
>>>>>>> > >> >>> > > Hi Mayya, thanks for persisting - I think we need to
>>>>>>> wrestle this to
>>>>>>> > >> >>> > > the ground for sure. In the test I ran, RAM buffer was
>>>>>>> the default
>>>>>>> > >> >>> > > checked in, which is weirdly: 1994MB. I did not
>>>>>>> specifically set heap
>>>>>>> > >> >>> > > size. I used maxConn/M=200. I'll try with larger
>>>>>>> buffer to see if I
>>>>>>> > >> >>> > > can get 9.4 to produce a single segment for the same
>>>>>>> test settings. I
>>>>>>> > >> >>> > > see you used a much smaller M (16), which should have
>>>>>>> produced quite
>>>>>>> > >> >>> > > small graphs, and I agree, should have been a single
>>>>>>> segment. Were you
>>>>>>> > >> >>> > > able to verify the number of segments?
>>>>>>> > >> >>> > >
>>>>>>> > >> >>> > > Agree that decrease in recall is not expected when more
>>>>>>> segments are produced.
>>>>>>> > >> >>> > >
>>>>>>> > >> >>> > > On Mon, Sep 12, 2022 at 1:51 PM Mayya Sharipova
>>>>>>> > >> >>> > > <mayya.sharipova@elastic.co.invalid> wrote:
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > > Hello Michael,
>>>>>>> > >> >>> > > > Thanks for checking.
>>>>>>> > >> >>> > > > Sorry for bringing this up again.
>>>>>>> > >> >>> > > > First of all, I am ok with proceeding with the Lucene
>>>>>>> 9.4 release and leaving the performance investigations for later.
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > > I am interested in what's the maxConn/M value you
>>>>>>> used for your tests? What was the heap memory and the size of the RAM
>>>>>>> buffer for indexing?
>>>>>>> > >> >>> > > > Usually, when we have multiple segments, recall
>>>>>>> should increase, not decrease. But I agree that with multiple segments we
>>>>>>> can see a big drop in QPS.
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > > Here is my investigation with detailed output of the
>>>>>>> performance difference between 9.3 and 9.4 releases. In my tests I used a
>>>>>>> large indexing buffer (2Gb) and large heap (5Gb) to end up with a single
>>>>>>> segment for both 9.3 and 9.4 tests, but still see a big drop in QPS in 9.4.
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > > Thank you.
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > >
>>>>>>> > >> >>> > > > On Fri, Sep 9, 2022 at 12:21 PM Alan Woodward <
>>>>>>> romseygeek@gmail.com> wrote:
>>>>>>> > >> >>> > > >>
>>>>>>> > >> >>> > > >> Done. Thanks!
>>>>>>> > >> >>> > > >>
>>>>>>> > >> >>> > > >> > On 9 Sep 2022, at 16:32, Michael Sokolov <
>>>>>>> msokolov@gmail.com> wrote:
>>>>>>> > >> >>> > > >> >
>>>>>>> > >> >>> > > >> > Hi Alan - I checked out the interval queries
>>>>>>> patch; seems pretty safe,
>>>>>>> > >> >>> > > >> > please go ahead and port to 9.4. Thanks!
>>>>>>> > >> >>> > > >> >
>>>>>>> > >> >>> > > >> > Mike
>>>>>>> > >> >>> > > >> >
>>>>>>> > >> >>> > > >> > On Fri, Sep 9, 2022 at 10:41 AM Alan Woodward <
>>>>>>> romseygeek@gmail.com> wrote:
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> Hi Mike,
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> I’ve opened
>>>>>>> https://github.com/apache/lucene/pull/11760 as a small bug fix PR
>>>>>>> for a problem with interval queries. Am I OK to port this to the 9.4
>>>>>>> branch?
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> Thanks, Alan
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> On 2 Sep 2022, at 20:42, Michael Sokolov <
>>>>>>> msokolov@gmail.com> wrote:
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> NOTICE:
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> Branch branch_9_4 has been cut and versions
>>>>>>> updated to 9.5 on stable branch.
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> Please observe the normal rules:
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >> * No new features may be committed to the branch.
>>>>>>> > >> >>> > > >> >> * Documentation patches, build patches and
>>>>>>> serious bug fixes may be
>>>>>>> > >> >>> > > >> >> committed to the branch. However, you should
>>>>>>> submit all patches you
>>>>>>> > >> >>> > > >> >> want to commit to Jira first to give others the
>>>>>>> chance to review
>>>>>>> > >> >>> > > >> >> and possibly vote against the patch. Keep in mind
>>>>>>> that it is our
>>>>>>> > >> >>> > > >> >> main intention to keep the branch as stable as
>>>>>>> possible.
>>>>>>> > >> >>> > > >> >> * All patches that are intended for the branch
>>>>>>> should first be committed
>>>>>>> > >> >>> > > >> >> to the unstable branch, merged into the stable
>>>>>>> branch, and then into
>>>>>>> > >> >>> > > >> >> the current release branch.
>>>>>>> > >> >>> > > >> >> * Normal unstable and stable branch development
>>>>>>> may continue as usual.
>>>>>>> > >> >>> > > >> >> However, if you plan to commit a big change to
>>>>>>> the unstable branch
>>>>>>> > >> >>> > > >> >> while the branch feature freeze is in effect,
>>>>>>> think twice: can't the
>>>>>>> > >> >>> > > >> >> addition wait a couple more days? Merges of bug
>>>>>>> fixes into the branch
>>>>>>> > >> >>> > > >> >> may become more difficult.
>>>>>>> > >> >>> > > >> >> * Only Jira issues with Fix version 9.4 and
>>>>>>> priority "Blocker" will delay
>>>>>>> > >> >>> > > >> >> a release candidate build.
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> > >> >>> > > >> >> To unsubscribe, e-mail:
>>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>>> > >> >>> > > >> >> For additional commands, e-mail:
>>>>>>> dev-help@lucene.apache.org
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >>
>>>>>>> > >> >>> > > >> >
>>>>>>> > >> >>> > > >> >
>>>>>>> > >> >>> > > >> >
>>>>>>> > >> >>> > > >>
>>>>>>> > >> >>> > > >>
>>>>>>> > >> >>> > > >>
>>>>>>> > >> >>> > > >>
>>>>>>> > >> >>>
>>>>>>> > >> >>>
>>>>>>> > >> >>>
>>>>>>> > >> >
>>>>>>> > >> >
>>>>>>> > >> > --
>>>>>>> > >> > Adrien
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > --
>>>>>>> > > Adrien
>>>>>>>
>>>>>>>
>>>>>>>
>
> --
> Adrien
>
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Both changes are on branch_9_4 now.

On Tue, Sep 20, 2022 at 1:31 PM Michael Sokolov <msokolov@gmail.com> wrote:

> well, I did start, optimistically, but I think I need to re-spin to
> include a fix for this test failure that has been popping up, so I will
> pull these in too.
>
> On Tue, Sep 20, 2022 at 6:24 AM Adrien Grand <jpountz@gmail.com> wrote:
>
>> Hi Mike,
>>
>> If you have not started a RC yet, I'd like to include some small fixes
>> for bugs that were recently introduced in Lucene:
>> - https://github.com/apache/lucene/pull/11792
>> - https://github.com/apache/lucene/pull/11794
>>
>> On Tue, Sep 20, 2022 at 1:26 AM Julie Tibshirani <julietibs@gmail.com>
>> wrote:
>>
>>> Sorry for the confusion. To explain, I use a local ann-benchmarks set-up
>>> that makes use of KnnGraphTester. It is a bit hacky and I accidentally
>>> included the warm-ups in the final timings. So the change to warm-up
>>> explains why we saw different results in our tests. This is great
>>> motivation to solidify and publish my local ann-benchmarks set-up so that
>>> it's not so fragile!
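A harness that keeps warm-up iterations out of the measured window avoids exactly this pitfall. The sketch below is plain Java with illustrative names, not the actual KnnGraphTester or ann-benchmarks code:

```java
// Illustrative timing harness (hypothetical names, not the real KnnGraphTester):
// warm-up iterations run first and are excluded from the measured window.
public class Bench {

    /** Returns mean latency in ms over the measured runs only. */
    static double measureMs(Runnable query, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            query.run();                       // warm caches/JIT -- NOT timed
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            query.run();                       // only these are timed
        }
        return (System.nanoTime() - start) / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        double ms = measureMs(() -> { /* stand-in for a KNN query */ }, 10, 100);
        System.out.println(ms >= 0.0);
    }
}
```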
>>>
>>> In summary, with your latest fix the recall and QPS look good to me -- I
>>> don't detect any regression between 9.3 and 9.4.
>>>
>>> Julie
>>>
>>> On Mon, Sep 19, 2022 at 3:45 PM Michael Sokolov <msokolov@gmail.com>
>>> wrote:
>>>
>>>> I'm confused, since warming should not be counted in the timings. Are
>>>> you saying that the recall was affected??
>>>>
>>>> On Mon, Sep 19, 2022 at 6:12 PM Julie Tibshirani <julietibs@gmail.com>
>>>> wrote:
>>>>
>>>>> Using the ann-benchmarks framework, I still saw a similar regression
>>>>> as Mayya between 9.3 and 9.4. I investigated and found it was due to
>>>>> "KnnGraphTester to use KnnVectorQuery" (
>>>>> https://github.com/apache/lucene/pull/796), specifically the change
>>>>> to the warm-up strategy. If I revert it, the results look exactly as
>>>>> expected.
>>>>>
>>>>> I guess we can keep an eye on the nightly benchmarks tomorrow to
>>>>> double-check there's no drop. It would also be nice to formalize the
>>>>> ann-benchmarks set-up and run it regularly (like we've discussed in
>>>>> https://github.com/apache/lucene/issues/10665).
>>>>>
>>>>> Julie
>>>>>
>>>>> On Mon, Sep 19, 2022 at 10:33 AM Michael Sokolov <msokolov@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for your speedy testing! I am observing comparable latencies
>>>>>> *when the index geometry (i.e., the number of segments) is unchanged*. Agree we
>>>>>> can leave this for a later day. I'll proceed to cut 9.4 artifacts.
>>>>>>
>>>>>> On Mon, Sep 19, 2022 at 11:02 AM Mayya Sharipova
>>>>>> <mayya.sharipova@elastic.co.invalid> wrote:
>>>>>>
>>>>>>> It would be great if you all are able to test again with
>>>>>>>> https://github.com/apache/lucene/pull/11781/ applied
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I ran the ann benchmarks with this change, and was happy to confirm
>>>>>>> that in my test recall with this PR is the same as on the 9.3 branch; QPS
>>>>>>> is lower, but we can investigate QPS later.
>>>>>>>
>>>>>>> glove-100-angular M:16 efConstruction:100
>>>>>>>               9.3 recall   9.3 QPS    this PR recall   this PR QPS
>>>>>>> n_cands=10    0.620        2745.933   0.620            1675.500
>>>>>>> n_cands=20    0.680        2288.665   0.680            1512.744
>>>>>>> n_cands=40    0.746        1770.243   0.746            1040.240
>>>>>>> n_cands=80    0.809        1226.738   0.809            695.236
>>>>>>> n_cands=120   0.843        948.908    0.843            525.914
>>>>>>> n_cands=200   0.878        671.781    0.878            351.529
>>>>>>> n_cands=400   0.918        392.265    0.918            207.854
>>>>>>> n_cands=600   0.937        282.403    0.937            144.311
>>>>>>> n_cands=800   0.949        214.620    0.949            116.875
>>>>>>>
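For reference, the recall column in results like these is the overlap between the approximate top-k and the exact (brute-force) top-k. A minimal standalone sketch of that computation, in plain Java with illustrative names:

```java
import java.util.Set;

// How a recall column is computed: overlap between the approximate
// top-k and the exact (brute-force) top-k. Illustrative sketch only.
public class Recall {

    /** recall@k = |approx ∩ exact| / |exact| for one query. */
    static double recall(Set<Integer> approxTopK, Set<Integer> exactTopK) {
        long hits = approxTopK.stream().filter(exactTopK::contains).count();
        return (double) hits / exactTopK.size();
    }

    public static void main(String[] args) {
        System.out.println(recall(Set.of(1, 2, 3, 4), Set.of(1, 2, 3, 5))); // prints "0.75"
    }
}
```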
>>>>>>> On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov <msokolov@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> OK, I think I was wrong about latency having increased due to a
>>>>>>>> change
>>>>>>>> in KnnGraphTester -- I did some testing there and couldn't
>>>>>>>> reproduce.
>>>>>>>> There does seem to be a slight vector search latency increase,
>>>>>>>> possibly noise, but maybe due to the branching introduced to check
>>>>>>>> whether to do byte vs float operations? It would be a little
>>>>>>>> surprising if that were the case given the small number of
>>>>>>>> branchings
>>>>>>>> compared to the number of multiplies in dot-product though.
>>>>>>>>
>>>>>>>> On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov <msokolov@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > Thanks for the deep-dive Julie. I was able to reproduce the
>>>>>>>> changing
>>>>>>>> > recall. I had introduced some bugs in the diversity checks (that
>>>>>>>> may
>>>>>>>> > have partially canceled each other out? it's hard to understand
>>>>>>>> what
>>>>>>>> > was happening in the buggy case) and posted a fix today
>>>>>>>> > https://github.com/apache/lucene/pull/11781.
>>>>>>>> >
>>>>>>>> > There are a couple of other outstanding issues I found while
>>>>>>>> doing a
>>>>>>>> > bunch of git bisecting;
>>>>>>>> >
>>>>>>>> > I think we might have introduced a (test-only) performance
>>>>>>>> regression
>>>>>>>> > in KnnGraphTester
>>>>>>>> >
>>>>>>>> > We may still be over-allocating the size of NeighborArray,
>>>>>>>> leading to
>>>>>>>> > excessive segmentation? I wonder if we could avoid dynamic
>>>>>>>> > re-allocation there, and simply initialize every neighbor array to
>>>>>>>> > 2*M+1.
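The fixed-size allocation suggested here would look roughly as follows. This is a plain-Java sketch under the stated assumption that a node never holds more than 2*M neighbors plus one candidate; it is not the actual Lucene NeighborArray implementation:

```java
// Sketch of the fixed-size allocation suggested above, under the assumption
// that a graph node never holds more than 2*M neighbors plus one candidate.
// Not the actual Lucene NeighborArray implementation.
public class FixedNeighborArray {
    final int[] nodes;
    final float[] scores;
    int size;

    /** Allocate once at the worst case 2*M + 1; never re-allocated. */
    FixedNeighborArray(int maxConn /* M */) {
        int capacity = 2 * maxConn + 1;
        this.nodes = new int[capacity];
        this.scores = new float[capacity];
    }

    void add(int node, float score) {
        nodes[size] = node;
        scores[size] = score;
        size++;                 // caller is expected to prune back to <= 2*M
    }

    public static void main(String[] args) {
        FixedNeighborArray arr = new FixedNeighborArray(16);   // M = 16
        System.out.println(arr.nodes.length);                  // prints "33"
    }
}
```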
>>>>>>>> >
>>>>>>>> > While I don't think these are necessarily blockers, given that we
>>>>>>>> are
>>>>>>>> > releasing HNSW improvements, it seems like we should address
>>>>>>>> these,
>>>>>>>> > especially as the build-graph-on-index is one of the things we are
>>>>>>>> > releasing, and it is (may be?) impacted. I will see if I can put
>>>>>>>> up a
>>>>>>>> > patch or two.
>>>>>>>> >
>>>>>>>> > It would be great if you all are able to test again with
>>>>>>>> > https://github.com/apache/lucene/pull/11781/ applied
>>>>>>>> >
>>>>>>>> > -Mike
>>>>>>>> >
>>>>>>>> > On Fri, Sep 16, 2022 at 11:07 AM Adrien Grand <jpountz@gmail.com>
>>>>>>>> wrote:
>>>>>>>> > >
>>>>>>>> > > Thank you Mike, I just backported the change.
>>>>>>>> > >
>>>>>>>> > > On Thu, Sep 15, 2022 at 6:32 PM Michael Sokolov <
>>>>>>>> msokolov@gmail.com> wrote:
>>>>>>>> > >>
>>>>>>>> > >> it looks like a small bug fix, we have had on main (and 9.x?)
>>>>>>>> for a
>>>>>>>> > >> while now and no test failures showed up, I guess. Should be
>>>>>>>> OK to
>>>>>>>> > >> port. I plan to cut artifacts this weekend, or Monday at the
>>>>>>>> latest,
>>>>>>>> > >> but if you can do the backport today or tomorrow, that's fine
>>>>>>>> by me.
>>>>>>>> > >>
>>>>>>>> > >> On Thu, Sep 15, 2022 at 10:55 AM Adrien Grand <
>>>>>>>> jpountz@gmail.com> wrote:
>>>>>>>> > >> >
>>>>>>>> > >> > Mike, I'm tempted to backport
>>>>>>>> https://github.com/apache/lucene/pull/1068 to branch_9_4, which is
>>>>>>>> a bugfix that looks pretty safe to me. What do you think?
>>>>>>>> > >> >
>>>>>>>> > >> > On Tue, Sep 13, 2022 at 4:11 PM Mayya Sharipova <
>>>>>>>> mayya.sharipova@elastic.co.invalid> wrote:
>>>>>>>> > >> >>
>>>>>>>> > >> >> Thanks for running more tests, Michael.
>>>>>>>> > >> >> It is encouraging that you saw a similar performance
>>>>>>>> between 9.3 and 9.4. I will also run more tests with different parameters.
>>>>>>>> > >> >>
>>>>>>>> > >> >> On Tue, Sep 13, 2022 at 9:30 AM Michael Sokolov <
>>>>>>>> msokolov@gmail.com> wrote:
>>>>>>>> > >> >>>
>>>>>>>> > >> >>> As a follow-up, I ran a test using the same parameters as
>>>>>>>> above, only
>>>>>>>> > >> >>> changing M=200 to M=16. This did result in a single
>>>>>>>> segment in both
>>>>>>>> > >> >>> cases (9.3, 9.4) and the performance was pretty similar;
>>>>>>>> within noise
>>>>>>>> > >> >>> I think. The main difference I saw was that the 9.3 index
>>>>>>>> was written
>>>>>>>> > >> >>> using CFS:
>>>>>>>> > >> >>>
>>>>>>>> > >> >>> 9.4:
>>>>>>>> > >> >>> recall latency nDoc fanout maxConn beamWidth
>>>>>>>> visited index ms
>>>>>>>> > >> >>> 0.755 1.36 1000000 100 16 100 200
>>>>>>>> 891402 1.00
>>>>>>>> > >> >>> post-filter
>>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 382M Sep 13 13:06
>>>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vec
>>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 262K Sep 13 13:06
>>>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vem
>>>>>>>> > >> >>> -rw-r--r-- 1 sokolovm amazon 131M Sep 13 13:06
>>>>>>>> > >> >>> _0_Lucene94HnswVectorsFormat_0.vex
>>>>>>>> > >> >>>
>>>>>>>> > >> >>> [...]
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>
>> --
>> Adrien
>>
>

--
Adrien
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Hi,

JDK 19 was released yesterday and I am still waiting for
Gradle-Toolchain-compatible AdoptOpenJDK releases to become available. To
me the schedule is a bit unfortunate: we started the release on the very
day it became possible to add (optional) support for JDK 19 Panama-powered
MMAP.

I would really like to have a Lucene release that adds support for the JDK
19 preview APIs (it can only be tested against exactly the JDK 19 series;
it won't work with JDK 20), so we should ship a release including the new
code between now and March (ideally before Christmas).

Options we have:

* Let the 9.4.0 release go out now and add a 9.5.0 in a month or so (I
  would be release manager). I do not want to do this as a bugfix
  release, as there's a small API change (the MMapDirectory constructor's
  chunk-size parameter changes from int to long). We could bill this
  release as the "Java 19 release for early adopters".
* Wait a few days and respin the release after
  https://github.com/apache/lucene/pull/912 has gone in. The code has
  been thoroughly tested by Policeman Jenkins for several months; only
  the compilation does not work out of the box until there's a Temurin
  build of OpenJDK 19.

To repeat: the above PR does not change any production code, so it should
be bug-free. It only adds a few class files that are used when you pass
"--enable-preview" to your JDK. This makes it easy for users to try Solr
or Elasticsearch with JDK 19. There is no risk; the code only activates
when you enable it.
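The API change mentioned in the first option (MMapDirectory's chunk-size constructor parameter widening from int to long) can be illustrated with the chunking arithmetic it affects. This is a standalone plain-Java sketch with my own names, not Lucene code:

```java
// Why the int -> long chunk-size change matters: chunk size bounds how a
// mapped file is split into regions, and an int caps a chunk at ~2 GB.
// Standalone arithmetic sketch (not Lucene code).
public class ChunkMath {

    /** Number of mmap chunks needed for a file (ceiling division). */
    static long chunkCount(long fileSizeBytes, long chunkSizeBytes) {
        return (fileSizeBytes + chunkSizeBytes - 1) / chunkSizeBytes;
    }

    public static void main(String[] args) {
        long fourGiB = 4L << 30;
        System.out.println(chunkCount(fourGiB, 1L << 30));  // prints "4" (int-sized chunks)
        System.out.println(chunkCount(fourGiB, fourGiB));   // prints "1" (long-sized chunk)
    }
}
```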

Thoughts?

Uwe

Am 02.09.2022 um 21:42 schrieb Michael Sokolov:
> NOTICE:
>
> Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch.
>
> Please observe the normal rules:
>
> * No new features may be committed to the branch.
> * Documentation patches, build patches and serious bug fixes may be
> committed to the branch. However, you should submit all patches you
> want to commit to Jira first to give others the chance to review
> and possibly vote against the patch. Keep in mind that it is our
> main intention to keep the branch as stable as possible.
> * All patches that are intended for the branch should first be committed
> to the unstable branch, merged into the stable branch, and then into
> the current release branch.
> * Normal unstable and stable branch development may continue as usual.
> However, if you plan to commit a big change to the unstable branch
> while the branch feature freeze is in effect, think twice: can't the
> addition wait a couple more days? Merges of bug fixes into the branch
> may become more difficult.
> * Only Jira issues with Fix version 9.4 and priority "Blocker" will delay
> a release candidate build.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
I see; I would kind of like to get the release out before ApacheCon
NA, which starts Oct 3. Do you think it's likely AdoptOpenJDK will
release its JDK 19 build in the next week (say, by Sep 26)?

On Wed, Sep 21, 2022 at 4:32 AM Uwe Schindler <uwe@thetaphi.de> wrote:

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Hi,

I will check later today how long it took last time, in March. I would
expect they just need to wait until the builds and tests are done
before it gets released.

I don't want to hold up the release. The vote is still ongoing, so we
have all options.

Uwe

On 21.09.2022 at 14:05, Michael Sokolov wrote:
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de


Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
FYI, here (https://github.com/adoptium/adoptium/issues/171) Eclipse says:

* Add website banner (automate* via github workflow in website
repository) - Announce that we target releases to be available within
48-72 hours of the GA tags being available

Uwe

On 21.09.2022 at 14:31, Uwe Schindler wrote:
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de


Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
OK, how does this sound: if there is an AdoptOpenJDK JDK 19 release
this week, as it seems there should be, and you are able to fast-follow
with the Lucene changes to use it, then I can re-spin RC2 on Monday or
Tuesday.

On Wed, Sep 21, 2022 at 1:35 PM Uwe Schindler <uwe@thetaphi.de> wrote:

Re: Subject: New branch and feature freeze for Lucene 9.4.0 [ In reply to ]
Looks like a fair deal. I will check daily for the release to appear.
Once everything looks fine, I will update the PR and take it out of draft status.

Uwe

On 21.09.2022 at 20:44, Michael Sokolov wrote:
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de

