Mailing List Archive

Lucene 9.0 release candidate
Hello,

I plan to build an RC for Lucene 9.0 in the next few days.

We don't have blockers left, but there are two faceting changes that
would let us drop some backward compatibility logic in 10.x if we fold
them into 9.0:
- LUCENE-10062 <https://issues.apache.org/jira/browse/LUCENE-10062>:
https://github.com/apache/lucene/pull/264
- LUCENE-10122 <https://issues.apache.org/jira/browse/LUCENE-10122>:
https://github.com/apache/lucene/pull/420

I'm interested in thoughts regarding whether I should wait for these
changes.

--
Adrien
Re: Lucene 9.0 release candidate
I think we should wait for these two changes.

I also think we should add automatic bundle names to all JARs starting
with 9.0. Even if they're not proper modules yet, it'd clarify what
the module names would be for all of the 9x line. I think a short name
of lucene.* is sufficient (we don't need to prefix with
org.apache.lucene) so that we have modules like lucene.core,
lucene.analysis.kuromoji, etc. I can add this - already have a local
patch that does it and enables Luke to become a first-class module,
for example.
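
For illustration only: assuming the "automatic bundle names" above refer to
the JPMS Automatic-Module-Name manifest attribute, a minimal sketch of
stamping such a name into a JAR manifest might look like this. The class and
method names are hypothetical, and this is not the local patch mentioned
above.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class AutomaticModuleNameSketch {

    /** Builds a manifest that declares the given automatic module name. */
    static Manifest manifestFor(String moduleName) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be present, otherwise the main section is not written.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Standard JPMS attribute read by the module system for non-modular JARs.
        main.put(new Attributes.Name("Automatic-Module-Name"), moduleName);
        return manifest;
    }

    public static void main(String[] args) throws IOException {
        // "lucene.core" is one of the short names proposed in this message.
        manifestFor("lucene.core").write(System.out);
    }
}
```

With such an entry in place, a consumer's module descriptor could say
"requires lucene.core;" even though the JAR has no module-info of its own.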

Dawid

On Sat, Nov 13, 2021 at 8:49 PM Adrien Grand <jpountz@gmail.com> wrote:
>
> Hello,
>
> I plan to build a RC for Lucene 9.0 in the next few days.
>
> We don't have blockers left, but there are two faceting changes that look like we could save some backward compatibility logic in 10.x by folding them into 9.0:
> - LUCENE-10062: https://github.com/apache/lucene/pull/264
> - LUCENE-10122: https://github.com/apache/lucene/pull/420
>
> I'm interested in thoughts regarding whether I should wait for these changes.
>
> --
> Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Lucene 9.0 release candidate
Thanks Dawid.

@Greg Miller <gsmiller@gmail.com> What do you think about getting these two
PRs in for 9.0?

On Sun, Nov 14, 2021 at 9:25 AM Dawid Weiss <dawid.weiss@gmail.com> wrote:

> I think we should wait for these two changes.
>
> I also think we should add automatic bundle names to all JARs starting
> with 9.0. Even if they're not proper modules yet, it'd clarify what
> the module names would be for all of the 9x line. I think a short name
> of lucene.* is sufficient (we don't need to prefix with
> org.apache.lucene) so that we have modules like lucene.core,
> lucene.analysis.kuromoji, etc. I can add this - already have a local
> patch that does it and enables Luke to become a first-class module,
> for example.
>
> Dawid

--
Adrien
Re: Lucene 9.0 release candidate
+1, yeah let's see if we can get these in. It would be nice to not
carry the back-compat logic into 10. I'll prioritize these
today/tomorrow; I think we should be able to turn them around pretty
quickly. I'll update on this thread when PRs have been merged.

Cheers,
-Greg

On Mon, Nov 15, 2021 at 6:20 AM Adrien Grand <jpountz@gmail.com> wrote:
>
> Thanks Dawid.
>
> @Greg Miller What do you think about getting these two PRs in for 9.0?
Re: Lucene 9.0 release candidate
Thanks Greg!

On Mon, Nov 15, 2021 at 6:09 PM Greg Miller <gsmiller@gmail.com> wrote:

> +1, yeah let's see if we can get these in. It would be nice to not
> carry the back-compat logic into 10. I'll prioritize these
> today/tomorrow; I think we should be able to turn them around pretty
> quickly. I'll update on this thread when PRs have been merged.
>
> Cheers,
> -Greg


--
Adrien
Re: Lucene 9.0 release candidate
I think for PR 420 (https://github.com/apache/lucene/pull/420) we are
(confusingly!) not really seeing performance benefits -- taxonomy index got
a bit bigger, and loading the parent arrays no faster? So Patrick closed
that one.

But the other PR (https://github.com/apache/lucene/pull/264) looks like it
shows huge speedups on luceneutil. Thanks Greg for pushing on this one!

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 15, 2021 at 12:22 PM Adrien Grand <jpountz@gmail.com> wrote:

> Thanks Greg!
Re: Lucene 9.0 release candidate
On Mon, Nov 15, 2021 at 12:57 PM Michael McCandless
<lucene@mikemccandless.com> wrote:
>
> I think for PR 420 (https://github.com/apache/lucene/pull/420) we are (confusingly!) not really seeing performance benefits -- taxonomy index got a bit bigger, and loading the parent arrays no faster? So Patrick closed that one.

I'm confused about this (Sorry I am not up to speed), but are we not
able to offload today's very large arrays to docvalues (e.g. mmap)
with the change? Wasn't that the original motivation, that the memory
usage was somewhat trappy? I wouldn't expect to see performance
benefits over today's on-heap arrays that are read from payloads or
whatever, instead it would be a memory benefit?

Re: Lucene 9.0 release candidate
On Mon, Nov 15, 2021 at 1:14 PM Robert Muir <rcmuir@gmail.com> wrote:

> On Mon, Nov 15, 2021 at 12:57 PM Michael McCandless
> <lucene@mikemccandless.com> wrote:
> >
> > I think for PR 420 (https://github.com/apache/lucene/pull/420) we are
> > (confusingly!) not really seeing performance benefits -- taxonomy index got
> > a bit bigger, and loading the parent arrays no faster? So Patrick closed
> > that one.
>
> I'm confused about this (Sorry I am not up to speed), but are we not
> able to offload today's very large arrays to docvalues (e.g. mmap)
> with the change? Wasn't that the original motivation, that the memory
> usage was somewhat trappy? I wouldn't expect to see performance
> benefits over today's on-heap arrays that are read from payloads or
> whatever, instead it would be a memory benefit?
>

Yeah I love that idea, but that's not what Patrick's PR explored (yet?).

His PR explored switching away from custom token positions to NumericDocValues
to store the same data (ordinal -> parent mapping), but it still loaded all
of those into massive heap-resident int[].

I agree it would be awesome to try avoiding those big int[] and reading
live from NumericDocValues during faceting! It would require some re-work
of the faceting code to e.g. sort the ordinals so that it can (efficiently)
visit them in forward iterator-friendly order.
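
To make the idea concrete, here is a rough sketch of the "sort, then visit
forward" pattern. It does not use Lucene: ForwardOnlyParents is a
hypothetical stand-in for a reader that can only move forward, and
lookupParents and the sample data are made up for illustration.

```java
import java.util.Arrays;

public class ForwardParentLookupSketch {

    /** Hypothetical forward-only source of ordinal -> parent values; not Lucene's API. */
    static final class ForwardOnlyParents {
        private final int[] stored; // simulates stored values, indexed by ordinal
        private int position = -1;  // last ordinal that was read

        ForwardOnlyParents(int[] stored) {
            this.stored = stored;
        }

        /** Returns the parent of the ordinal; rejects moving backwards. */
        int parentOf(int ordinal) {
            if (ordinal < position) {
                throw new IllegalStateException(
                    "backward access: " + ordinal + " after " + position);
            }
            position = ordinal;
            return stored[ordinal];
        }
    }

    /** Sorts the requested ordinals so the source is only ever read forwards. */
    static int[] lookupParents(int[] ordinals, ForwardOnlyParents parents) {
        int[] sorted = ordinals.clone();
        Arrays.sort(sorted);
        int[] result = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            result[i] = parents.parentOf(sorted[i]);
        }
        return result; // parents in ascending-ordinal order
    }

    public static void main(String[] args) {
        // Ordinal 0 is the root (parent -1); the rest point at earlier ordinals.
        int[] stored = {-1, 0, 0, 1, 1, 2};
        int[] result = lookupParents(new int[] {5, 3, 1}, new ForwardOnlyParents(stored));
        System.out.println(Arrays.toString(result)); // prints [0, 1, 2]
    }
}
```

The point of the sort is that no full int[] copy of the mapping is needed on
heap; only the ordinals actually seen during faceting are buffered.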

But that is a different change and probably we should not hold 9.0 for it?

Mike McCandless

http://blog.mikemccandless.com
Re: Lucene 9.0 release candidate
On Mon, Nov 15, 2021 at 2:02 PM Michael McCandless
<lucene@mikemccandless.com> wrote:
>
>
> Yeah I love that idea, but that's not what Patrick's PR explored (yet?).
>
> His explored switching away from custom token positions to NumericDocValues to store the same data (ordinal -> parent mapping), but it still loaded all of those into massive heap-resident int[].
>
> I agree it would be awesome to try avoiding those big int[] and reading live from NumericDocValues during faceting! It would require some re-work of the facetting code to e.g. sort the ordinals to (efficiently) visiting them in forward iterator-friendly order.
>
> But that is a different change and probably we should not hold 9.0 for it?
>

Agreed: I was confused about the scope of the change.

Re: Lucene 9.0 release candidate
Heads up that both LUCENE-10122 and LUCENE-10062 have been merged onto
branch_9_0 now. @Adrien Grand I know you're aware already, but
following up here just for completeness. Thanks!

-Greg

On Mon, Nov 15, 2021 at 11:17 AM Robert Muir <rcmuir@gmail.com> wrote:
>
> Agreed: I was confused about the scope of the change.

Re: Lucene 9.0 release candidate
Thanks Greg, Patrick, Mike and Robert for the quick turnaround on
getting these changes merged! I'll now resume work on the 9.0 release.

On Fri, Nov 19, 2021 at 4:44 PM Greg Miller <gsmiller@gmail.com> wrote:
>
> Heads up that both LUCENE-10122 and LUCENE-10062 have been merged onto
> branch_9_0 now. @Adrien Grand I know you're aware already, but
> following up here just for completeness. Thanks!
>
> -Greg



--
Adrien

Re: Lucene 9.0 release candidate
Can we also push the commit to branch 9x? (I just don't want it to get forgotten.)

On Fri, Nov 19, 2021 at 10:50 AM Adrien Grand <jpountz@gmail.com> wrote:
>
> Thanks Greg, Patrick, Mike and Robert for the quick turnaround on
> getting these changes merged! I'll now resume work on the 9.0 release.

Re: Lucene 9.0 release candidate
Greg already has a PR for it: https://github.com/apache/lucene/pull/458.

On Fri, Nov 19, 2021 at 5:30 PM Robert Muir <rcmuir@gmail.com> wrote:
>
> Can we also push commit to branch 9x (just don't want it to get forgotten)



--
Adrien
