Mailing List Archive

call for 9.4.1 release (bug in vectors format)
Hi everyone,

We recently discovered a severe bug in the 9.4 release in the kNN vectors
format: https://github.com/apache/lucene/issues/11858. The problem: when
ingesting a lot of data, or when performing a force merge, segments can
grow large. The format validation code accidentally uses an int instead of
a long to compute the data size, so the computation can overflow and
validation can fail on these large segments. When format validation fails,
the segment is essentially lost and unusable. For some client systems like
Elasticsearch, it can send the whole index into a "failed" state, blocking
further writes or searches.
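
To illustrate the failure mode described above, here is a minimal Java
sketch. It is not the actual Lucene code: the class name, the method names,
and the example numbers (3,000,000 vectors with 256 float dimensions) are
assumptions made up for illustration. It only shows how a size computed in
int arithmetic wraps around once the data exceeds Integer.MAX_VALUE bytes,
so a comparison against the real file length fails, while widening to long
before multiplying gives the right value.

```java
public class VectorDataSizeSketch {

    // Buggy variant (hypothetical): every operand is an int, so the product is
    // computed in 32-bit arithmetic and silently wraps around on overflow.
    static long sizeWithIntMath(int numVectors, int dimension, int bytesPerValue) {
        int size = numVectors * dimension * bytesPerValue;
        return size;
    }

    // Fixed variant (hypothetical): widen the first operand to long so the
    // whole product is computed in 64-bit arithmetic.
    static long sizeWithLongMath(int numVectors, int dimension, int bytesPerValue) {
        return (long) numVectors * dimension * bytesPerValue;
    }

    public static void main(String[] args) {
        int numVectors = 3_000_000;      // assumed segment size, for illustration
        int dimension = 256;             // assumed vector dimension
        int bytesPerValue = Float.BYTES; // 4 bytes per float32 component

        // Length the vector data would really occupy: 3,000,000 * 256 * 4 bytes.
        long actualLength = 3_072_000_000L;

        long buggy = sizeWithIntMath(numVectors, dimension, bytesPerValue);
        long fixed = sizeWithLongMath(numVectors, dimension, bytesPerValue);

        // A read-side check of this shape rejects a perfectly valid file when
        // the expected size was computed with int math.
        System.out.println("int math matches file length:  " + (buggy == actualLength));
        System.out.println("long math matches file length: " + (fixed == actualLength));
    }
}
```

For small segments both variants return the same value, which is
presumably why a bug of this kind only shows up at larger scale.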

I think this bug is sufficiently bad that we should perform a 9.4.1 release
as soon as possible. The fix is just an update to the read-side validation
code; there won't be any effect on the data format. This means it is safe
to merge the fix into the existing 9.4 vectors format. The bug was
introduced during the work to add quantization
(https://github.com/apache/lucene/pull/1054) and does not affect versions
before 9.4.

Let me know what you think! I could serve as release manager. (We
should also follow up with a plan to prevent this from happening in the
future -- maybe we need to regularly run larger-scale benchmarks?)

Julie
Re: call for 9.4.1 release (bug in vectors format)
+1 :-)

Thanks

Michael

On 18.10.22 at 19:52, Julie Tibshirani wrote:


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: call for 9.4.1 release (bug in vectors format)
+1,
Thanks Julie for tackling this, and serving as a release manager.

On Tue, Oct 18, 2022 at 2:51 PM Michael Wechner <michael.wechner@wyona.com>
wrote:

Re: call for 9.4.1 release (bug in vectors format)
Oh no! Very sorry -- thank you for volunteering to fix (hangs head in
shame). I guess I'll see where the bug is soon ...

On Tue, Oct 18, 2022 at 2:50 PM Michael Wechner
<michael.wechner@wyona.com> wrote:
Re: call for 9.4.1 release (bug in vectors format)
I've uploaded a fix in https://github.com/apache/lucene/pull/11861 (thanks
Mike for the review!). If there are no objections, I plan to merge it
tomorrow and then get started on a 9.4.1 release candidate.

Julie

On Tue, Oct 18, 2022 at 2:52 PM Michael Sokolov <msokolov@gmail.com> wrote:
