Mailing List Archive

Moving from lucene 6.x to 8.x
Hello,
I am using Apache Solr 7.7.2 with indexes which were originally created on
4.8 and upgraded ever since. I recently tried upgrading to 8.x using the
lucene IndexUpgrader tool and the upgrade fails. I know that lucene 8.x
prevents opening any segment which was touched by <= 6.x at any point in
the past. I also know the general recommendation is to reindex upon
migration to another major release, however it is not always feasible.

So I tried to remove the check for LATEST-1 in SegmentInfos.java (
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L321)
and also checked for other references to IndexFormatTooOldException. Turns
out that removing this check and rebuilding lucene-core lets the upgrade go
through fine. I ran a full sequence of index upgrades from 5.x -> 6.x ->
7.x -> 8.x, which went through fine. Search/update operations also work
without any issues in 8.x.
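
For readers following along, the restriction being bypassed here can be modelled in a few lines. This is only a sketch of the rule as it is described in this thread (the LATEST-1 check, and the indexCreatedVersionMajor named in LUCENE-7837 further down), not a copy of Lucene's code. The names IndexVersionGate and canOpen are made up for illustration, and treating an index first created before 7.0 as "created on 6" is an assumption.

```java
/** Simplified, hypothetical model of the "current or previous major only" rule. */
final class IndexVersionGate {
    private IndexVersionGate() {}

    /**
     * @param indexCreatedMajor major version recorded when the index was first created
     * @param latestMajor       major version of the Lucene build opening the index
     * @return true if the index would pass the LATEST-1 check
     */
    static boolean canOpen(int indexCreatedMajor, int latestMajor) {
        // The comparison uses the *creation* version, not the version that last
        // rewrote the segments, so rewriting segments does not change the outcome.
        return indexCreatedMajor >= latestMajor - 1;
    }

    public static void main(String[] args) {
        // Assumption: an index created on 4.8 and upgraded step by step is
        // recorded as created on major version 6.
        int createdMajor = 6;
        System.out.println("open with 7.x: " + canOpen(createdMajor, 7)); // prints "open with 7.x: true"
        System.out.println("open with 8.x: " + canOpen(createdMajor, 8)); // prints "open with 8.x: false"
    }
}
```

Under this model, the reason IndexUpgrader cannot help is that the value being compared never changes after the index is first created.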

I could not find any JIRAs which talk about the technical reason behind
imposing this restriction, and would like to know the nitty-gritties. Also
would like to know about any potential pitfalls that I might be overlooking
with the above hack.

Thanks,
Rahul
Re: Moving from lucene 6.x to 8.x
Hello,
Would appreciate any insights on the issue. Are there any backward
incompatible changes in the 8.x index because of which the Lucene upgrader is
unable to upgrade any index EVER touched by <= 6.x? Or is the restriction
more of a safety net at this point for possible future incompatibilities?

Thanks,
Rahul

On Thu, Jan 6, 2022 at 11:46 PM Rahul Goswami <rahul196452@gmail.com> wrote:

Re: Moving from lucene 6.x to 8.x
Hi Rahul,

I am not an expert, so someone else might provide a better answer. However,
I remember that @Erick briefly talked about this restriction in one of his
talks: https://www.youtube.com/watch?v=eaQBH_H3d3g&t=621s (not sure if you
have seen it already).

As he explains, it used to look like the IndexUpgrader tool was doing the job
perfectly, but that wasn't always the case. There is no guarantee that after
using the IndexUpgrader tool, your 8.x index will have all of the
characteristics of a native Lucene 8 index. There can be some situations (e.g.
incorrect offsets) where you might get an incorrect relevance score, which
might be difficult to trace and debug. So the Lucene developers have now made
it explicit that what people were doing earlier was not ideal, and that they
should plan to reindex all documents during a major upgrade.

Having said that, what you have done can just work without any issue as
long as you don't encounter any odd sorting behavior. This may/may not be
super critical depending on the business use case and that is where you
might need to make a decision.

Thanks,
Vinay

On Sat, Jan 8, 2022 at 10:27 PM Rahul Goswami <rahul196452@gmail.com> wrote:

Re: Moving from lucene 6.x to 8.x
Thanks Vinay for the link to Erick's talk! I hadn't seen it and I must
admit it did help put a few things into perspective.

I was able to track down the JIRAs (thank you 'git blame')
surrounding/leading up to this architectural decision and the linked
patches:
https://issues.apache.org/jira/browse/LUCENE-7703 (Record the version that
was used at index creation time)
https://issues.apache.org/jira/browse/LUCENE-7730 (Better encode length
normalization in similarities)
https://issues.apache.org/jira/browse/LUCENE-7837 (Use
indexCreatedVersionMajor to fail opening too old indices)

From these JIRAs what I was able to piece together is that if not
reindexed, relevance scoring might act in unpredictable ways. For my use
case, I can live with that since we provide an explicit sort on one or more
fields.

In LUCENE-7703, Adrien says "we will reject broken offsets in term vectors
as of 7.0". So my questions to the community are
i) What are these offsets, and what feature/s might break with respect to
these offsets if not reindexed?
ii) Do the length normalization changes in LUCENE-7730 affect only
relevance scores?

I understand I could be playing with fire here, but reindexing is not a
practical solution for my situation. At least not in the near future until
I figure out a more seamless way of reindexing with minimal downtime given
that there are multiple 1TB+ indexes. Would appreciate inputs from the dev
community on this.

Thanks,
Rahul

On Sun, Jan 9, 2022 at 2:41 PM Vinay Rajput <vinayrajput4606@gmail.com>
wrote:

Re: Moving from lucene 6.x to 8.x
I think the "broken offsets" refers to offsets of tokens "going
backwards". Offsets are attributes of tokens that refer back to their
character position in the original indexed text. Going backwards means
that a token with a greater position (in the sequence of tokens, or token
graph) has a lesser offset than an earlier token; offsets should never
decrease (or maybe they must be strictly increasing, I forget). If you use
term vectors and have these broken offsets, which should not occur but
often do with custom analysis chains, this could be a problem.
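
To make "going backwards" concrete, here is a small sketch of the kind of check involved. It assumes the rule is that start offsets must never decrease from one token to the next and that each end offset must be at least its start offset; whether strictly increasing offsets are required is not settled in this thread. OffsetCheck and firstBrokenToken are hypothetical names, not Lucene APIs.

```java
/** Hypothetical helper: finds the first token whose offsets are "broken". */
final class OffsetCheck {
    private OffsetCheck() {}

    /**
     * Tokens are given in position order; token i spans
     * [startOffsets[i], endOffsets[i]) in the original text.
     *
     * @return index of the first offending token, or -1 if all offsets are sane
     */
    static int firstBrokenToken(int[] startOffsets, int[] endOffsets) {
        if (startOffsets.length != endOffsets.length) {
            throw new IllegalArgumentException("offset arrays must have the same length");
        }
        int lastStart = 0;
        for (int i = 0; i < startOffsets.length; i++) {
            int start = startOffsets[i];
            int end = endOffsets[i];
            // Broken if negative, if the token ends before it starts, or if it
            // starts before the previous token started (the offset went backwards).
            if (start < 0 || end < start || start < lastStart) {
                return i;
            }
            lastStart = start;
        }
        return -1;
    }

    public static void main(String[] args) {
        // "quick brown fox": offsets only move forward.
        System.out.println(firstBrokenToken(new int[] {0, 6, 12}, new int[] {5, 11, 15})); // prints -1
        // The third token points back before the second one.
        System.out.println(firstBrokenToken(new int[] {0, 6, 2}, new int[] {5, 11, 4}));   // prints 2
    }
}
```

Equal start offsets pass this check, so stacked tokens that share a span (synonyms, for instance) would not be flagged under the assumed rule.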

On Wed, Jan 12, 2022 at 12:36 AM Rahul Goswami <rahul196452@gmail.com> wrote:

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Moving from lucene 6.x to 8.x
Thanks for the explanation, Michael. I read more about term vectors, and that
in combination with your explanation helps put things into perspective.

On Thu, Jan 13, 2022 at 8:53 AM Michael Sokolov <msokolov@gmail.com> wrote:

RE: Moving from lucene 6.x to 8.x
Hi, there is one thing that always works to "forcefully" upgrade without reindexing: you merge the old index into a completely new index, not by copying files, but by passing its SegmentReaders to addIndexes(), stripping all metadata from them with a trick: https://lucene.apache.org/core/8_11_0/core/org/apache/lucene/index/SlowCodecReaderWrapper.html in combination with <https://lucene.apache.org/core/8_11_0/core/org/apache/lucene/index/IndexWriter.html#addIndexes-org.apache.lucene.index.CodecReader...->

One way to do this is the following:
- Open the old index using DirectoryReader.open(): reader = DirectoryReader.open(...old directory...)
- Create a new index with an IndexWriter: writer = new IndexWriter(...new directory...)
- Call writer.addIndexes(reader.leaves().stream().map(LeafReaderContext::reader).map(SlowCodecReaderWrapper::wrap).toArray(CodecReader[]::new)); note that SlowCodecReaderWrapper.wrap() declares IOException, so in real code the mapping needs a plain loop or a lambda that handles the exception.

This will add all segments from the old index logically (not reading plain files but using the logical layers on top) and add them to the current index as one large segment. If you want to keep the segment structure, then iterate over the leaves and call addIndexes() for each one separately.

This may be a bit slower as the whole index needs to be processed, but it is still faster than reindexing. If you have incorrect offsets, the process will fail, so there's no risk.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de

RE: Moving from lucene 6.x to 8.x
By the way

This trick also works if you want to transform indexes. I wrote some code that rewrites old NumericField to PointField on the fly. The trick is to add another FilterLeafReader (before wrapping with SlowCodecReaderWrapper) that detects legacy numeric fields, removes them from the metadata and feeds them as a new stream of flat BKD points enumerated by the TermsEnum (which works because the order is the same and the hierarchy is generated by the receiving IndexWriter) to a new field with PointField metadata. This is a bit hacky but works great.

Uwe


Re: Moving from lucene 6.x to 8.x
Uwe,
This is beautiful! It is going to be extremely handy, especially for
conversion from Trie to Point fields. I am going to have to check this out further.
Thank you for the tip!

Rahul

On Mon, Jan 17, 2022 at 10:23 AM Uwe Schindler <uwe@thetaphi.de> wrote:
