Thanks Patrick for the help!
> May I know what Lucene version you're using?
>
We are on an older version of Lucene for now (4.7.x). I believe the
FilterCodecReader of the current version is akin to our FilterAtomicReader
and should do the job for us!
> If it is not available, I'm not sure whether the merge will happen via merge
> policy, maybe you could check the source code and see?
>
I checked and, AFAIK, our old version doesn't support it. But it should be
fine to wrap each reader in a SortingAtomicReader and pass it to the API.
I guess it can be done!
> But I think the current default directory implementation is MMapDirectory,
> which delegates the caching to the system and should have
> already optimized this situation
>
We do use the default MMapDirectory, but I was actually thinking about the
cost of unpacking/walking the Term-Dict data (FST) repeatedly from various
threads, even if it is served via mmap. Are there optimizations here
(caching of unpacked blocks, etc.) that we could tap into?
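
To make the multi-pass concern concrete, here is a rough plain-Java sketch
of how I picture the live-docs masking. It uses no Lucene classes at all.
The names (LiveDocsPartition, partition) are made up for illustration, and
the round-robin rule is only a stand-in for a real document selector. Each
mask would drive one pass over the same base segments, which is why the
term dictionary gets walked once per selector:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only: not part of Lucene or of the IndexRearranger patch.
final class LiveDocsPartition {

    /**
     * Builds one live-docs mask per output segment. Every doc id in
     * [0, maxDoc) is set in exactly one mask, so the output segments
     * hold disjoint sets of documents that together cover the index.
     */
    static List<BitSet> partition(int maxDoc, int numSegments) {
        if (maxDoc < 0 || numSegments <= 0) {
            throw new IllegalArgumentException("maxDoc must be >= 0 and numSegments > 0");
        }
        List<BitSet> masks = new ArrayList<>();
        for (int s = 0; s < numSegments; s++) {
            masks.add(new BitSet(maxDoc));
        }
        for (int doc = 0; doc < maxDoc; doc++) {
            // Placeholder selector: round-robin assignment by doc id.
            masks.get(doc % numSegments).set(doc);
        }
        return masks;
    }

    public static void main(String[] args) {
        List<BitSet> masks = partition(10, 3);
        for (int s = 0; s < masks.size(); s++) {
            // One rewrite pass per mask: only docs set in the mask are kept.
            System.out.println("segment " + s + " live docs: " + masks.get(s));
        }
    }
}
```

In the real thing, each mask would be exposed as the live docs of a wrapped
reader before handing that reader to the writer, one wrapped reader per
target segment.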
--
Ravi
On Mon, May 24, 2021 at 11:09 PM Patrick Zhai <zhai7631@gmail.com> wrote:
> Hi Ravi,
>
> 1. May I know what Lucene version you're using? As far as I know the
> SortingMergePolicy has been deprecated and replaced by
> IndexWriterConfig.setIndexSort in newer Lucene versions. So if
> "setIndexSort" is available I would suggest using that to achieve the
> sorted index (as you might have already figured out, the IndexRearranger
> lets you pass in an IndexWriterConfig so that you could set it there). If it
> is not available, I'm not sure whether the merge will happen via merge
> policy, maybe you could check the source code and see?
> 2. Yeah it's a good observation, we're doing multiple passes over one
> segment! But I think the current default directory implementation is
> MMapDirectory, which delegates the caching to the system and should have
> already optimized this situation. Here's a great blog explaining the
> MMapDirectory in Lucene:
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Best
> Patrick
>
> On May 24, 2021 at 9:54, Ravikumar Govindarajan
> <ravikumar.govindarajan@gmail.com> wrote:
>
> > Thanks Michael!
> >
> > This was just what I was looking for!!. Just a couple of questions.
> >
> >
> > - When we call addIndexes(IndexReader...), does the merge happen via
> >   MergePolicy? We use a SortingMergePolicy and would like to maintain
> >   the sort-order in newly created segments too
> > - Concurrency is a cool-trick here. But if I understand the patch
> >   correctly, don't we end-up doing multiple passes over the Term Dict,
> >   one for each Selector? Loading it fully in memory could help here,
> >   possibly?
> >
> > --
> > Ravi
> >
> > On Mon, May 24, 2021 at 7:37 PM Michael McCandless <
> > lucene@mikemccandless.com> wrote:
> >
> > > Are you trying to rewrite your already created index into a different
> > > segment geometry?
> > >
> > > Maybe have a look at the new IndexRearranger tool
> > > <https://issues.apache.org/jira/browse/LUCENE-9694>? It is already
> > > doing something like what you enumerated below, including mocking
> > > LiveDocs to get the right documents into the right segments.
> > >
> > > Mike McCandless
> > >
> > > http://blog.mikemccandless.com
> > >
> > >
> > > On Sat, May 22, 2021 at 3:50 PM Ravikumar Govindarajan <
> > > ravikumar.govindarajan@gmail.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> We have a use-case for index-rewrite on a "frozen index" where no new
> > >> documents are added. It goes like this..
> > >>
> > >> 1. Get all segments for the index (base-segment-list)
> > >> 2. Create a new segment from base-segment-list with unique set of
> > >> docs (LiveDocs)
> > >> 3. Repeat step 2, for a fixed count. Like say 5 or 10 times
> > >>
> > >> Is something like this achievable via Merge Policy? We can disable
> > >> commits too, till the full run is completed.
> > >>
> > >> Any help is appreciated
> > >>
> > >> Regards,
> > >> Ravi
> > >>
> > >
> >
>