Mailing List Archive

MergeTrigger consistency in MergePolicy "find merges"
MergePolicy "find merges" methods take a MergeTrigger as a parameter, except
findForcedMerges() and findForcedDeletesMerges().
In my use case, I could leverage a MergeTrigger in findForcedMerges()
(it would be either EXPLICIT or MERGE_FINISHED) to differentiate the merge
selection between the initial explicit call and the subsequent calls
triggered after the first merges.

Should we add a MergeTrigger parameter to all MergePolicy "find merges"
methods for consistency?
If so, is it an internal or a public API? (Should this change stay in the
main branch only?)
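
To make the question concrete, here is a minimal sketch of what a trigger-aware forced-merge selection could look like. It is a toy model only: the `Trigger` enum, the class name, and the `findForcedMerges(Trigger, List<String>)` signature are all hypothetical and are not Lucene's API. Segments are reduced to plain names, and the pairing strategy is just an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: none of these types or signatures are Lucene's.
class TriggerAwareForcedMergeSketch {

    // Only the two trigger values named in the message are modeled.
    enum Trigger { EXPLICIT, MERGE_FINISHED }

    // Hypothetical trigger-aware selection; each inner list is one merge.
    static List<List<String>> findForcedMerges(Trigger trigger, List<String> segments) {
        List<List<String>> merges = new ArrayList<>();
        if (trigger == Trigger.EXPLICIT) {
            // Initial explicit call: pair up adjacent segments.
            for (int i = 0; i + 1 < segments.size(); i += 2) {
                merges.add(List.of(segments.get(i), segments.get(i + 1)));
            }
        }
        // Follow-up call after a merge finished: select nothing,
        // so the outputs of the first round are not merged again.
        return merges;
    }

    public static void main(String[] args) {
        List<String> segments = List.of("_0", "_1", "_2", "_3");
        System.out.println(findForcedMerges(Trigger.EXPLICIT, segments));
        System.out.println(findForcedMerges(Trigger.MERGE_FINISHED, segments));
    }
}
```

The point of the sketch is only that the selection can branch on the trigger; what each branch should select is up to the policy.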
Re: MergeTrigger consistency in MergePolicy "find merges"
You seem to imply that `forceMerge` runs a cascaded merge where the first
merge creates some new segments that become inputs to a second merge. Have
you considered running a single merge? We had a discussion about cascaded
forced merges and TieredMergePolicy last year and ended up changing
`findForcedMerges` to never run cascaded merges:
https://issues.apache.org/jira/browse/LUCENE-7020.

On Mon, Jun 20, 2022 at 10:31 AM Bruno Roustant <bruno.roustant@gmail.com>
wrote:

> MergePolicy "find merges" methods take a MergeTrigger as parameter, except
> findForcedMerges() and findForcedDeletesMerges().
> In my use-case, I could leverage a MergeTrigger in findForcedMerges(),
> which can be EXPLICIT or MERGE_FINISHED, to differentiate the merge
> selection between the initial explicit call and the subsequent calls
> triggered after the first merges.
>
> Should we add a MergeTrigger parameter to all MergePolicy "find merges"
> methods for consistency?
> If so, is it an internal or public API? (should this change stay in the
> main branch only)
>


--
Adrien
Re: MergeTrigger consistency in MergePolicy "find merges"
If I use a simple "AlwaysForceMergePolicy" in a test, I can see that when I
run IndexWriter.forceMerge(), the first call to
AlwaysForceMergePolicy.findForcedMerges() is made for
MergeTrigger.EXPLICIT. But then, at IndexWriter.merge() line 4531,
MergePolicy.findForcedMerges() is called again for MergeTrigger.MERGE_FINISHED
to merge the segments produced by the first explicit forced merge, and so
on. With this degenerate AlwaysForceMergePolicy, the test runs merges in an
infinite loop.

On Mon, Jun 20, 2022 at 11:11 AM Adrien Grand <jpountz@gmail.com> wrote:

> You seem to imply that `forceMerge` runs a cascaded merge where the first
> merge creates some new segments that become inputs to a second merge. Have
> you considered running a single merge? We had a discussion about cascaded
> forced merges and TieredMergePolicy last year and ended up changing
> `findForcedMerges` to never run cascaded merges:
> https://issues.apache.org/jira/browse/LUCENE-7020.
>
>
> --
> Adrien
>
Re: MergeTrigger consistency in MergePolicy "find merges"
Wouldn't this be a bug in the AlwaysForceMergePolicy, which should return
no merges if there is already a single segment with no deletes?
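
The loop described in the quoted message, and the guard suggested here, can be sketched with a toy model. This is not Lucene code: `Segment`, the two policy functions, and the driver loop are hypothetical stand-ins, and the loop is capped at `MAX_ROUNDS` (an arbitrary value) so the demo terminates. It assumes that the writer consults the policy again after each merge finishes and that a merged segment carries no deletes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model only: these types are hypothetical stand-ins, not Lucene classes.
class ForcedMergeLoopSketch {

    static final int MAX_ROUNDS = 10; // arbitrary cap so the demo stops

    // A segment reduced to the one property that matters here.
    static final class Segment {
        final boolean hasDeletes;
        Segment(boolean hasDeletes) { this.hasDeletes = hasDeletes; }
    }

    // Degenerate policy: always asks to merge everything it is shown.
    static List<Segment> alwaysMerge(List<Segment> segments) {
        return new ArrayList<>(segments);
    }

    // Guarded policy: nothing to do once a single segment without deletes remains.
    static List<Segment> guardedMerge(List<Segment> segments) {
        if (segments.size() == 1 && !segments.get(0).hasDeletes) {
            return new ArrayList<>();
        }
        return new ArrayList<>(segments);
    }

    // Consults the policy again after every merge; returns how many merges ran.
    static int runForceMerge(Function<List<Segment>, List<Segment>> policy,
                             List<Segment> index) {
        List<Segment> current = new ArrayList<>(index);
        int merges = 0;
        while (merges < MAX_ROUNDS) {
            List<Segment> toMerge = policy.apply(current);
            if (toMerge.isEmpty()) {
                break; // the policy has nothing left to do
            }
            current.removeAll(toMerge);
            current.add(new Segment(false)); // merged output has no deletes
            merges++;
        }
        return merges;
    }

    // Runs one of the two policies on a small three-segment index.
    static int demo(boolean guarded) {
        List<Segment> index = new ArrayList<>();
        index.add(new Segment(true));
        index.add(new Segment(false));
        index.add(new Segment(false));
        return guarded
            ? runForceMerge(ForcedMergeLoopSketch::guardedMerge, index)
            : runForceMerge(ForcedMergeLoopSketch::alwaysMerge, index);
    }

    public static void main(String[] args) {
        System.out.println("always merge: " + demo(false) + " merges");
        System.out.println("guarded:      " + demo(true) + " merges");
    }
}
```

In this model the degenerate policy uses up all 10 allowed rounds, while the guarded one stops after a single merge.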

On Mon, Jun 20, 2022 at 1:30 PM Bruno Roustant <bruno.roustant@gmail.com>
wrote:

> If I use a simple "AlwaysForceMergePolicy" in a test, I can see that when
> a run IndexWriter.forceMerge(), the first call to
> AlwaysForceMergePolicy.findForcedMerges() is done for the
> MergeTrigger.EXPLICIT. But then, at IndexWriter.merge() line 4531,
> MergePolicy.findForcedMerges() is called with MergeTrigger.MERGE_FINISHED
> to merge the segments produced by the output of the first explicit forced
> merge, and so on. For this degenerated AlwaysForceMergePolicy, the test
> runs merges in an infinite loop.

--
Adrien
Re: MergeTrigger consistency in MergePolicy "find merges"
I agree this AlwaysForceMergePolicy is not working correctly. It's just a
test I wrote to easily understand how MergeTrigger.MERGE_FINISHED works.

Anyway, my question is only about the absence of a MergeTrigger in the call
to findForcedMerges(): is it expected, or is it an inconsistency with the
other "find merges" methods?


On Mon, Jun 20, 2022 at 2:26 PM Adrien Grand <jpountz@gmail.com> wrote:

> Wouldn't this be a bug in the AlwaysForceMergePolicy, which should return
> no merges if there is already a single segment with no deletes?
>
> --
> Adrien
>
Re: MergeTrigger consistency in MergePolicy "find merges"
Some comments on JIRA suggest that this is expected, because natural merges
can have a variety of triggers while forced merges are always called by the
app. I guess you could argue that MERGE_FINISHED is a different trigger,
but are there use-cases for doing things differently in findForcedMerges
depending on the merge trigger?

https://issues.apache.org/jira/browse/LUCENE-4472?focusedCommentId=13476920&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13476920
.

On Mon, Jun 20, 2022 at 3:26 PM Bruno Roustant <bruno.roustant@gmail.com>
wrote:

> I agree this AlwaysForceMergePolicy is not working correctly. It's just a
> test I did to easily understand how MergeTrigger.MERGE_FINISHED was working.
>
> Anyway my question is only about the MergeTrigger not present in the call
> to findForcedMerges(), to know if it is expected or inconsistent with the
> other find merges methods.

--
Adrien
Re: MergeTrigger consistency in MergePolicy "find merges"
OK, thanks for all the details. I understand that the MergeTrigger is absent
on purpose.

On Mon, Jun 20, 2022 at 4:17 PM Adrien Grand <jpountz@gmail.com> wrote:

> Some comments on JIRA suggest that this is expected, because natural
> merges can have a variety of triggers while forced merges are always called
> by the app. I guess you could argue that MERGE_FINISHED is a different
> trigger, but are there use-cases for doing things differently in
> findForcedMerges depending on the merge trigger?
>
>
> https://issues.apache.org/jira/browse/LUCENE-4472?focusedCommentId=13476920&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13476920
> .
>
> --
> Adrien
>