Mailing List Archive

Accessibility of SegmentInfo::setDiagnostics
In an effort to prepare Elasticsearch for modularization, we are
investigating and eliminating split packages. The situation has improved
through recent refactoring in Lucene 9.0 [1], but a number of split
packages still remain. This message identifies one such so that it can
be discussed in isolation, with a view to a potential solution either in
Lucene or possibly within Elasticsearch itself.

Elasticsearch has a custom merge policy, ShuffleForcedMergePolicy, that
interleaves eldest and newest segments. SFMP has but a single
package-private dependency on lucene, namely `SegmentInfo::setDiagnostics`.
The `setDiagnostics` method is used by SFMP to add a single additional
ES-specific diagnostic key.

It is possible to recreate the whole SegmentCommitInfo (and the SI
within, along with the additional ES-specific diagnostic key), by using
just the accessible parts of the lucene API. This has been prototyped[2],
but shows that something is not quite right [3]. Resolving this issue is
likely better done in lucene.

The comment in `MergePolicy::setMergeInfo` - "Sets the SegmentCommitInfo
of the merged segment. Allows sub-classes to e.g. set diagnostics
properties"[4] - is telling. To better support the use-cases referred to
in this comment and to allow for other-package subclasses (which seems
completely reasonable given other parts of the API), it would be better
for lucene's SI::setDiagnostics method to be public (rather than
package-private). Such a change is a general improvement in the lucene
API, which can be leveraged by any custom merge policy.

-Chris.

[1] https://issues.apache.org/jira/browse/LUCENE-9319
[2] https://github.com/elastic/elasticsearch/pull/78253
[3] https://github.com/elastic/elasticsearch/pull/78253#issuecomment-925861893
[4] https://github.com/apache/lucene/blob/d5d6dc079395c47cd6d12dcce3bcfdd2c7d9dc63/lucene/core/src/java/org/apache/lucene/index/MergePolicy.java#L271
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Accessibility of SegmentInfo::setDiagnostics [ In reply to ]
I'd +1 a change that makes setDiagnostics public. Longer term I wonder if
we should have a more locked down API that _only_ allows setting
diagnostics. There are lots of things in SegmentCommitInfo that merges
should never override like the segment ID, and I can't think of anything
else than diagnostics that a merge policy should ever need to set.

On Fri, Sep 24, 2021 at 10:57 AM Chris Hegarty
<christopher.hegarty@elastic.co.invalid> wrote:

> In an effort to prepare Elasticsearch for modularization, we are
> investigating and eliminating split packages. The situation has improved
> through recent refactoring in Lucene 9.0 [1], but a number of split
> packages still remain. This message identifies one such so that it can
> be discussed in isolation, with a view to a potential solution either in
> Lucene or possibly within Elasticsearch itself.
>
> Elasticsearch has a custom merge policy, ShuffleForcedMergePolicy, that
> interleaves eldest and newest segments. SFMP has but a single
> package-private dependency on lucene, namely `SegmentInfo::setDiagnostics`.
> The `setDiagnostics` method is used by SFMP to add a single additional
> ES-specific diagnostic key.
>
> It is possible to recreate the whole SegmentCommitInfo (and the SI
> within, along with the additional ES-specific diagnostic key), by using
> just the accessible parts of the lucene API. This has been prototyped[2],
> but shows that something is not quite right [3]. Resolving this issue is
> likely better done in lucene.
>
> The comment in `MergePolicy::setMergeInfo` - "Sets the SegmentCommitInfo
> of the merged segment. Allows sub-classes to e.g. set diagnostics
> properties"[4] - is telling. To better support the use-cases referred to
> in this comment and to allow for other-package subclasses (which seems
> completely reasonable given other parts of the API), it would be better
> for lucene's SI::setDiagnostics method to be public (rather than
> package-private). Such a change is a general improvement in the lucene
> API, which can be leveraged by any custom merge policy.
>
> -Chris.
>
> [1] https://issues.apache.org/jira/browse/LUCENE-9319
> [2] https://github.com/elastic/elasticsearch/pull/78253
> [3]
> https://github.com/elastic/elasticsearch/pull/78253#issuecomment-925861893
> [4]
> https://github.com/apache/lucene/blob/d5d6dc079395c47cd6d12dcce3bcfdd2c7d9dc63/lucene/core/src/java/org/apache/lucene/index/MergePolicy.java#L271
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

--
Adrien
Re: Accessibility of SegmentInfo::setDiagnostics [ In reply to ]
Hi Adrien,

Thanks for your reply.

A further thought... the use-case here is really for custom subclasses
to “add additional” diagnostics, rather than “replace”. How about we
expose just that functionality in SegmentInfo, e.g. through a new method
similar to:

/**
* Adds the given {@code diagnostics} to this segment's diagnostics.
* @param diagnostics the diagnostics to add
*/
public void addDiagnostics(Map<String, String> diagnostics) { ... }

This would satisfy the ES use-case, while disallowing a full replace.

If we think that there are other use-cases that require "replace" (or it
is preferred to defer API discussion for a later time), then I'm happy
to proceed with simply making the existing `setDiagnostics` method
public.

-Chris.

> On 24 Sep 2021, at 17:30, Adrien Grand <jpountz@gmail.com> wrote:
>
> I'd +1 a change that makes setDiagnostics public. Longer term I wonder if we should have a more locked down API that _only_ allows setting diagnostics. There are lots of things in SegmentCommitInfo that merges should never override like the segment ID, and I can't think of anything else than diagnostics that a merge policy should ever need to set.
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org