Mailing List Archive

Increment total positions by 1 instead of term-frequency if position data is missing
Hello Folks,
In Amazon product search we have a use case where we override the term
frequency to hold a custom scoring signal for a small subset of fields in
a document. These fields do not have positions enabled. Support for this
was added to Lucene in https://issues.apache.org/jira/browse/LUCENE-7854.
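
For readers unfamiliar with custom term frequencies, here is a minimal Java sketch of the idea under stated assumptions. It is a toy model, not Lucene's indexing code, and the names `CustomTermFreqSketch`, `Token` and `buildFreqs` are hypothetical. It assumes that each token carries a term frequency that defaults to 1, that repeated tokens of the same term in a document have their frequencies added into the posting's freq, and that custom values are only accepted on fields without positions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy model of per-token custom term frequencies (not Lucene code). */
public class CustomTermFreqSketch {

    /** One analyzed token: its term text plus a term frequency (1 by default). */
    record Token(String term, int termFrequency) {
        Token(String term) { this(term, 1); }
    }

    /**
     * Sums each token's term frequency per term for a single document.
     * Assumption: a custom (non-1) frequency is rejected when positions are indexed.
     */
    static Map<String, Integer> buildFreqs(List<Token> tokens, boolean hasPositions) {
        Map<String, Integer> freqs = new TreeMap<>();
        for (Token t : tokens) {
            if (hasPositions && t.termFrequency() != 1) {
                throw new IllegalArgumentException(
                    "custom term frequency requires a field without positions: " + t.term());
            }
            freqs.merge(t.term(), t.termFrequency(), Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        // Hypothetical "signal" field: one token whose freq carries a score of 500.
        Map<String, Integer> signal = buildFreqs(List.of(new Token("popularity", 500)), false);
        System.out.println(signal); // {popularity=500}

        // Ordinary text field with positions: freq counts occurrences.
        Map<String, Integer> text = buildFreqs(
            List.of(new Token("red"), new Token("shoe"), new Token("red")), true);
        System.out.println(text); // {red=2, shoe=1}
    }
}
```

In this model a single token in the signal field yields a posting with freq 500, which is why any statistic that sums freqs treats it as 500 tokens.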

Following this change, the *CheckIndex* tool no longer reports the total
token count correctly on our index.
We have a simple one-line change in our internal branch that increments the
total positions count by 1 (instead of by the term frequency) if a field
does not have positions.

*Current*:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L1609

*Proposed*: (hasPositions ? freq : 1);
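
The difference between the two behaviours can be illustrated with a small Java sketch. It is hypothetical and is not the actual CheckIndex code: `TokenCountSketch`, `Posting`, `currentTotal` and `proposedTotal` are illustrative names, and it assumes, as the proposal implies, that the current line adds each posting's full freq to the running total.

```java
import java.util.List;

/** Hypothetical sketch of the two token-count totals discussed (not CheckIndex code). */
public class TokenCountSketch {

    /** One (term, document) posting: its freq and whether the field indexes positions. */
    record Posting(int freq, boolean hasPositions) {}

    /** Current behaviour as described: every posting adds its full freq. */
    static long currentTotal(List<Posting> postings) {
        long total = 0;
        for (Posting p : postings) {
            total += p.freq();
        }
        return total;
    }

    /** Proposed behaviour: add freq only when positions exist, otherwise add 1. */
    static long proposedTotal(List<Posting> postings) {
        long total = 0;
        for (Posting p : postings) {
            total += (p.hasPositions() ? p.freq() : 1);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Posting> postings = List.of(
            new Posting(2, true),    // "red" occurs twice in a text field
            new Posting(1, true),    // "shoe" occurs once in a text field
            new Posting(500, false)  // custom signal stored as freq, no positions
        );
        System.out.println(currentTotal(postings));  // 503
        System.out.println(proposedTotal(postings)); // 4
    }
}
```

Note that under the proposed rule a field without positions contributes one per posting, so for such fields the total counts postings rather than summed frequencies, even if their freqs were not customized.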

If the community feels this is useful and should be changed in Lucene,
I am happy to open a JIRA and contribute a patch with suitable unit
test(s).

Thanks
-Ankur
Re: Increment total positions by 1 instead of term-frequency if position data is missing
In the actual tool, this total is reported as the "number of tokens", and
it *IS* actually the number of tokens that you have.

On Fri, Sep 3, 2021 at 3:45 PM Ankur Goel <ankur.goel79@gmail.com> wrote:

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org