Hello Folks,
In Amazon product search we have a use case that overrides the term
frequency to hold a custom scoring signal for a small subset of fields in a
document. These fields do not have positions enabled. Support for this was
added to Lucene in https://issues.apache.org/jira/browse/LUCENE-7854.
Following this change, the *CheckIndex* tool no longer reports the total
token count correctly on our index: it adds each posting's term frequency to
the total, and for these fields that value is our custom signal rather than
a real occurrence count.
We have a simple 1-line change in our internal branch that increments the
total positions count by 1 (instead of by the term frequency) if a field
does not have positions.
*Current*:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L1609
*Proposed*: (hasPositions ? freq : 1);
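To make the effect concrete, here is a minimal standalone sketch of the
proposed counting rule. It is not the actual CheckIndex code: the class and
method names (TokenCountSketch, positionsDelta, totalTokens) and the sample
frequencies are made up for illustration, and it assumes the total is
accumulated once per posting.

```java
// Illustrative sketch only; none of these names exist in Lucene.
class TokenCountSketch {

    // Amount a single posting contributes to the running total.
    // With positions, freq is a real occurrence count; without positions,
    // freq may carry a custom scoring signal, so count the posting once.
    static long positionsDelta(boolean hasPositions, int freq) {
        return hasPositions ? freq : 1;
    }

    // Sums the per-posting contributions for one field.
    static long totalTokens(boolean hasPositions, int[] freqs) {
        long total = 0;
        for (int freq : freqs) {
            total += positionsDelta(hasPositions, freq);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical term frequencies; 250 stands in for a custom signal.
        int[] freqs = {3, 250, 7};
        System.out.println(totalTokens(true, freqs));   // prints 260
        System.out.println(totalTokens(false, freqs));  // prints 3
    }
}
```

With positions enabled the total is unchanged from today's behaviour; without
positions each posting counts as one token regardless of the stored value.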
If the community feels this is useful and should be changed in Lucene, I am
happy to open a JIRA issue and contribute a patch with suitable unit
test(s).
Thanks
-Ankur