
TopScoreDocCollector class usage
Hi,

I am now using this class with the IndexSearcher.search API (passing the
collector as the second argument). Please see the "an interesting case"
thread for the background to this question.


But this time I see some very weird behavior:


I used to get 4000+ hits with the default
TopScoreDocCollector.create(int numHits, ScoreDoc after, int
totalHitsThreshold) call that the IndexSearcher.search API makes
internally, where the default value is 1000. I set after to null here.


Now, when I set both totalHitsThreshold and numHits in
TopScoreDocCollector.create to 300, I get 12200+ hits from the
totalHits object.


Something is wrong here, isn't it?

How can the count triple when I set totalHitsThreshold and numHits to
roughly 1/3 of their default values?
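A minimal sketch of the collector setup described above, assuming Lucene 8.x (the thread cites the 8.5.2 Javadoc) with lucene-core on the classpath; the class HitCountDemo and its methods are hypothetical names. It shows where to look when the reported total seems off: TopDocs.totalHits carries a relation, and when that relation is GREATER_THAN_OR_EQUAL_TO the value is only a lower bound, not an exact count.

```java
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.search.TotalHits;

// Sketch against the Lucene 8.x API; HitCountDemo and its methods are hypothetical names.
public class HitCountDemo {

    /** Collects the top numHits documents, counting hits exactly only up to totalHitsThreshold. */
    static TopDocs searchWithThreshold(IndexSearcher searcher, Query query,
                                       int numHits, int totalHitsThreshold) throws IOException {
        // after == null: start from the first page, as in the question
        TopScoreDocCollector collector =
                TopScoreDocCollector.create(numHits, null, totalHitsThreshold);
        searcher.search(query, collector);
        return collector.topDocs();
    }

    /** Renders a TotalHits value, marking lower bounds with a trailing '+'. */
    static String describe(TotalHits totalHits) {
        // GREATER_THAN_OR_EQUAL_TO means counting stopped being exact,
        // so 'value' is only a lower bound on the true number of matches.
        boolean lowerBound = totalHits.relation == TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO;
        return totalHits.value + (lowerBound ? "+" : "");
    }
}
```

With numHits = 300, totalHitsThreshold = 300 and after = null, a result rendered as "12200+" reads as "at least 12200 matches". Once the threshold is exceeded, Lucene is allowed to skip documents that cannot enter the top hits, so the lower bound that comes back may differ from one threshold setting to another.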


Best regards



P.S.

NOTE: The search(org.apache.lucene.search.Query, int) and
searchAfter(org.apache.lucene.search.ScoreDoc,
org.apache.lucene.search.Query, int) methods are configured to only
count top hits accurately up to 1,000 and may return a lower bound of
the hit count if the hit count is greater than or equal to 1,000. On
queries that match lots of documents, counting the number of hits may
take much longer than computing the top hits so this trade-off allows to
get some minimal information about the hit count without slowing down
search too much. The TopDocs.scoreDocs array is always accurate however.
If this behavior doesn't suit your needs, you should create collectors
manually with either TopScoreDocCollector.create(int, int) or
TopFieldCollector.create(org.apache.lucene.search.Sort, int, int) and
call search(Query, Collector).


at


https://lucene.apache.org/core/8_5_2/core/org/apache/lucene/search/IndexSearcher.html#searchAfter-org.apache.lucene.search.ScoreDoc-org.apache.lucene.search.Query-int-
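Following the Javadoc's advice, a minimal sketch of creating the collector manually so that the total is exact, assuming Lucene 8.x: passing Integer.MAX_VALUE as totalHitsThreshold means the threshold is never reached, so every match is counted. IndexSearcher.count(Query) is shown as an alternative when only the number is needed. ExactCountDemo, the field name "body" and the tiny in-memory index are illustrative assumptions, not part of the original thread.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo class for Lucene 8.x.
public class ExactCountDemo {

    /** Top numHits documents with an exact total: the threshold is never reached. */
    static TopDocs searchExact(IndexSearcher searcher, Query query, int numHits) throws IOException {
        TopScoreDocCollector collector = TopScoreDocCollector.create(numHits, Integer.MAX_VALUE);
        searcher.search(query, collector);
        return collector.topDocs();
    }

    /** Builds a throwaway in-memory index of docCount identical documents (test data only). */
    static Directory buildIndex(int docCount) throws IOException {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (int i = 0; i < docCount; i++) {
                Document doc = new Document();
                doc.add(new TextField("body", "lucene hit counting", Field.Store.NO));
                writer.addDocument(doc);
            }
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        try (Directory dir = buildIndex(5000);
             DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query query = new TermQuery(new Term("body", "lucene"));
            TopDocs top = searchExact(searcher, query, 300);
            // Exact total plus its relation, then the count-only alternative
            System.out.println(top.totalHits.value + " " + top.totalHits.relation);
            System.out.println(searcher.count(query));
        }
    }
}
```

Exact counting gives up the shortcut the Javadoc note describes, so on queries that match many documents it can be noticeably slower than the default thresholded search.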


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: TopScoreDocCollector class usage
OK, I found it: the count is 300 times the number of words in the search
string. But this needs to be precisely documented in the Javadocs.

I don't want to rely on trial and error, and I guess nobody else does
either.
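Rather than inferring a rule by trial and error, a caller can check the relation and fall back to an exact count only when the collector reported a lower bound. A minimal sketch, assuming Lucene 8.x; HitCounts.exactTotal is a hypothetical helper name.

```java
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TotalHits;

// Hypothetical helper for Lucene 8.x: trusts TopDocs.totalHits only when it is exact.
public class HitCounts {

    static long exactTotal(IndexSearcher searcher, Query query, TopDocs topDocs) throws IOException {
        TotalHits totalHits = topDocs.totalHits;
        if (totalHits.relation == TotalHits.Relation.EQUAL_TO) {
            // The collector counted every match, so no extra work is needed.
            return totalHits.value;
        }
        // Only a lower bound was reported: run a separate, exact count.
        return searcher.count(query);
    }
}
```

This keeps the fast path for typical queries and pays for a full count only when the collector stopped counting exactly.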


Best regards



(In reply to the original message above, sent by baris.kazar@oracle.com
on 6/9/21 at 12:11 PM.)
