Mailing List Archive

PrefixQuery Scoring

Whenever I add a PrefixQuery to my search, the scores get really small. For
example, if I run a query like +java, the scores start around 0.866... and
so forth. But if I run a query like +java*, the scores start around
0.00034... Is there a specific reason for this, or is it a bug?

Thanks,
Jonathan Franzone

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: PrefixQuery Scoring
> From: Jonathan Franzone [mailto:jonathan@franzone.com]
>
> Whenever I add a PrefixQuery to my search the scoring gets really small.
> For example if I do a query like this: +java then the scoring starts
> around 0.866... and so forth. But if I do a query like this: +java* then
> the scoring start like 0.00034... Is there a specific reason for this?

A PrefixQuery is equivalent to a query containing all the terms that match
the prefix, and hence usually contains a lot of terms. With such a big
query, matching documents are likely to contain only a small fraction of the
query terms, and the match is thus weaker. For example, the top-scoring
document in a prefix query might contain only one or two of 100 or more
query terms. That's not a very strong match. But the top-scoring document
in a single-term, non-prefix query is guaranteed to contain all of the query
terms (there is only one), and is thus a much stronger match.
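
To make the expansion concrete, here is a minimal sketch in plain Java. It is
not Lucene code: the class name, the expand() helper and the tiny term
dictionary are all made up for illustration, and it only assumes that the
index keeps its terms in sorted order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy illustration only: not Lucene code. The term dictionary is made up.
final class PrefixExpansionSketch {

    // Collect every dictionary term that starts with the given prefix.
    static List<String> expand(SortedSet<String> dictionary, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet skips straight to the first term >= prefix in sorted order.
        for (String term : dictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // terms sharing a prefix are contiguous, so stop here
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary = new TreeSet<>(Arrays.asList(
                "jar", "java", "javabeans", "javac", "javadoc",
                "javascript", "jaws", "json"));
        // "+java" is a single-term query; "+java*" becomes one term per match.
        System.out.println("+java  -> 1 term");
        System.out.println("+java* -> " + expand(dictionary, "java"));
    }
}
```

Even with this eight-term dictionary the prefix turns into several query
terms; on a real index the list can easily run to a hundred or more.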

There are of course other factors involved in scoring (e.g., document length
and term frequency). I call the factor in question here "coordination"
matching: documents which contain more of the query terms score higher.
This is to make the top hits of a boolean "OR" query look like those of a
boolean "AND" of the same terms, with the remaining "OR" results following.
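
As a rough sketch of the effect, assume the coordination factor is simply the
fraction of query terms that a document contains. That formula, the class
name and the term lists below are assumptions for illustration, not the exact
scoring code, and the other factors are left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only: assumes coord = matched terms / total query terms.
final class CoordSketch {

    // Number of query terms that also occur in the document.
    static int overlap(List<String> queryTerms, Set<String> docTerms) {
        int count = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                count++;
            }
        }
        return count;
    }

    // Assumed coordination factor: the fraction of query terms matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // Single-term query: any matching document contains all the terms.
        List<String> single = Arrays.asList("java");
        Set<String> doc = new HashSet<>(Arrays.asList("java", "java0", "java7"));
        System.out.println("single term: "
                + coord(overlap(single, doc), single.size()));

        // Stand-in for an expanded prefix query: 100 hypothetical terms.
        List<String> expanded = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            expanded.add("java" + i);
        }
        System.out.println("expanded prefix: "
                + coord(overlap(expanded, doc), expanded.size()));
    }
}
```

Under that assumption, a document matching two of 100 expanded terms gets a
factor of 2/100, while any hit for the single-term query gets 1/1, before the
other scoring factors are applied.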

Doug
