What type of indexer is Lucene? Question reworded.
Hi again!

I should really reword my question as follows:

By which criteria are relevant documents chosen for a particular query?

and

Once retrieved, how are these documents ranked?

The techniques by which this is done will then determine what type of IR model Lucene implements.

Thanks again!

Melissa
Re: What type of indexer is Lucene? Question reworded.
I can't answer all of these questions fully, but since Doug is out, I'll
make a start. Please check the FAQ for a more detailed explanation. I
believe you will find enough information there to answer all of your
questions. The FAQ is linked from the Jakarta page (there are actually
two FAQs, so you might want to check both).

As far as I understand, Lucene is a probabilistic indexer. It supports
boolean queries, but it also supports phrase queries, where it does true
ranking. The ranking is based on how many of the search words appear in
a document and how "important" each word is for that document, which is
a function of the word's frequency in the document and the size of the
document.
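
To make the ranking idea above a bit more concrete, here is a minimal
sketch in Python. It is not Lucene's actual scoring formula. The function
name toy_score, the square-root damping of the word frequency, and the
length normalisation are all assumptions chosen for illustration. It only
shows the general shape: documents matching more of the search words
score higher, and a word counts for more in a short document than in a
long one.

```python
import math

def toy_score(query_terms, doc_tokens):
    """Toy relevance score (illustrative only, NOT Lucene's formula)."""
    unique_terms = set(query_terms)
    if not unique_terms or not doc_tokens:
        return 0.0

    # Assumption: longer documents dilute the importance of each word.
    length_norm = 1.0 / math.sqrt(len(doc_tokens))

    matched = 0
    weight = 0.0
    for term in unique_terms:
        tf = doc_tokens.count(term)  # word frequency in this document
        if tf > 0:
            matched += 1
            # Assumption: dampen raw frequency with a square root.
            weight += math.sqrt(tf) * length_norm

    # Fraction of the search words that appear in the document.
    coverage = matched / len(unique_terms)
    return coverage * weight

query = ["lucene", "index"]
short_doc = "lucene index".split()
long_doc = "lucene index is a library written in java for search".split()
partial_doc = "lucene rocks".split()

# Rank the three documents from highest to lowest toy score.
ranked = sorted(
    [("short", short_doc), ("long", long_doc), ("partial", partial_doc)],
    key=lambda item: toy_score(query, item[1]),
    reverse=True,
)
```

With this toy data, the short document that contains both words ranks
first, the long document that contains both ranks second, and the
document that contains only one of the words ranks last.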

For a given search, the type of result you get depends on the type of
Query that is used. For example, boolean queries can have "traditional"
AND terms which are all required for a match, but they can also have
"optional" terms that rank the document higher if they are found, but do
not rule out a document if they are not.
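
The difference between required and optional terms can be sketched as
follows. This is plain Python, not the Lucene API. The function
match_and_rank and the idea of counting matched optional terms as a
"bonus" are assumptions made up for illustration. A document must contain
every required term to match at all, while optional terms only move a
matching document up the list.

```python
def match_and_rank(docs, required, optional):
    """Filter by required terms, then rank by optional-term hits.

    Illustrative only: `docs` maps a document id to its text.
    """
    results = []
    for doc_id, text in docs.items():
        tokens = set(text.lower().split())

        # "Traditional" AND terms: a missing one rules the document out.
        if not all(term in tokens for term in required):
            continue

        # Optional terms never exclude a document; they only add to its rank.
        bonus = sum(1 for term in optional if term in tokens)
        results.append((doc_id, bonus))

    # Highest bonus first; ties are broken by document id for stable output.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results

docs = {
    "a": "apache lucene search",
    "b": "apache lucene",
    "c": "apache tomcat search",
}
hits = match_and_rank(docs, required=["apache", "lucene"], optional=["search"])
```

Here document "c" is excluded because it lacks a required term, and "a"
ranks above "b" because it also contains the optional term.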

I hope this helps.
Dmitry.





