I am currently matching botanical names (with possible misspellings)
against an indexed reference list with Lucene. After quick progress in
the beginning, I am struggling with the proper query design to achieve
the ranking I want.
Here is an example:
Search term: Acer campestre 'Rozi'
Tokenized (decomposed) representation:
acer
campestre
rozi
Top 10 hits:
{value=Acer campestre, score=12.288989}
{value=Acer campestre 'Rozi', score=11.955223} // <- why is it 2nd?
{value=Acer campestre 'Arends', score=10.640412}
{value=Acer campestre subsp. leiocarpon, score=10.640412}
{value=Acer campestre 'Carnival', score=10.640412}
{value=Acer campestre 'Commodore', score=10.640412}
{value=Acer campestre 'Nanum', score=10.640412}
{value=Acer campestre 'Elsrijk', score=10.640412}
{value=Acer campestre 'Fastigiatum', score=10.640412}
{value=Acer campestre 'Geessink', score=10.640412}
And here is how I create my queries:
    final BooleanQuery.Builder builder = new BooleanQuery.Builder();
    // add individual tokens to query
    for (String token : fuzzyTokens) {
        final Term term = new Term(NAME_TOKENS.name(), token);
        final FuzzyQuery fq = new FuzzyQuery(term);
        builder.add(fq, BooleanClause.Occur.SHOULD);
    }
    return builder.build();
}
Input names are analyzed with a StandardTokenizer and a LowerCaseFilter
when they are added to the IndexWriter.
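To make the decomposition concrete, here is a rough plain-Java sketch of what that analysis does to a name. It is only an approximation for illustration: it splits on anything that is not a letter or digit and lowercases the pieces, whereas Lucene's StandardTokenizer follows Unicode word-break rules (and would, for instance, keep an apostrophe between two letters). The class name NameTokens and the method tokenize are made up for this example and are not part of my code or of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class NameTokens {
    // Hypothetical stand-in for StandardTokenizer + LowerCaseFilter:
    // split on runs of non-letter, non-digit characters, then lowercase.
    static List<String> tokenize(String name) {
        List<String> tokens = new ArrayList<>();
        for (String part : name.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {          // skip the empty piece from a leading separator
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Acer campestre 'Rozi'"));            // prints [acer, campestre, rozi]
        System.out.println(tokenize("Acer campestre subsp. leiocarpon")); // prints [acer, campestre, subsp, leiocarpon]
    }
}
```

So "Acer campestre" is indexed as two tokens and "Acer campestre 'Rozi'" as three, and each query token becomes one SHOULD clause.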
My question: How can I get a ranking that scores
"Acer campestre 'Rozi'" higher than "Acer campestre"?
I am sure there is an obvious way to achieve this that I have so far
failed to find.
-Matthias
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org