Mailing List Archive

Best fuzzy match on multiple terms
I am currently matching botanic names (with possible misspellings)
against an indexed reference list with Lucene. After quick progress in
the beginning, I am struggling to design a query that produces the
ranking I want.

Here is an example:

Search term: Acer campestre 'Rozi'

Tokenized (decomposed) representation:
acer
campestre
rozi

Top 10 hits:
{value=Acer campestre, score=12.288989}
{value=Acer campestre 'Rozi', score=11.955223} // <- why is it 2nd?
{value=Acer campestre 'Arends', score=10.640412}
{value=Acer campestre subsp. leiocarpon, score=10.640412}
{value=Acer campestre 'Carnival', score=10.640412}
{value=Acer campestre 'Commodore', score=10.640412}
{value=Acer campestre 'Nanum', score=10.640412}
{value=Acer campestre 'Elsrijk', score=10.640412}
{value=Acer campestre 'Fastigiatum', score=10.640412}
{value=Acer campestre 'Geessink', score=10.640412}


And here is how I create my queries:

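// Method signature reconstructed for readability (name and parameter type are
// assumptions); the original post showed only the method body.
private Query buildFuzzyQuery(List<String> fuzzyTokens) {
    final BooleanQuery.Builder builder = new BooleanQuery.Builder();
    // add individual tokens to query
    for (String token : fuzzyTokens) {
        final Term term = new Term(NAME_TOKENS.name(), token);
        // FuzzyQuery(term) uses the default maximum edit distance of 2
        final FuzzyQuery fq = new FuzzyQuery(term);
        // SHOULD: a document may match any subset of the tokens
        builder.add(fq, BooleanClause.Occur.SHOULD);
    }
    return builder.build();
}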
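
To get a feeling for what each fuzzy clause above will accept, here is a minimal sketch that assumes plain Levenshtein distance. FuzzyQuery(term) allows up to 2 edits by default and, by default, also counts a swap of two adjacent characters as a single edit, which this sketch does not model. The class name EditDistanceSketch and the misspellings are made up for illustration.

```java
public class EditDistanceSketch {

    /** Classic Levenshtein distance: insertions, deletions, substitutions. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True if the indexed term would be within the given edit budget of the query token. */
    public static boolean withinMaxEdits(String queryToken, String indexedTerm, int maxEdits) {
        return levenshtein(queryToken, indexedTerm) <= maxEdits;
    }

    public static void main(String[] args) {
        // Hypothetical misspellings checked against tokens from the example above.
        System.out.println(withinMaxEdits("rosi", "rozi", 2));
        System.out.println(withinMaxEdits("campestris", "campestre", 2));
        System.out.println(withinMaxEdits("acer", "nanum", 2));
    }
}
```

Note that with short tokens such as "acer" or "rozi", a budget of 2 edits covers a large share of the token, so a fuzzy clause can match many unrelated index terms.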


Input names are analyzed with a StandardTokenizer and Lowercase filter
when they are added to the IndexWriter.


My question: How can I get a ranking that scores
"Acer campestre 'Rozi'" higher than "Acer campestre"?
I am sure there is an obvious way to achieve this that I have so far
failed to find.


-Matthias


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Best fuzzy match on multiple terms
I would suggest first trying indexing and searching without the ' characters
(single quotes), to see whether you can find it.

Thanks


On 6/13/19 11:25 AM, Matthias Müller wrote:


Re: Best fuzzy match on multiple terms
Dear Matthias,

First, you need to understand Lucene's ranking concept.
Lucene's default ranking function is BM25, and its scores depend on the state of your index.
(https://en.wikipedia.org/wiki/Okapi_BM25)
There can be many reasons for the ranking you see.
One guess is that the term 'rozi' occurs in many documents in your index, so it
carries little weight.
This effect is called IDF (Inverse Document Frequency).
Anyway, if you want fine-grained control over the ranking, you need to understand the
BM25 formula.
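
The IDF effect can be sketched with the idf form used by Lucene's BM25Similarity, ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents that have the field and n is the number of documents containing the term. The class name IdfSketch and all document counts below are invented for illustration; they are not taken from the index discussed in this thread.

```java
public class IdfSketch {

    /** BM25-style inverse document frequency: rarer terms get a larger weight. */
    public static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        long docCount = 10_000; // hypothetical index size
        // A term found in a single document versus terms found in many documents.
        System.out.println("docFreq=1    -> idf=" + idf(docCount, 1));
        System.out.println("docFreq=200  -> idf=" + idf(docCount, 200));
        System.out.println("docFreq=5000 -> idf=" + idf(docCount, 5000));
    }
}
```

The weight shrinks as the document frequency grows, so a term that appears in many names contributes less to the total score than a term that appears in only a few.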

Lucene can also tell you how a score was computed.
You can use the Explanation class to get this.
Here is some sample code.
======================================
// reader, q and hitsPerPage are assumed to be defined by the caller
IndexSearcher searcher = new IndexSearcher(reader);
TopDocs docs = searcher.search(q, hitsPerPage);
ScoreDoc[] hits = docs.scoreDocs;

for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    Explanation explanation = searcher.explain(q, docId);
    // You can see how the score is calculated
    System.out.println("Explanation : " + explanation.toString());
}
======================================

I hope it helps :D

Best regards,
Namgyu Kim

P.S. For BM25, the default values in Lucene are k1 = 1.2 and b = 0.75.

On Fri, 14 Jun 2019 at 12:54 AM, <baris.kazar@oracle.com> wrote:

Re: Best fuzzy match on multiple terms
Hi Matthias,

Which similarity class are you using?
Just a guess... but one possible reason is document (field) length
normalization. Generally speaking, shorter documents get higher
scores than longer documents. (I have seen that the classic TF-IDF similarity
tends to give much higher scores to shorter documents. Newer versions
of Lucene use BM25 similarity by default, which moderates this tendency
and has a tuning parameter 'b' to control the normalization effect.)
See also: https://www.elastic.co/guide/en/elasticsearch/guide/current/pluggable-similarites.html
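
The length normalization effect can be illustrated with the textbook BM25 term-frequency component, tf * (k1 + 1) / (tf + k1 * (1 - b + b * len / avgLen)). This is a rough sketch, not Lucene's exact code path: Lucene versions differ in whether the constant factor (k1 + 1) is applied, and Lucene stores field lengths in a lossy encoded form. The class name, field lengths and average length below are assumptions chosen to resemble a two-token and a three-token name.

```java
public class Bm25LengthNormSketch {

    /** Length-normalized term-frequency part of BM25 (the idf factor is left out). */
    public static double tfNorm(double tf, double fieldLen, double avgFieldLen, double k1, double b) {
        double norm = k1 * (1.0 - b + b * fieldLen / avgFieldLen);
        return tf * (k1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        double k1 = 1.2;        // Lucene's default k1
        double avgLen = 3.0;    // hypothetical average field length in tokens
        for (double b : new double[] {0.75, 0.3, 0.0}) {
            double twoTokens = tfNorm(1, 2, avgLen, k1, b);   // e.g. a name with two tokens
            double threeTokens = tfNorm(1, 3, avgLen, k1, b); // e.g. a name with three tokens
            System.out.printf("b=%.2f  two tokens=%.4f  three tokens=%.4f%n", b, twoTokens, threeTokens);
        }
    }
}
```

With b = 0.75 the same matching term is worth more in the shorter field; lowering b narrows that gap, and b = 0 removes the length effect entirely. In Lucene, k1 and b can be passed to the BM25Similarity constructor.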

As Namgyu Kim said, the explain() API can help you examine the details.

Tomoko

On Fri, 14 Jun 2019 at 1:27, Namgyu Kim <kng0828@gmail.com> wrote:

Re: Best fuzzy match on multiple terms
Hi Namgyu and Tomoko,

Your hint about Explanation was very helpful; I was not aware of
this feature.

I have now experimented with different scoring functions, and it seems
that DFISimilarity and BM25Similarity (with a lower 'b') produce results
closer to what I want, though still not perfect in some cases [1].

The fuzzy term queries probably produce hard-to-predict
similarities on additional fields. These add to the overall
score and also affect normalization.

On the positive side, the preferred matches are somewhere in the top ranks. So
a rule-based assessment of the top N hits might help me achieve
what I want.
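
One possible shape for such a rule-based pass over the top N hits is sketched below. The rule is an assumption for illustration only: hits are ordered by how many tokens differ between query and candidate (missing plus extra tokens), and the search score only breaks ties. The class TopNRerankSketch, the Hit type and the simple tokenizer are hypothetical, and the comparison is exact, so misspelled query tokens would still need fuzzy handling.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TopNRerankSketch {

    /** A search hit as shown in the result listings: stored name plus score. */
    public static final class Hit {
        public final String value;
        public final float score;

        public Hit(String value, float score) {
            this.value = value;
            this.score = score;
        }
    }

    /** Lowercases and splits on anything that is not a letter or digit. */
    public static Set<String> tokens(String name) {
        Set<String> result = new HashSet<>();
        for (String t : name.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    /** Number of query tokens missing from the candidate plus extra candidate tokens. */
    public static int mismatch(Set<String> query, Set<String> candidate) {
        Set<String> missing = new HashSet<>(query);
        missing.removeAll(candidate);
        Set<String> extra = new HashSet<>(candidate);
        extra.removeAll(query);
        return missing.size() + extra.size();
    }

    /** Returns a re-ordered copy: fewest mismatching tokens first, then highest score. */
    public static List<Hit> rerank(String query, List<Hit> topHits) {
        final Set<String> q = tokens(query);
        List<Hit> sorted = new ArrayList<>(topHits);
        sorted.sort((a, b) -> {
            int byMismatch = Integer.compare(mismatch(q, tokens(a.value)), mismatch(q, tokens(b.value)));
            return byMismatch != 0 ? byMismatch : Float.compare(b.score, a.score);
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("Abelia xgrandiflora 'Wevo1' BELLA DONNA", 13.7869625f));
        hits.add(new Hit("Abelia xgrandiflora", 13.74585f));
        for (Hit h : rerank("Abelia xgrandiflora", hits)) {
            System.out.println(h.value + " (score=" + h.score + ")");
        }
    }
}
```

Because the rule looks only at the top N hits, the similarity still decides which candidates are considered at all; the rule merely settles near-ties among them.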


- Matthias


[1]:
"Abelia xgrandiflora" -> "Abelia xgrandiflora 'Wevo1' BELLA DONNA"
(score=13.7869625)
instead of the direct match
"Abelia xgrandiflora" -> "Abelia xgrandiflora" (score=13.74585)

On Friday, 14.06.2019, at 16:32 +0900, Tomoko Uchida wrote:


Re: Best fuzzy match on multiple terms
These are great suggestions; I was going to suggest looking at the query's
explain output, too.

I really wonder why the 'Rozi' entry does not get a higher score in your case.

Is there any effect from the " ' " characters?


In my case I have more or less the reverse situation:

My query is maink~2 (mains was a special case that I am still investigating).

I would expect the second result below to be ranked first, as it is
shorter and the closest hit, and the first result to be ranked second.

NASHUA in results: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES in the 0 th result
NASHUA in results: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES in the 1 th result


Best regards


On 6/14/19 6:45 AM, Matthias Müller wrote:


Re: Best fuzzy match on multiple terms
Hi Baris,

"Acer campestre 'Rozi'" now receives a higher score with DFISimilarity
and BM25Similarity (with tuned 'b') than with the default BM25 settings.

It really was a scoring/normalization issue: while "Rozi" received a
higher score, "Acer" and "campestre" received lower values, so the
combined score of the desired hit ended up a fraction of a point below
the top hit.

-Matthias



On Friday, 14.06.2019, at 10:41 -0400, baris.kazar@oracle.com wrote:
> These are great suggestions, i was going to suggest explain plan of
> query, too.
>
> i really wonder in Your case why 'Rozi' entry does not get higher
> score.
>
> Is there any effect from " ' " chars?
>
>
> In my case i have sort of reverse situation:
>
> my query is maink~2 (mains was a special case where i still
> investigate)
>
> i would expect the second result below to be the first result as it
> is
> shorter and closest hit and first result to be the second result.
>
> NASHUA in results: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE
> UNITED STATES in the 0 th result
> NASHUA in results: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED
> STATES
> in the 1 th result
>
>
> Best regards
>
>
> On 6/14/19 6:45 AM, Matthias Müller wrote:
> > Hi Namgyu and Tomoko,
> >
> > your hint towards Explanation was very helpful and I was not aware
> > of
> > this feature.
> >
> > I have now experimented with different scoring functions and it
> > seems
> > that DFISimilarity and BM25Similarity (with lower 'b') produce
> > results
> > in the direction I prefer, though not perfect for some cases [1].
> >
> > The fuzzy term queries probably generate hardly predictable
> > similarities on additional fields. These add scores to the overall
> > result and also affect normalization.
> >
> > Positively, the preferred matches are somewhere in the top ranks.
> > So
> > maybe rule-based assessment of the top N hits might help me achieve
> > what I want.
> >
> >
> > - Matthias
> >
> >
> > [1]:
> > "Abelia xgrandiflora" -> "Abelia xgrandiflora 'Wevo1' BELLA DONNA"
> > (score=13.7869625)
> > instead of the direct match
> > "Abelia xgrandiflora" -> "Abelia xgrandiflora" (score=13.74585)
> >
> > Am Freitag, den 14.06.2019, 16:32 +0900 schrieb Tomoko Uchida:
> > > Hi Matthias,
> > >
> > > What similarity class are you using.
> > > Just a guess... but possibly one reason is document (field)
> > > length
> > > normalization. Generally speaking shorter documents would get
> > > higher
> > > scores than longer documents. (I saw that classic TFIDF
> > > similarity
> > > tends to give much higher scores to shorter documents. Newer
> > > version
> > > of lucene uses BM25 similarity as default, that moderates the
> > > tendency
> > > and has a tuning parameter 'b' to control the normalization
> > > effect.)
> > > See also:
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.elastic.co_guide_en_elasticsearch_guide_current_pluggable-2Dsimilarites.html&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=EQ--nOw2fv4xC2jDVd61qmWey2RW5y71Jx5-esA5Epo&s=xgCA5llK_2kxvxRc4arpgbd1rhgRrSkOqD5j57CA-6Q&e=
> > >
> > > As Namgyu Kim said, explain() API could help you to examine the
> > > details.
> > >
> > > Tomoko
> > >
> > > 2019?6?14?(?) 1:27 Namgyu Kim <kng0828@gmail.com>:
> > > > Dear Matthias,
> > > >
> > > > First you need to know about the Lucene's ranking concept.
> > > > Lucene's basic ranking is BM25 and it depends on your index
> > > > status.
> > > > (
> > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__en.wikipedia.org_wiki_Okapi-5FBM25&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=EQ--nOw2fv4xC2jDVd61qmWey2RW5y71Jx5-esA5Epo&s=3M7Yh2-tiEHd8DVhJc5fBeVfE65WvnaXsphnx2pCdfg&e=
> > > > )
> > > > There can be many reasons.
> > > > One of thing that I can guess is your index has a lot of 'rozi'
> > > > term so it
> > > > is getting worthless.
> > > > It is called IDF(Inverse Document Frequency).
> > > > Anyway, if you want to be a micro controller, you need to
> > > > understand the
> > > > BM25 expression.
> > > >
> > > > And Lucene can tell you how your score came out.
> > > > Explanation can be used to get it.
> > > > I attach the sample code.
> > > > ======================================
> > > > IndexSearcher searcher = new IndexSearcher(reader);
> > > > TopDocs docs = searcher.search(q, hitsPerPage);
> > > > ScoreDoc[] hits = docs.scoreDocs;
> > > >
> > > > for (int i = 0; i < hits.length; ++i) {
> > > > int docId = hits[i].doc;
> > > > Explanation explanation = searcher.explain(q, docId);
> > > > // You can see how the score is calculated
> > > > System.out.println("Explanation : " +
> > > > explanation.toString());
> > > > }
> > > > ======================================
> > > >
> > > > I hope it helps :D
> > > >
> > > > Best regards,
> > > > Namgyu Kim
> > > >
> > > > P.S. For BM25, the default value in Lucene is k1 = 1.2, b =
> > > > 0.75.


Re: Best fuzzy match on multiple terms [ In reply to ]
Hi Boris,

Query parsing and scoring/ranking are completely separate processes,
so I'd debug those problems separately.
For debugging a fuzzy query, the Query.rewrite() method would be a good
first step (it lets you see all the expanded terms that the fuzzy
query generates).
I'm not sure what your problem is, but in many cases you also need to
take care of the analyzers to get the search results you want.
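
To make the "unrolled terms" idea concrete, here is a minimal plain-Java
sketch (no Lucene dependency) of what a fuzzy query conceptually expands
to: every indexed term within a maximum edit distance of the query token.
The class FuzzyExpansionSketch, its methods, and the sample term list are
hypothetical and only for illustration. Lucene itself does this with
Levenshtein automata over the term dictionary and, by default, counts a
transposition as a single edit, which this sketch does not. The limit of
2 edits is an assumption chosen to match FuzzyQuery's default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FuzzyExpansionSketch {

    // Classic Levenshtein distance (insert, delete, substitute), two-row DP.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the indexed terms that lie within maxEdits of the query token.
    public static List<String> expand(String token, List<String> indexedTerms, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String term : indexedTerms) {
            if (editDistance(token, term) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical term dictionary; a real index would hold many more terms.
        List<String> terms = Arrays.asList("acer", "campestre", "rozi", "rosa", "nanum", "arends");
        for (String token : Arrays.asList("acer", "campestre", "rozi")) {
            System.out.println(token + " -> " + expand(token, terms, 2));
        }
    }
}
```

Each extra term pulled in this way can contribute to a document's score,
so a short token such as "rozi" may match more index terms than expected.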

JFYI, Luke (a GUI tool for inspecting your Lucene
indexes/analyzers/search queries) is a convenient way to do that, if
you'd like.
https://github.com/DmitryKey/luke (Luke has been integrated into
Lucene since 8.1, but you can download older versions from the GitHub
repo.)

e.g.
https://twitter.com/moco_beta/status/1139754595800928256
https://twitter.com/moco_beta/status/1139758109457391616

Enjoy.
Tomoko
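
On the scoring side of this thread (field-length normalization and
BM25's 'b' parameter), a small worked example may help. The sketch below
uses the classic Okapi BM25 formula for a single matching term and
compares a 2-token field with a 3-token field under several values of
'b'. The class name Bm25LengthNormSketch and the idf, tf, and average
length values are made-up assumptions. Lucene's BM25Similarity differs in
details (for example, it stores field lengths in a lossy encoding), so
the absolute numbers will not match real Explanation output.

```java
public class Bm25LengthNormSketch {

    // Classic Okapi BM25 contribution of one matching term.
    // b = 0 disables length normalization; b = 1 applies it fully.
    public static double termScore(double idf, double tf, double docLen,
                                   double avgDocLen, double k1, double b) {
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        return idf * (tf * (k1 + 1.0)) / (tf + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        // Made-up statistics for illustration only.
        double idf = 1.0;
        double tf = 1.0;
        double avgDocLen = 3.0;
        double k1 = 1.2;
        for (double b : new double[] {0.75, 0.3, 0.0}) {
            double twoTokens = termScore(idf, tf, 2.0, avgDocLen, k1, b);
            double threeTokens = termScore(idf, tf, 3.0, avgDocLen, k1, b);
            System.out.printf("b=%.2f  2-token field: %.4f  3-token field: %.4f%n",
                    b, twoTokens, threeTokens);
        }
    }
}
```

With b = 0.75 the shorter field scores higher for the same term match,
and the gap shrinks as 'b' is lowered, vanishing at b = 0. In Lucene the
corresponding knob is the BM25Similarity(k1, b) constructor, and the
similarity should be set on both the IndexWriterConfig and the
IndexSearcher.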

> > > > > 2019? 6? 14? (?) ?? 12:54, <baris.kazar@oracle.com>?? ??:
> > > > >
> > > > > > i would suggest trying (indexing and searching) without === '
> > > > > > ===
> > > > > > s and
> > > > > > see You can find it first.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > >
> > > > > > On 6/13/19 11:25 AM, Matthias Müller wrote:
> > > > > > > I am currently matching botanic names (with possible mis-
> > > > > > > spellings)
> > > > > > > against an indexed referenced list with Lucene. After quick
> > > > > > > progress in
> > > > > > > the beginning, I am struggeling with the proper query
> > > > > > > design to
> > > > > > > achieve
> > > > > > > a ranking result I want.
> > > > > > >
> > > > > > > Here is an example:
> > > > > > >
> > > > > > > Search term: Acer campestre 'Rozi'
> > > > > > >
> > > > > > > Tokenized (decomposed) representation:
> > > > > > > acer
> > > > > > > campestre
> > > > > > > rozi
> > > > > > >
> > > > > > > Top 10 hits:
> > > > > > > {value=Acer campestre, score=12.288989}
> > > > > > > {value=Acer campestre 'Rozi', score=11.955223} // <- why is
> > > > > > > it
> > > > > > > 2nd?
> > > > > > > {value=Acer campestre 'Arends', score=10.640412}
> > > > > > > {value=Acer campestre subsp. leiocarpon, score=10.640412}
> > > > > > > {value=Acer campestre 'Carnival', score=10.640412}
> > > > > > > {value=Acer campestre 'Commodore', score=10.640412}
> > > > > > > {value=Acer campestre 'Nanum', score=10.640412}
> > > > > > > {value=Acer campestre 'Elsrijk', score=10.640412}
> > > > > > > {value=Acer campestre 'Fastigiatum', score=10.640412}
> > > > > > > {value=Acer campestre 'Geessink', score=10.640412}]
> > > > > > >
> > > > > > >
> > > > > > > And here is how I create my queries:
> > > > > > >
> > > > > > > final BooleanQuery.Builder builder = new
> > > > > > > BooleanQuery.Builder();
> > > > > > > // add individual tokens to query
> > > > > > > for (String token : fuzzyTokens) {
> > > > > > > final Term term = new Term(NAME_TOKENS.name(),
> > > > > > > token);
> > > > > > > final FuzzyQuery fq = new FuzzyQuery(term);
> > > > > > > builder.add(fq, BooleanClause.Occur.SHOULD);
> > > > > > > }
> > > > > > > return builder.build();
> > > > > > > }
> > > > > > >
> > > > > > >
> > > > > > > Input names are analyzed with a StandardTokenizer and
> > > > > > > Lowercase
> > > > > > > filter
> > > > > > > when they are added to the IndexWriter.
> > > > > > >
> > > > > > >
> > > > > > > My question: How can I get a ranking that scores
> > > > > > > "Acer campestre 'Rozi'" higher than "Acer campestre"?
> > > > > > > I am sure there is an obvious way to achieve this that I
> > > > > > > have
> > > > > > > yet
> > > > > > > failed to find.
> > > > > > >
> > > > > > >
> > > > > > > -Matthias
> > > > > > >
> > > > > > >
> > > > > > > ---------------------------------------------------------
> > > > > > > ----
> > > > > > > --------
> > > > > > > To unsubscribe, e-mail:
> > > > > > > java-user-unsubscribe@lucene.apache.org
> > > > > > > For additional commands, e-mail:
> > > > > > > java-user-help@lucene.apache.org
> > > > > > >
> > > > > > -----------------------------------------------------------
> > > > > > ----
> > > > > > ------
> > > > > > To unsubscribe, e-mail:
> > > > > > java-user-unsubscribe@lucene.apache.org
> > > > > > For additional commands, e-mail:
> > > > > > java-user-help@lucene.apache.org
> > > > > >
> > > > > >
> > > > ---------------------------------------------------------------
> > > > ------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > >
> > > -----------------------------------------------------------------
> > > ----
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
