Mailing List Archive

Can I use DFISimilarity for search on an index written with BM25Similarity?
Hi,

I'm working on literary texts (French).

My users are interested in relevance tweaking, so that the texts best
suited to their taste come up in the top results.

Changing the similarity at query time is less expensive than reindexing
everything.

I checked that BM25 needs to write “norms” to keep the document length.

Have I missed something? DFISimilarity seems to write and use norms
from SimilarityBase, where the Javadoc of computeNorm says:

«Encodes the document length in the same way as {@link
BM25Similarity}»

In my first experiments, DFISimilarity at query time seems to give the
same results whether the index was written with the default
BM25Similarity or with DFI.

Can some gurus confirm from their experience?

Thanks in advance (and Lucene is really a good piece of software).

--
Frédéric


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Can I use DFISimilarity for search on an index written with BM25Similarity?
Yes, you can use DFISimilarity with an index constructed with
BM25Similarity. No need to reindex.
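
The reason this works can be illustrated with a small toy model. It is
only a sketch, not Lucene code: the class and field names below
(SwapSimilaritySketch, Stats, Scoring) are made up for the example, the
BM25 formula is the textbook one with k1 = 1.2 and b = 0.75, and the
DFI-style formula is a simplified standardized variant. The point is
that both formulas read the same stored document length and the same
term statistics, so nothing in the index has to change when the formula
does. In Lucene itself the swap is done at search time with
IndexSearcher.setSimilarity(...).

```java
/** Toy model, not Lucene code: one stored document length, two scoring formulas. */
final class SwapSimilaritySketch {

    /** Statistics available at query time, whatever similarity wrote the index. */
    static final class Stats {
        final double freq;         // occurrences of the term in this document
        final double docLen;       // document length, as decoded from the stored norm
        final long docFreq;        // number of documents containing the term
        final long docCount;       // number of documents that have the field
        final long totalTermFreq;  // occurrences of the term in the whole field
        final long totalTokens;    // number of tokens in the whole field

        Stats(double freq, double docLen, long docFreq, long docCount,
              long totalTermFreq, long totalTokens) {
            this.freq = freq;
            this.docLen = docLen;
            this.docFreq = docFreq;
            this.docCount = docCount;
            this.totalTermFreq = totalTermFreq;
            this.totalTokens = totalTokens;
        }
    }

    /** A scoring formula that only reads the shared statistics. */
    interface Scoring {
        double score(Stats s);
    }

    /** Textbook-style BM25 (assumed parameters: k1 = 1.2, b = 0.75). */
    static final Scoring BM25 = s -> {
        double k1 = 1.2;
        double b = 0.75;
        double avgLen = (double) s.totalTokens / s.docCount;
        double idf = Math.log(1 + (s.docCount - s.docFreq + 0.5) / (s.docFreq + 0.5));
        double lengthNorm = k1 * (1 - b + b * s.docLen / avgLen);
        return idf * s.freq / (s.freq + lengthNorm);
    };

    /** DFI-style score: how far the observed frequency exceeds the expected one. */
    static final Scoring DFI = s -> {
        double expected = (s.totalTermFreq + 1) * s.docLen / (s.totalTokens + 1);
        if (s.freq <= expected) {
            return 0.0; // the term is not more frequent than chance predicts
        }
        double measure = (s.freq - expected) / Math.sqrt(expected); // standardized
        return Math.log(measure + 1) / Math.log(2);
    };

    public static void main(String[] args) {
        // Same "index" data, scored twice with different formulas.
        Stats s = new Stats(5, 100, 20, 1000, 50, 100_000);
        System.out.println("BM25-style: " + BM25.score(s));
        System.out.println("DFI-style:  " + DFI.score(s));
    }
}
```

Both scorers are fed the same Stats object, which plays the role of the
index written once; only the formula applied at query time differs.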

On Fri, Jun 14, 2019 at 1:05 PM Frédéric Glorieux <emploi@fictif.org> wrote:
> [...]


--
Adrien

Re: Can I use DFISimilarity for search on an index written with BM25Similarity?
Confirmed.

It is not a full Lucene app and it is in French, but this query is a good
way to observe the size bias when scoring documents of unequal length
(chapters of books):

http://obvil.lip6.fr/alix/snip.jsp?q=dédicace&sort=lmd

BM25 (or LMD) is incredibly robust to document size.

This can deeply change the relevance of MoreLikeThis queries and other
semantic calculations, which is great.
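
For BM25, one place this robustness can come from is the length
normalization inside the term-frequency part of the formula. The sketch
below is a toy illustration, not the application's code: Bm25LengthSketch
and tfPart are hypothetical names, and k1 = 1.2 and b = 0.75 are assumed
as the usual defaults. With b = 0.75 the same raw frequency counts for
less in a longer chapter; with b = 0 the length is ignored entirely.

```java
/** Toy illustration of BM25 length normalization; not Lucene code. */
final class Bm25LengthSketch {

    /**
     * Term-frequency part of BM25. It saturates as freq grows, and for b > 0
     * it shrinks when docLen grows beyond the average length avgLen.
     */
    static double tfPart(double freq, double docLen, double avgLen, double k1, double b) {
        double lengthNorm = k1 * (1 - b + b * docLen / avgLen);
        return freq / (freq + lengthNorm);
    }

    public static void main(String[] args) {
        double avgLen = 1000; // assumed average chapter length, in tokens
        double freq = 3;      // same raw term frequency in every chapter
        for (double docLen : new double[] {100, 1000, 10000}) {
            System.out.println("docLen=" + docLen
                + "  b=0.75: " + tfPart(freq, docLen, avgLen, 1.2, 0.75)
                + "  b=0: " + tfPart(freq, docLen, avgLen, 1.2, 0.0));
        }
    }
}
```

Running main prints the term-frequency part for a short, an average and
a long chapter, so the effect of b on the length bias can be compared
side by side.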

--
Frédéric


On 14/06/2019 at 18:30, Adrien Grand wrote:
> Yes, you can use DFISimilarity with an index constructed with
> BM25Similarity. No need to reindex.
