Thanks for your feedback!
To better understand your answer, I would like to consider the following example.
My code currently looks roughly like this:
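int k = 3;
String question = "How old is Michael?";
IndexSearcher searcher = new IndexSearcher(indexReader);
// getEmbedding() returns the SentenceBERT (bi-encoder) embedding of the question
float[] queryVector = getEmbedding(question);
// approximate kNN (HNSW) search, optionally restricted by a filter query;
// KnnVectorQuery is deprecated since Lucene 9.5 in favour of KnnFloatVectorQuery
Query query = new KnnVectorQuery(VECTOR_FIELD, queryVector, k, filter);
TopDocs topDocs = searcher.search(query, k);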
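KnnVectorQuery returns an approximate top k using the HNSW graph. For intuition, the exact result it approximates can be sketched as a brute-force scan in plain Java. This is only an illustration under assumptions: it uses cosine similarity (Lucene uses whatever similarity the field was indexed with) and keeps the vectors in an in-memory list, which is not how Lucene stores them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Exact (brute-force) k-nearest-neighbour search with cosine similarity.
// Illustration only: KnnVectorQuery approximates this with an HNSW graph
// and uses the similarity function configured for the vector field.
class ExactKnn {

    // Cosine similarity of two equal-length vectors (assumed similarity).
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Indices of the k stored vectors most similar to the query, best first.
    static List<Integer> topK(float[] query, List<float[]> vectors, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            ids.add(i);
        }
        // Scores every stored vector: n similarity computations per query.
        ids.sort(Comparator.comparingDouble((Integer i) -> cosine(query, vectors.get(i))).reversed());
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        List<float[]> docs = List.of(new float[] {1f, 0f}, new float[] {0f, 1f}, new float[] {0.9f, 0.1f});
        System.out.println(topK(new float[] {1f, 0f}, docs, 2)); // prints [0, 2]
    }
}
```

The brute-force cost grows linearly with the number of documents, which is why an approximate graph index is used for the first-stage search over a whole index.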
The data (from topDocs) passed to the re-ranker looks roughly like this:
{
  "question": "How old is Michael?",
  "docs": [
    "Michael lives in Switzerland",
    "Michael was born 1969",
    "Michael has three children"
  ]
}
So the expected result would be
["2", "1", "3"] or ["2", "3", "1"]
(1-based positions in "docs"), i.e. the answer "Michael was born 1969" is the best answer
to the question "How old is Michael?".
So what would the code look like if Lucene provided something like a
VectorRerankField / FastVectorField?
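Independent of how Lucene might expose this, the re-ranking step itself can be sketched as: score every (question, doc) pair and sort the docs by that score. In the sketch below the pair scorer is a hypothetical stand-in for a cross-encoder (Lucene does not ship one), and the scores in main are made up purely to reproduce the expected ordering in the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

// Re-ranks first-stage hits with a (question, doc) pair scorer, e.g. a cross-encoder.
// The scorer is a hypothetical stand-in; Lucene does not provide one.
class PairReranker {

    // Returns the 1-based positions of docs, ordered by descending pair score.
    static List<String> rerank(String question, List<String> docs,
                               ToDoubleBiFunction<String, String> scorer) {
        double[] scores = new double[docs.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            scores[i] = scorer.applyAsDouble(question, docs.get(i));
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        List<String> result = new ArrayList<>();
        for (int i : order) {
            result.add(String.valueOf(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "Michael lives in Switzerland",
            "Michael was born 1969",
            "Michael has three children");
        // Made-up relevance scores standing in for a cross-encoder's output.
        Map<String, Double> fakeScores = Map.of(docs.get(0), 0.2, docs.get(1), 0.9, docs.get(2), 0.1);
        System.out.println(rerank("How old is Michael?", docs, (q, d) -> fakeScores.get(d))); // prints [2, 1, 3]
    }
}
```

Because the scorer sees the question and the document together, it is applied only to the few first-stage hits rather than to the whole index.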
Thanks
Michael
On 10.02.23 at 17:02, Robert Muir wrote:
> I think it would be good to provide something like a VectorRerankField
> (sorry for the bad name, maybe FastVectorField would be amusing too),
> that just stores vectors as docvalues (no HNSW) and has a
> newRescorer() method that implements
> org.apache.lucene.search.Rescorer. Then it's easy to do what that
> document describes: pull the top 500 hits with BM25 and rerank them with
> your vectors, very fast, only 500 calculations required, no HNSW or
> anything needed. Of course you could use a vector search instead of a
> BM25 search as the initial search to pull the top 500 hits too.
>
> So it could meet both use-cases and provide a really performant option
> for users that want to integrate vector search.
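The rescoring step described here (take the first-pass top hits, e.g. 500 from BM25, and rerank them with stored vectors) can be sketched in plain Java. This is not Lucene code: the Map is a hypothetical stand-in for vectors stored as doc values, dot-product similarity is an assumption, and VectorRerankField / FastVectorField remain hypothetical names. A real implementation could presumably subclass org.apache.lucene.search.Rescorer, whose rescore(IndexSearcher, TopDocs, int topN) method has a similar shape.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Second-pass rescoring of first-pass hits with per-document vectors.
// Plain-Java analogy of Rescorer.rescore(searcher, firstPassTopDocs, topN);
// the Map is a hypothetical stand-in for vectors stored as doc values.
class VectorRescorerSketch {

    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Dot-product similarity (assumed; other similarities would work the same way).
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // One similarity computation per first-pass hit; no HNSW graph is needed.
    static List<Hit> rescore(List<Hit> firstPass, Map<Integer, float[]> storedVectors,
                             float[] queryVector, int topN) {
        List<Hit> rescored = new ArrayList<>();
        for (Hit hit : firstPass) {
            // Here the vector score simply replaces the first-pass (e.g. BM25) score.
            rescored.add(new Hit(hit.docId, dot(queryVector, storedVectors.get(hit.docId))));
        }
        rescored.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(rescored.subList(0, Math.min(topN, rescored.size())));
    }

    public static void main(String[] args) {
        // Made-up first-pass hits and 2-dimensional vectors, for illustration only.
        List<Hit> bm25Hits = List.of(new Hit(7, 3.1f), new Hit(2, 2.5f), new Hit(9, 1.0f));
        Map<Integer, float[]> vectors = Map.of(
            7, new float[] {0f, 1f},
            2, new float[] {1f, 0f},
            9, new float[] {0.8f, 0.2f});
        for (Hit hit : rescore(bm25Hits, vectors, new float[] {1f, 0f}, 2)) {
            System.out.println(hit.docId + " " + hit.score); // prints "2 1.0" then "9 0.8"
        }
    }
}
```

Replacing the first-pass score outright is only one option; Lucene's QueryRescorer, for example, leaves it to a subclass to combine the first- and second-pass scores.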
>
> On Fri, Feb 10, 2023 at 10:21 AM Michael Wechner
> <michael.wechner@wyona.com> wrote:
>> Hi
>>
>> I use Lucene's vector search, with embeddings obtained from
>> SentenceBERT, for example.
>>
>> According to
>>
>> https://www.sbert.net/examples/applications/retrieve_rerank/README.html
>>
>> a re-ranking with a cross-encoder after the vector search (bi-encoding)
>> can improve the ranking.
>>
>> Would it make sense to add this kind of functionality to Lucene or is
>> somebody already working on something similar?
>>
>> Thanks
>>
>> Michael
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail:java-user-help@lucene.apache.org
>>