Mailing List Archive

Token type similarity
Hello,

I wonder how token types are taken into account in similarity scoring.

From my tests, it appears that Lucene scores the term text
and the term type separately.

For instance, given these documents (each token written as text/type):
d1: w1/t1 w2/t1 w3/t2
d2: w1/t2 w2/t1 w3/t1

and the query w1/t1, I get the same score for d1 and d2.
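
To make the tie concrete, here is a toy model of the example above. It is only a sketch and not Lucene's actual scoring formula: the functions separate_score and joint_score are hypothetical names, and the scores are plain match counts with no tf-idf or normalization. It assumes that "scoring separately" means text matches and type matches are counted independently.

```python
from collections import Counter

# Toy documents from the example: each token is a (text, type) pair.
d1 = [("w1", "t1"), ("w2", "t1"), ("w3", "t2")]
d2 = [("w1", "t2"), ("w2", "t1"), ("w3", "t1")]
query = ("w1", "t1")


def separate_score(doc, query):
    """Count text matches and type matches independently, then add them.

    Hypothetical stand-in for scoring text and type as unrelated features.
    """
    texts = Counter(text for text, _ in doc)
    types = Counter(typ for _, typ in doc)
    return texts[query[0]] + types[query[1]]


def joint_score(doc, query):
    """Count only tokens whose text AND type both match the query."""
    return sum(1 for token in doc if token == query)


# Both documents contain w1 once and t1 twice, so they tie here.
print(separate_score(d1, query), separate_score(d2, query))
# Only d1 has a single token that is both w1 and t1.
print(joint_score(d1, query), joint_score(d2, query))
```

In this model the separate count cannot tell d1 from d2, while the joint count favors d1, which is the behavior I am looking for.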

Is there a way to improve the score of d1, given that a single token
there has both the right text and the right type?

Thanks

Paul Bédaride