Hellooo,
Suppose a user enters ‘box of shoes’ in my search box. I have two documents
titled ‘box of clothes’ and ‘box of socks’. I’ve figured out through a
separate algorithm that ‘socks’ is more similar to ‘shoes’ than ‘clothes’ is.
I even have a numeric similarity score: 0.8 for ‘socks’ and 0.65 for
‘clothes’.
How can I feed this info to Lucene to help it rank ‘socks’ higher than
‘clothes’?
I still want the usual tf-idf rules to apply, i.e. ‘box’ and ‘of’ occur in a
lot of documents, but ‘socks’ and ‘clothes’ are rarer, so they should be
given more importance.
So I don’t want to have to override the Similarity class. I just want to
be able to pass in the info that ‘socks’ and ‘clothes’ are both kind of like
synonyms for ‘shoes’, but ‘socks’ is more similar to ‘shoes’ than ‘clothes’ is.
Maybe create a boost from the similarity score that doesn’t artificially boost
frequent / less important terms.
If I just provided them as regular synonyms, they would both be
weighted equally.
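To make the request concrete, here is a toy sketch of the scoring I have in mind. It does not use Lucene at all. Each synonym is added as an extra query term whose own tf-idf contribution is multiplied by its similarity score, so rare terms still dominate common ones and the boost never inflates ‘box’ or ‘of’. In Lucene terms this is roughly what I imagine wrapping each synonym’s TermQuery in a BoostQuery inside a BooleanQuery with SHOULD clauses would do, but I haven’t confirmed that. The class name, the tf and idf formulas, the corpus size and the document frequencies are all assumptions made up for illustration.

```java
import java.util.List;
import java.util.Map;

/**
 * Toy illustration (NOT Lucene's API): each query term, including the synonym
 * expansions, keeps its own idf; a synonym's similarity score is applied as a
 * multiplicative boost on that term's tf-idf contribution only.
 */
public class WeightedSynonymScoring {

    /** Assumed idf formula, chosen for illustration: ln((N + 1) / (df + 1)) + 1. */
    static double idf(int docFreq, int numDocs) {
        return Math.log((numDocs + 1.0) / (docFreq + 1.0)) + 1.0;
    }

    /** Assumed tf formula: square root of the raw term frequency. */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /**
     * Sums boost * tf * idf over every weighted query term.
     * weightedTerms maps term -> boost (1.0 for original terms, similarity score for synonyms).
     */
    static double score(List<String> docTokens, Map<String, Double> weightedTerms,
                        Map<String, Integer> docFreqs, int numDocs) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : weightedTerms.entrySet()) {
            String term = e.getKey();
            int freq = 0;
            for (String tok : docTokens) {
                if (tok.equals(term)) freq++;
            }
            if (freq == 0) continue; // term absent from this document: contributes nothing
            int df = docFreqs.getOrDefault(term, 0);
            total += e.getValue() * tf(freq) * idf(df, numDocs);
        }
        return total;
    }

    public static void main(String[] args) {
        int numDocs = 1000; // hypothetical corpus size
        // Hypothetical document frequencies: 'box' and 'of' are common, the rest are rare.
        Map<String, Integer> docFreqs = Map.of(
            "box", 400, "of", 900, "shoes", 10, "socks", 10, "clothes", 10);
        // Query 'box of shoes' expanded with both synonyms, boosted by their similarity scores.
        Map<String, Double> query = Map.of(
            "box", 1.0, "of", 1.0, "shoes", 1.0, "socks", 0.8, "clothes", 0.65);

        double socksScore = score(List.of("box", "of", "socks"), query, docFreqs, numDocs);
        double clothesScore = score(List.of("box", "of", "clothes"), query, docFreqs, numDocs);
        System.out.printf("box of socks:   %.3f%n", socksScore);
        System.out.printf("box of clothes: %.3f%n", clothesScore);
    }
}
```

With equal document frequencies for ‘socks’ and ‘clothes’, only the boost separates them, so ‘box of socks’ scores higher, while the rare terms still outweigh ‘box’ and ‘of’ because of their larger idf.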
Thanks.