Mailing List Archive

Re: Negation search help
I badly need some help on this one. Could someone please give me some direction?


Regards
Amitesh



--
Sent from: https://lucene.472066.n3.nabble.com/Lucene-Java-Users-f532864.html

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Negation search help
Hi Amitesh

I don't have statistical proof, but I think that on mailing lists run by
volunteers it doesn't help to write "I badly need some help", because it
seems to me the opposite will happen: people will not help at all.

I think there are various reasons for this behaviour, which is
interesting from a psychological point of view, and it would be
interesting to study it in more detail.

Coming to your actual question:

https://lucene.472066.n3.nabble.com/Negation-search-help-td4471842.html

It seems to me that your use case assumes the search algorithm
understands what the user wants. That can be tricky, and it depends on
the scope/context/requirements, but also on how much information the
query contains.

Can you describe your use case with a more realistic scenario?

Thanks

Michael


On 28.04.21 at 17:02, amitesh116 wrote:


Re: Negation search help
Besides the possible statistical significance of your phrasing, there are a
number of other things you can do to improve your chances of getting good
answers:

http://www.catb.org/~esr/faqs/smart-questions.html

On Wed, Apr 28, 2021 at 3:53 PM Michael Wechner <michael.wechner@wyona.com>
wrote:


--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
Re: Negation search help
Hi Gus, thank you for your reply!

In my search system, users are complaining that they get results containing
negated terms when they don't expect them. As explained in my original post,
users don't want to get documents containing a term like "Non Vitamin K"
when they search for "Vitamin K".

But because each term is analyzed and tokenized separately, the system
currently returns both documents. There are a few more negation prefixes
that we are trying to handle.

Further, I found a way to solve this problem.

I assigned a tokenStream to each field instead of using an analyzer. This
way I am able to control the full string that is passed to tokenization for
a field.
Also, I wish to clarify that although we want to handle negation searches
as above, end users are not expected to notice any difference in the text
(i.e. "Non Vitamin K" will still be stored as "Non Vitamin K", but its
tokenized form is tweaked so that it does not match a search for
"Vitamin K").
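A minimal, Lucene-free sketch of why both documents match, and why joining the prefix to the next word before tokenization stops the match. It assumes a simplified tokenizer (lowercase, then split on anything that is not a letter or digit) as a stand-in for StandardTokenizer plus lowercasing, and models a phrase query as "the query tokens appear consecutively in the document tokens". The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class NegationPhraseDemo {

    // Hypothetical stand-in for StandardTokenizer + lowercasing:
    // split on every run of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if the phrase tokens occur consecutively in the document tokens,
    // roughly what a phrase query such as "vitamin k" checks.
    static boolean matchesPhrase(String docText, String phrase) {
        List<String> doc = tokenize(docText);
        List<String> query = tokenize(phrase);
        return !query.isEmpty() && Collections.indexOfSubList(doc, query) >= 0;
    }

    public static void main(String[] args) {
        // Tokens [non, vitamin, k]: the phrase "vitamin k" is found.
        System.out.println(matchesPhrase("Non Vitamin K", "Vitamin K"));
        // Prefix joined before tokenization, tokens [nonvitamin, k]: no match.
        System.out.println(matchesPhrase("NonVitamin K", "Vitamin K"));
        // A plain "Vitamin K" document still matches.
        System.out.println(matchesPhrase("Vitamin K", "Vitamin K"));
    }
}
```

Running main prints true, false, true: the raw text matches the phrase, the joined form does not, and an ordinary "Vitamin K" document is unaffected.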

I will take my cue from your advice and avoid such desperation in the future.

Regards
Amitesh



Re: Negation search help
Hi Amitesh

Thanks for the more concrete examples.

Unfortunately I do not know how to solve this better with Lucene itself
in a more general context, but have you considered using BERT in
combination with Lucene/Solr?

https://blog.google/products/search/search-language-understanding-bert/
https://dmitry-kan.medium.com/neural-search-with-bert-and-solr-ea5ead060b28
<https://dmitry-kan.medium.com/neural-search-with-bert-and-solr-ea5ead060b28>
https://medium.com/swlh/fun-with-apache-lucene-and-bert-embeddings-c2c496baa559
<https://medium.com/swlh/fun-with-apache-lucene-and-bert-embeddings-c2c496baa559>

HTH

Michael

On 28.04.21 at 23:29, amitesh116 wrote:


Re: Negation search help
Thank you Michael!

I solved this requirement by setting the tokenStream at the field level
instead of leaving it to the analyzer. This gives me control over altering
the full text before tokenization, using custom methods.
This has a memory overhead, which I handle by writing the documents one at
a time instead of writing all documents in one go as before.

If required, I can share code snippets to show what I did.

Regards
Amitesh



Re: Negation search help
Yes, it would be great if you could share code snippets. Maybe it will
help others or maybe someone will have a suggestion to improve or an
alternative.

All the best

Michael

On 29.04.21 at 14:35, amitesh116 wrote:


Re: Negation search help
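//Imports needed by the two methods below
import java.io.StringReader;
import java.util.Map;
import java.util.regex.Pattern;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;

//Negation prefix followed by a space; replacing with "$1" drops the space,
//e.g. "Non Vitamin K" -> "NonVitamin K"
private static final Pattern NEGATION_PATTERN =
        Pattern.compile("\\b(non|anti)( )", Pattern.CASE_INSENSITIVE);

//Method to create document
private static Document createDocumentTextField(Map<String, String> fields) {
    Document document = new Document();
    for (Map.Entry<String, String> entry : fields.entrySet()) {
        //The original value is stored as-is; only the indexed tokens are altered
        Field f = new TextField(entry.getKey(), entry.getValue(), Field.Store.YES);
        f.setTokenStream(getTokenStreamByVal(entry.getValue()));
        document.add(f);
    }
    return document;
}

//Method handling cleanup and returning the tokenStream
private static TokenStream getTokenStreamByVal(String val) {
    String cleanedVal = NEGATION_PATTERN.matcher(val).replaceAll("$1");
    StandardTokenizer source = new StandardTokenizer();
    source.setReader(new StringReader(cleanedVal));
    //StandardFilter was a no-op and has been removed in Lucene 8,
    //so the chain starts directly from the tokenizer
    TokenStream result = source;
    //Add as many filters as required for your specific requirements
    return result;
}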
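The cleanup step in getTokenStreamByVal can be checked on its own, without Lucene. This sketch reuses the same NEGATION_PATTERN; the class and method names are hypothetical. It shows a few edge cases worth verifying: only a prefix followed by a single space is joined, so "Non-Vitamin K" is left unchanged (and StandardTokenizer will still split it at the hyphen), and "non" inside a word such as "canon" is not touched. It also illustrates that the index now holds the joined form, so a query like "Non Vitamin K" would presumably need the same cleanup before it is parsed in order to find the rewritten document.

```java
import java.util.regex.Pattern;

final class NegationPrefixCleaner {

    // Same pattern as in getTokenStreamByVal: "non" or "anti" at a word
    // boundary, followed by exactly one space (captured so it can be dropped).
    private static final Pattern NEGATION_PATTERN =
            Pattern.compile("\\b(non|anti)( )", Pattern.CASE_INSENSITIVE);

    // Joins the prefix to the following word, keeping the prefix's case.
    static String clean(String text) {
        return NEGATION_PATTERN.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String[] samples = {
            "Non Vitamin K",   // prefix joined: "NonVitamin K"
            "ANTI Oxidant",    // case-insensitive match, case preserved
            "Non-Vitamin K",   // hyphen instead of space: unchanged
            "canon Vitamin K", // "non" inside a word has no word boundary: unchanged
            "Vitamin K"        // nothing to do
        };
        for (String s : samples) {
            System.out.println(s + " -> " + clean(s));
        }
        // Hypothetical query-side use: apply the same cleanup to the user's
        // query text so it produces the same joined token as the index.
        System.out.println(clean("non vitamin k"));
    }
}
```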



Re: Negation search help
During this change I also had to change the way I write the index. It now
produces many more .cfs and .fdt files than before: previously there were
5-7 files in the index folder, now there are 40+. Does this change affect
how the index is stored internally? Note that earlier all documents were
added to the writer at once, whereas now they are added one document at a
time.

Earlier:

List<Document> documents = new ArrayList<>();
for (HashMap<String, String> c : dataList) {
    documents.add(IndexUtil.createDocument(c));
}
writer.deleteAll();
writer.addDocuments(documents);
writer.commit();

Now:

writer.deleteAll();
for (HashMap<String, String> c : dataList) {
    writer.addDocument(IndexUtil.createDocument(c));
}
writer.commit();


