Mailing List Archive

Behaviour change of Query.parse(String query) in 8.7.0 vs 2.9.4
Hello Team,

We recently started upgrading our Lucene core libraries from v2.9.4 to
8.7.0.

Some of our tests started failing. They search on a keyword in the
format 'abc-def_1-2014'.

In v2.9.4 this was parsed into a PhraseQuery with 3 terms: "abc", "def_1"
and "2014".

We are using a CharTokenizer to create the terms.

When we searched the index with this query, it returned only the documents
that contain all of these terms, so we got the expected results.

After upgrading to 8.7.0 we started getting more results than expected.
While investigating, we found that in 8.7.0 the query returned by the
QueryParser.parse() method for this keyword is a BooleanQuery: the keyword
is tokenized and 3 boolean clauses are created with occur set to SHOULD.
The searcher therefore returns every document that matches any one of the
terms "abc", "def_1" and "2014", which is more than we got before.

How can we get an exact match again?




--
Sent from: https://lucene.472066.n3.nabble.com/Lucene-Java-Developer-f564358.html

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Behaviour change of Query.parse(String query) in 8.7.0 vs 2.9.4
I believe that this is related to the fact that the classic query parser no
longer splits on whitespace and instead relies on the analyzer to compute
query terms. So if your tokenizer does not distinguish between spaces and
dashes, you would indeed get a disjunction.

If you need to restore this behavior, I believe that you need to call
`queryParser.setSplitOnWhitespace(true)` and then
`queryParser.setAutoGeneratePhraseQueries(true)` on your query parser
before passing the query string to it.
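
For illustration, the two calls above might be wired up roughly as follows.
This is only a sketch under stated assumptions: it assumes the 8.7.0
lucene-core, lucene-analyzers-common and lucene-queryparser jars are on the
classpath, and the class name, the field name "body" and the token-character
rule (letters, digits and underscore) are hypothetical stand-ins, since the
original message does not show the actual CharTokenizer.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.util.CharTokenizer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class PhraseParseSketch {

    // Hypothetical analyzer: keeps letters, digits and '_' inside a token,
    // so "abc-def_1-2014" is split into "abc", "def_1" and "2014".
    static Analyzer keywordAnalyzer() {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                return new TokenStreamComponents(
                        CharTokenizer.fromTokenCharPredicate(
                                c -> Character.isLetterOrDigit(c) || c == '_'));
            }
        };
    }

    // Parses one keyword with the two settings suggested in this reply.
    static Query parseKeyword(String keyword) throws ParseException {
        // "body" is an assumed default field name.
        QueryParser parser = new QueryParser("body", keywordAnalyzer());
        // Split on whitespace first, then enable auto-generated phrases.
        parser.setSplitOnWhitespace(true);
        parser.setAutoGeneratePhraseQueries(true);
        return parser.parse(keyword);
    }

    public static void main(String[] args) throws ParseException {
        Query query = parseKeyword("abc-def_1-2014");
        // Print the concrete query class and its string form for inspection.
        System.out.println(query.getClass().getSimpleName() + ": " + query);
    }
}
```

The order of the two setters matters here: as far as I can tell, the 8.x
classic parser rejects `setAutoGeneratePhraseQueries(true)` with an
IllegalArgumentException while splitting on whitespace is still disabled.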

On Wed, Feb 3, 2021 at 8:56 AM jitesh129 <jitesh129@gmail.com> wrote:

> [...]
> How can we get an exact match again?

--
Adrien
Re: Behaviour change of Query.parse(String query) in 8.7.0 vs 2.9.4
Thanks Adrien.


