Just so that it's not overlooked: I suggest a cleanup of the
(flexible?) query parser syntax in LUCENE-9528.
In short, the current javacc code is a tangled mess that is hard to
read, modify and make sense of.
https://issues.apache.org/jira/browse/LUCENE-9528
For example, these are all valid queries at the moment (flex qp):
1. assertQueryEquals("term~0.7", null, "term~1");
2. assertQueryEquals("term^3~", null, "(term~2)^3.0");
3. assertEquals(re, qp.parse("/http/~0.5", df));
The thing is:
1) fuzzy (and slop) are integers. The parser shouldn't accept floats
for them; doing so is incorrect and misleading.
2) operator order in this case should matter: fuzzy should apply
first, then boost on top of whatever expression is underneath (boost
has a wider application than just term queries). This arbitrary-order
syntax is hardcoded in the parser and is wrong. For example, this parses:
term~3^3~4 and results in this query:
<fuzzy field='field' similarity='4.0' term='term'/>
3) Operators that don't apply to certain types of clauses should cause
parser exceptions. Can you guess what the query "/http/~0.5" parses
to? Looks like a regexp with a fuzzy factor, right? No, it parses to:
<fuzzy field='field' similarity='0.5' term='/http/'/>
because regexps don't allow fuzziness, so the whole thing silently
becomes a fuzzy term query.
LUCENE-9528 cleans up most of the above. The drawback: it is not a
backwards-compatible change (arguably this fixes parser errors, not
behavior).
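
To make the three points concrete, here is a rough standalone sketch of
the stricter rules. It is not the LUCENE-9528 grammar and does not use any
Lucene classes; the class name, the method name and the output format are
made up for illustration, and the default of 2 edits for a bare "~" is
taken from the "(term~2)^3.0" example above. The assumed shape is: a clause,
then an optional integer fuzzy, then an optional boost, in that order.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical strict validator for clause modifiers; NOT Lucene's parser. */
class StrictModifierSketch {
    // Assumed default edit distance for a bare "~" (see "(term~2)^3.0" above).
    private static final int DEFAULT_EDITS = 2;

    // <clause> [~<int>] [^<number>]: fuzzy first, then boost, each at most once.
    private static final Pattern STRICT = Pattern.compile(
        "(?<clause>/[^/]+/|[A-Za-z0-9_]+)"
            + "(?<fuzzy>~(?<edits>\\d+)?)?"
            + "(?<boost>\\^(?<factor>\\d+(\\.\\d+)?))?");

    /** Returns a normalized form of the query or throws if it is not strict. */
    static String check(String query) {
        Matcher m = STRICT.matcher(query);
        if (!m.matches()) {
            // Covers float fuzzy ("term~0.7"), boost before fuzzy ("term^3~")
            // and repeated operators ("term~3^3~4").
            throw new IllegalArgumentException("Cannot parse: " + query);
        }
        String clause = m.group("clause");
        boolean isRegexp = clause.startsWith("/");
        if (isRegexp && m.group("fuzzy") != null) {
            // Point 3: an operator that does not apply to the clause is an error.
            throw new IllegalArgumentException(
                "Fuzzy operator does not apply to regexps: " + query);
        }
        StringBuilder out = new StringBuilder(clause);
        if (m.group("fuzzy") != null) {
            String edits = m.group("edits");
            out.append('~').append(edits == null ? DEFAULT_EDITS : Integer.parseInt(edits));
        }
        if (m.group("boost") != null) {
            // Boost wraps whatever expression is underneath.
            out.insert(0, '(').append(")^").append(m.group("factor"));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] queries = {"term~2^3", "term~", "term~0.7", "term^3~", "/http/~2"};
        for (String q : queries) {
            try {
                System.out.println(q + " -> " + check(q));
            } catch (IllegalArgumentException e) {
                System.out.println(q + " -> rejected: " + e.getMessage());
            }
        }
    }
}
```

With rules like these, all three "valid" queries from the list above would
be rejected instead of being silently reinterpreted.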
Speak up if you have an opinion about not changing the above.
Dawid
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org