Mailing List Archive: Suggested query parser syntax change: fuzzy and boost operators (term^3~2)

Just so that it's not overlooked: I suggest a cleanup of the
(flexible?) query parser syntax in LUCENE-9528.

In short, the current JavaCC code is a tangled mess that is hard to
read, modify and make sense of.

https://issues.apache.org/jira/browse/LUCENE-9528

For example, these are all valid queries at the moment (flexible query parser):

1. assertQueryEquals("term~0.7", null, "term~1");
2. assertQueryEquals("term^3~", null, "(term~2)^3.0");
3. assertEquals(re, qp.parse("/http/~0.5", df));

The thing is:

1) Fuzzy edit distances (and slop) are integers. The parser shouldn't
accept floats for them; doing so is incorrect and misleading.
2) Operator order should matter here: fuzzy should apply first, and
boost then applies to whatever expression is underneath (boost has a
wider application than just term queries). Instead, the parser
hardcodes an arbitrary-order syntax, which is wrong. For example,
term~3^3~4 parses and results in this query:
<fuzzy field='field' similarity='4.0' term='term'/>
3) Operators that don't apply to certain types of clauses should cause
parser exceptions. Can you guess what the query "/http/~0.5" parses
to? It looks like a regexp with a fuzzy factor, right? No, it parses to:

<fuzzy field='field' similarity='0.5' term='/http/'/>

because regexps don't allow fuzziness, so /http/ is silently treated
as a plain term instead.
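A minimal sketch of the stricter rules in points 1) to 3), assuming a hypothetical standalone grammar for a single clause: a term or /regexp/, then an optional integer '~', then an optional '^' boost, in that order only. StrictSuffixParser, Clause and the regex are illustrative names, not Lucene's JavaCC grammar or any Lucene API; the default distance of 2 for a bare '~' is taken from the "term^3~" example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Lucene code) of a strict single-clause grammar:
 *   clause := (term | /regexp/) [ '~' [int] ] [ '^' number ]
 * Fuzzy must precede boost, must be an integer, and is rejected on regexps.
 */
public final class StrictSuffixParser {

  // Assumed default edit distance for a bare '~' (cf. "term^3~" -> term~2 in the thread).
  static final int DEFAULT_FUZZY = 2;

  // group(1): clause body, group(2): '~' suffix, group(3): edit distance, group(4): boost
  private static final Pattern CLAUSE =
      Pattern.compile("(/[^/]*/|\\w+)(~(\\d+)?)?(?:\\^(\\d+(?:\\.\\d+)?))?");

  /** Parsed clause; fuzzy and boost are null when absent. */
  public static final class Clause {
    final String text;
    final boolean regexp;
    final Integer fuzzy;
    final Float boost;

    Clause(String text, boolean regexp, Integer fuzzy, Float boost) {
      this.text = text;
      this.regexp = regexp;
      this.fuzzy = fuzzy;
      this.boost = boost;
    }

    @Override
    public String toString() {
      return text
          + (fuzzy != null ? "~" + fuzzy : "")
          + (boost != null ? "^" + boost : "");
    }
  }

  public static Clause parse(String input) {
    Matcher m = CLAUSE.matcher(input);
    if (!m.matches()) {
      // Covers float fuzziness (term~0.7), boost before fuzzy (term^3~)
      // and repeated operators (term~3^3~4).
      throw new IllegalArgumentException("Cannot parse clause: " + input);
    }
    String body = m.group(1);
    boolean regexp = body.startsWith("/");
    Integer fuzzy = null;
    if (m.group(2) != null) {
      if (regexp) {
        // Operator does not apply to this clause type: fail instead of reinterpreting.
        throw new IllegalArgumentException("Fuzzy operator not allowed on regexp: " + input);
      }
      fuzzy = m.group(3) != null ? Integer.valueOf(m.group(3)) : Integer.valueOf(DEFAULT_FUZZY);
    }
    Float boost = m.group(4) != null ? Float.valueOf(m.group(4)) : null;
    return new Clause(body, regexp, fuzzy, boost);
  }

  public static void main(String[] args) {
    String[] inputs = {
      "term~", "term~1^3", "/http/^2",
      "term~0.7", "term^3~", "term~3^3~4", "/http/~0.5", "/http/~2"
    };
    for (String q : inputs) {
      try {
        System.out.println(q + " -> " + parse(q));
      } catch (IllegalArgumentException e) {
        System.out.println(q + " -> rejected (" + e.getMessage() + ")");
      }
    }
  }
}
```

Encoding the order in the grammar itself, rather than collecting suffixes in any order, is what turns term^3~ and term~3^3~4 into parse errors instead of silently producing a fuzzy query.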

LUCENE-9528 cleans up most of the above. The drawback: it is not a
backwards-compatible change (although arguably it fixes parser errors
rather than changing intended behavior).

Speak up if you think any of the above should be left unchanged.

Dawid


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

+1 to being stricter about the order of operators. That's a bug fix, IMO.

On Thu, 17 Sep 2020 at 08:58, Dawid Weiss <dawid.weiss@gmail.com> wrote: