Mailing List Archive

Lucene Query Parser Syntax Specification
I'm building a yacc/lex pair for the Apache Lucene Query Parser Syntax.
1) Has this already been done?

I'm using the specification published on 4/11/2012.
2) Is there a newer version of the specification?

I'm hitting shift/reduce conflicts, and some constructs that I think should
be accepted are rejected by one implementation. This probably means I am not
interpreting the specification correctly.
3) Is there a more formal spec for the syntax than the above document?

Thanks in advance for any guidance.

Cheers, Scott
Re: Lucene Query Parser Syntax Specification
> 1) Has this already been done?

There are a number of query parsers in Lucene. Some of them are
implemented as javacc grammars - these should be convertible to
yacc/lex (do you really need C implementations though?).

> 2) Is there a newer version of the specification?

The source code is the de-facto specification.

D.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Lucene Query Parser Syntax Specification
>> The source code is the de-facto specification

Fair enough, although it does raise the question of which parser source code,
there being no shortage of Lucene/Solr/etc. query parsers, parser releases,
and parser versions on GitHub. Anyway, below is my de jure yacc. I think
it covers everything in the 2012 specification and rounds out the special
cases a little.

Your comments are solicited and will be greatly appreciated.

Cheers, Scott

P.S. yacc/bison can generate parsers in programming languages other than C
including Java.

query : query TOK_AND query
      | query TOK_OR query
      | TOK_NOT query
      | '(' query ')'
      | term
      ;

term  : TOK_ALPHA
      | TOK_WILD
      | TOK_ALPHA ':' TOK_ALPHA
      | TOK_ALPHA ':' TOK_WILD
      | TOK_ALPHA '~'
      | TOK_ALPHA '~' TOK_NUM
      | TOK_ALPHA '^' TOK_NUM
      | TOK_ALPHA ':' TOK_ALPHA '~'
      | TOK_ALPHA ':' TOK_ALPHA '~' TOK_NUM
      | TOK_ALPHA ':' TOK_ALPHA '^' TOK_NUM
      | '"' TOK_ALPHA TOK_ALPHA '"' '~' TOK_NUM
      | TOK_ALPHA ':' '[' TOK_NUM TOK_TO TOK_NUM ']'
      | TOK_ALPHA ':' '{' TOK_ALPHA TOK_TO TOK_ALPHA '}'
      | '+' TOK_ALPHA
      | '-' TOK_ALPHA
      ;
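
One likely source of the shift/reduce conflicts mentioned at the start of the thread: the `query` rules above put AND and OR at the same level with no stated precedence or associativity, so `a AND b OR c` can be grouped two ways. The sketch below is not a translation of the grammar. It is a small hand-written Python parser that assumes NOT binds tighter than AND, and AND tighter than OR (an assumption; the thread does not say which precedence is intended), and it handles only bare words and parentheses. The names `tokenize`, `Parser` and `parse_query` are made up for this example. In bison the analogous fix would be `%left` / `%right` precedence declarations on the operator tokens.

```python
import re

# Hypothetical token pattern: a parenthesis, or a run of characters
# that are neither whitespace nor parentheses.
TOKEN_RE = re.compile(r"[()]|[^\s()]+")


def tokenize(text):
    """Split a query string into parentheses and words."""
    return TOKEN_RE.findall(text)


class Parser:
    """Recursive-descent parser; assumed precedence is NOT > AND > OR."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of query")
        self.pos += 1
        return tok

    def parse(self):
        tree = self.parse_or()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def parse_or(self):
        # Lowest precedence: a chain of AND-expressions joined by OR.
        left = self.parse_and()
        while self.peek() == "OR":
            self.take()
            left = ("OR", left, self.parse_and())
        return left

    def parse_and(self):
        # Middle precedence: a chain of NOT-expressions joined by AND.
        left = self.parse_not()
        while self.peek() == "AND":
            self.take()
            left = ("AND", left, self.parse_not())
        return left

    def parse_not(self):
        # Highest precedence: prefix NOT, which may be repeated.
        if self.peek() == "NOT":
            self.take()
            return ("NOT", self.parse_not())
        return self.parse_primary()

    def parse_primary(self):
        tok = self.take()
        if tok == "(":
            tree = self.parse_or()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if tok in (")", "AND", "OR", "NOT"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return ("TERM", tok)


def parse_query(text):
    """Parse a query string into a nested tuple tree."""
    return Parser(tokenize(text)).parse()
```

With these assumptions, `parse_query("a AND b OR c")` groups the AND first, and `parse_query("a OR b AND c")` also groups the AND first, which is the kind of decision a yacc grammar has to state explicitly.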
Re: Lucene Query Parser Syntax Specification
There is no single query parser that covers all of Lucene's API
(various Query class implementations). The existing "query parsers"
cover subsets of the API - historically they vary depending on what
people needed, what kind of audience was targeted (technical users,
general public), and many other factors.

There is no "single" Lucene syntax that would cover everything -
people pick what they need and write query parsers that work for them,
typically. If you take a look at Elasticsearch, their primary "query"
is a structured DSL covering typical Query classes, not any plain text
representation. Perhaps this is the closest to what a "query parser"
for Lucene API should be.
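
To illustrate the structured-DSL idea, here is a sketch that turns a tiny boolean query tree into a nested dictionary in the style of Elasticsearch's `bool` query (`must`, `should` and `must_not` clauses with `term` leaves). The tuple-based tree shape, the field names and the function name `to_dsl` are hypothetical, invented for this example; a real client would cover many more query types.

```python
import json


def to_dsl(node):
    """Convert a hypothetical tuple-based query tree into a nested dict.

    Tree shapes assumed for this example:
      ("TERM", field, value)
      ("AND", left, right), ("OR", left, right), ("NOT", child)
    """
    kind = node[0]
    if kind == "TERM":
        _, field, value = node
        return {"term": {field: value}}
    if kind == "AND":
        return {"bool": {"must": [to_dsl(node[1]), to_dsl(node[2])]}}
    if kind == "OR":
        return {"bool": {"should": [to_dsl(node[1]), to_dsl(node[2])]}}
    if kind == "NOT":
        return {"bool": {"must_not": [to_dsl(node[1])]}}
    raise ValueError(f"unknown node type: {kind!r}")


# Example tree: title is "lucene" AND NOT status is "draft".
tree = ("AND", ("TERM", "title", "lucene"), ("NOT", ("TERM", "status", "draft")))
print(json.dumps(to_dsl(tree), indent=2))
```

The point of the structured form is that nesting is explicit in the data, so there is no text grammar to disambiguate.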

Dawid

On Wed, Nov 11, 2020 at 9:54 PM Scott Guthery <sguthery@gmail.com> wrote:
>
> >> The source code is the de-facto specification
RE: Lucene Query Parser Syntax Specification
I wish something like this existed for streaming expressions.
To have highlighting and validation in an editor would be great!

Re: Lucene Query Parser Syntax Specification
>>> I wish something like this existed for streaming expressions.

I'm not sure what you mean by 'streaming expressions' but yacc/bison and
lex/flex can be applied to bytes coming from any stream and hitch syntax
and semantic actions onto anything they recognize. There is, naturally,
the proviso that they have to wait for things they recognize and, depending
on your grammar, be able to look ahead which may entail waiting for the
next thing they recognize. For more details, check out the flex and bison
docs at gnu.org. I might add that they can handle a much richer set of
grammars than javacc and turn out a more complete parser.
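
As a rough illustration of the waiting and lookahead point, the sketch below (plain Python, not flex) tokenizes characters as they arrive from a stream and keeps a one-character lookahead: a word or number is only emitted once the character after it has been read, or the stream has ended. The token names `TOK_ALPHA` and `TOK_NUM` are borrowed from the grammar earlier in the thread; the character classes assigned to them, the `PUNCT` kind and the function name `stream_tokens` are assumptions made for this example.

```python
import io


def stream_tokens(stream):
    """Yield (kind, text) pairs from a text stream, one character at a time."""
    ch = stream.read(1)  # one-character lookahead
    while ch:
        if ch.isspace():
            ch = stream.read(1)
        elif ch.isalnum():
            # Keep reading until the lookahead character ends the token.
            buf = []
            while ch and ch.isalnum():
                buf.append(ch)
                ch = stream.read(1)
            text = "".join(buf)
            yield ("TOK_NUM" if text.isdigit() else "TOK_ALPHA", text)
        else:
            # Any other character is passed through as single-char punctuation.
            yield ("PUNCT", ch)
            ch = stream.read(1)


for token in stream_tokens(io.StringIO("title:lucene~2")):
    print(token)
```

Because `stream_tokens` is a generator, a consumer can start acting on early tokens before the rest of the input has been read.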

As always, your mileage may vary.

Cheers, Scott
Re: Lucene Query Parser Syntax Specification
I have had this thought regarding IDE support too. I've had expressions
that when formatted for legibility are over 100 lines long, and adding
something in the middle that changes indenting is truly painful at that
point. At the moment I've got several irons in the fire already and can't
possibly take that on. The current implementation
(org.apache.solr.client.solrj.io.stream.expr.StreamExpressionParser) is
hand coded, and not generated from a grammar. So one would probably want to
correct that first so that syntax changes can be identified and adjusted in
downstream syntax highlighters relatively easily. Unfortunately, when I
briefly looked at this for IntelliJ, it favored ANTLR, whereas JavaCC
and JFlex are what we tend to use in the Solr codebase.
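
To make the indentation pain concrete, here is a toy re-indenter for nested function-call style expressions. It is not based on `StreamExpressionParser` and is not a real grammar; it only tracks parentheses, commas and double-quoted strings. It assumes that any value containing spaces is double-quoted and that strings contain no escaped quotes. The function name `reindent` and the sample expression are made up for this example.

```python
def reindent(expr, indent="  "):
    """Re-indent a nested call expression, one argument per line.

    Assumes arguments are separated by commas, nesting uses parentheses,
    values with spaces are double-quoted, and strings have no escaped quotes.
    """
    out = []
    depth = 0
    in_string = False
    for ch in expr:
        if in_string:
            # Copy string contents verbatim until the closing quote.
            out.append(ch)
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "(":
            depth += 1
            out.append("(\n" + indent * depth)
        elif ch == ")":
            depth -= 1
            out.append("\n" + indent * depth + ")")
        elif ch == ",":
            out.append(",\n" + indent * depth)
        elif ch.isspace():
            continue  # drop existing whitespace outside strings
        else:
            out.append(ch)
    return "".join(out)


print(reindent('top(n=3, search(c1, q="a b"), sort="x desc")'))
```

Since the layout is recomputed from the nesting depth, inserting a new wrapper in the middle of a long expression and re-running the function restores consistent indentation.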

-Gus

On Thu, Nov 12, 2020 at 7:02 AM ufuk yılmaz <uyilmaz@vivaldi.net.invalid>
wrote:

> I wish something like this existed for streaming expressions.
>
> To have highlighting and validation in an editor would be great!


--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
Re: Lucene Query Parser Syntax Specification
I'd like to engage someone to develop a stand-alone GUI for a Lucene
database. I'm thinking something on the order of Luke but simpler and
more tailored to the database itself. It is a database of US patents
since 1976.

If you know of someone who might be interested would you kindly have
them get in touch with me (sguthery@gmail.com) or send along their
contact information.

Many thanks for your consideration. Apologies if this posting
violates the list's guidelines.

Cheers, Scott
