
Consider Removing the `@` Special Character from RegExp
Hi All,

While looking at the Javadocs, our Lucene team noticed that the `@` symbol is
a reserved character in the Lucene regular expression syntax.

Revisiting the page out of curiosity, I found that the symbol is an
[Optional] operator meaning "any string." This came as a surprise because
there is already a very common way to express "any string": `.*`. Is there
any compelling reason to preserve this tiny vector of complexity? I suspect
there may be some differences in how the finite automata are constructed
for `.*` and for `@`, but I am not sure.

If those differences are insignificant or non-existent, I suggest we remove
`@` from the regular expression syntax.
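To make the `.*` versus `@` question concrete, here is a minimal sketch in plain Java. It is a toy, not Lucene's parser: the class `AnyStringEquivalence`, its `translate` and `agreeOnAllStrings` methods, and the `ANYSTRING` constant are hypothetical. The translator only understands literals, `.`, `*` and `@`, mapping them onto `java.util.regex`, and then checks by brute force that `a@b` and `a.*b` accept exactly the same strings over the alphabet {a, b} up to length 6. That is evidence about the two languages on a bounded sample, not a proof, and it says nothing about the automata Lucene actually builds for each form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Toy model, NOT Lucene's RegExp parser: translates a tiny subset of the syntax
 * (literals, '.', '*', and '@') into java.util.regex to compare "@" with ".*".
 */
public class AnyStringEquivalence {

    // Hypothetical stand-in for Lucene's ANYSTRING syntax flag (0x0008).
    static final int ANYSTRING = 0x0008;

    /** Translates the toy syntax; '@' means "any string" only if ANYSTRING is set. */
    static Pattern translate(String pattern, int flags) {
        StringBuilder sb = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '.') {
                sb.append("(?s:.)");                 // any single character, newlines included
            } else if (c == '*') {
                sb.append('*');                      // Kleene star on the preceding atom
            } else if (c == '@' && (flags & ANYSTRING) != 0) {
                sb.append("(?s:.*)");                // "any string": any character, repeated
            } else {
                // Everything else is a literal; the group keeps a following '*' attached to it.
                sb.append("(?:").append(Pattern.quote(String.valueOf(c))).append(')');
            }
        }
        return Pattern.compile(sb.toString());
    }

    /** Brute force: do p and q accept the same strings over {a, b} up to maxLen chars? */
    static boolean agreeOnAllStrings(Pattern p, Pattern q, int maxLen) {
        List<String> frontier = new ArrayList<>(List.of(""));
        for (int len = 0; len <= maxLen; len++) {
            List<String> next = new ArrayList<>();
            for (String s : frontier) {
                if (p.matcher(s).matches() != q.matcher(s).matches()) {
                    return false;                    // found a string only one pattern accepts
                }
                next.add(s + "a");
                next.add(s + "b");
            }
            frontier = next;
        }
        return true;
    }

    public static void main(String[] args) {
        Pattern anyString = translate("a@b", ANYSTRING);
        Pattern dotStar = translate("a.*b", ANYSTRING);
        System.out.println(agreeOnAllStrings(anyString, dotStar, 6)); // prints "true"
    }
}
```

If the languages are the same, the remaining question is purely about construction: whether Lucene's automata for the two spellings differ in size or cost before any determinization or minimization.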

--
Marcus Eagan
Re: Consider Removing the `@` Special Character from RegExp
I think it's already an optional feature: if you construct the RegExp with
explicit syntax flags, you can get an instance that won't treat '@' as
special. I haven't actually needed to do that, so I'm assuming it works as
documented.

/** Syntax flag, enables anystring (<code>@</code>). */
public static final int ANYSTRING = 0x0008;
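A minimal sketch of what that flag controls, as a plain-Java toy rather than Lucene itself: `AnyStringFlagDemo` and its `compile` method are hypothetical, `ANYSTRING` reuses the 0x0008 value quoted above, and `ALL` is an assumed "every flag set" mask. Every character other than `@` is treated as a literal. Clearing just the `ANYSTRING` bit turns `@` back into an ordinary character. In Lucene itself the equivalent would presumably be passing `RegExp.ALL & ~RegExp.ANYSTRING` to the `RegExp(String, int)` constructor, but check that against the Javadocs for your version.

```java
import java.util.regex.Pattern;

/** Toy model (not Lucene): shows how a syntax flag decides whether '@' is special. */
public class AnyStringFlagDemo {

    // ANYSTRING mirrors the value quoted from RegExp; ALL is a hypothetical "all flags" mask.
    static final int ANYSTRING = 0x0008;
    static final int ALL = 0xffff;

    /** Hypothetical translator: '@' means "any string" only when ANYSTRING is set. */
    static Pattern compile(String pattern, int flags) {
        StringBuilder sb = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '@' && (flags & ANYSTRING) != 0) {
                sb.append("(?s:.*)");                        // special: matches any string
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // otherwise a plain literal
            }
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        String input = "user@example.com";
        int withoutAnyString = ALL & ~ANYSTRING;             // clear only the ANYSTRING bit

        // Flag on: "user@" is "user" followed by anything.
        System.out.println(compile("user@", ALL).matcher(input).matches());              // true
        // Flag off: '@' is literal, so "user@" matches only the exact text "user@".
        System.out.println(compile("user@", withoutAnyString).matcher(input).matches()); // false
        System.out.println(compile("user@example.com", withoutAnyString).matcher(input).matches()); // true
    }
}
```

Masking out a single bit, rather than listing the flags you want, keeps every other optional operator enabled while making `@` safe for inputs such as email addresses.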



On Thu, Jan 21, 2021 at 9:21 PM Marcus Eagan <marcuseagan@gmail.com> wrote:

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
Re: Consider Removing the `@` Special Character from RegExp
That's right, it's optional. Still, I think we should remove it unless we
have a good reason to keep it; it strikes me as maddening and unnecessary.
Perhaps I am the only one?

On Fri, Jan 22, 2021 at 7:54 AM Gus Heck <gus.heck@gmail.com> wrote:

--
Marcus Eagan