Mailing List Archive

Fuzzy-phrase query with "holes" using intervals?
Hmm... Is there any way to express a query for a phrase-like sequence of tokens:

a b c d

but with potential "holes" (one or more terms missing):

- b c d
a - c d
a b - d
...

I've experimented with ordered(term("a"), term("b"), ...), gaps and
atLeast but I can't get it to work. I could expand the terms into several
queries manually, but the number of potential subsets is quite large,
hence the question. Thanks for any tips.

Dawid
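
To give a rough sense of the manual expansion mentioned above, here is a minimal sketch in plain Java that enumerates the "holed" variants of a phrase. It does not use Lucene; the class name HoleExpansion, the method expand and the maxHoles parameter are hypothetical names chosen for illustration, and a hole is simply rendered as "-". Each variant would still have to be translated into an interval source and the variants combined in a disjunction, which is left out here.

```java
import java.util.ArrayList;
import java.util.List;

public class HoleExpansion {

    /**
     * Returns every variant of the phrase in which between 1 and maxHoles
     * positions are replaced by "-". Assumes fewer than 31 terms.
     */
    public static List<List<String>> expand(List<String> terms, int maxHoles) {
        List<List<String>> variants = new ArrayList<>();
        int n = terms.size();
        // Each bitmask selects the positions that become holes.
        for (int mask = 1; mask < (1 << n); mask++) {
            int holes = Integer.bitCount(mask);
            if (holes > maxHoles || holes == n) {
                continue; // too many holes, or no term left at all
            }
            List<String> variant = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                variant.add((mask & (1 << i)) != 0 ? "-" : terms.get(i));
            }
            variants.add(variant);
        }
        return variants;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("a", "b", "c", "d");
        // Print the single-hole variants, one per line.
        for (List<String> variant : expand(terms, 1)) {
            System.out.println(String.join(" ", variant));
        }
        System.out.println(expand(terms, 2).size() + " variants with up to two holes");
    }
}
```

For n terms and up to k holes the number of variants is C(n,1) + ... + C(n,k), so the disjunction grows quickly with the phrase length, which is the concern raised in the question.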

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Fuzzy-phrase query with "holes" using intervals?
I think you need a sort of ‘ordered atLeast’ here. Currently atLeast() is a mixture of a disjunction and an unordered interval; it should be possible to add something that applies additional constraints to the sets that it finds. I think you’d need to write some code though: I can’t see a way of doing it with the current group of interval operators.
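
As an illustration only, the semantics that an 'ordered atLeast' might have can be sketched in plain Java over a list of tokens. This is not Lucene code and not a proposed implementation of the operator: the class OrderedAtLeastSketch, the method matches and the parameter minShouldMatch are hypothetical, and the sketch assumes that a hole is a position occupied by some other token (or one falling off either end of the text), so every matched term must sit at its expected offset within the phrase.

```java
import java.util.List;

public class OrderedAtLeastSketch {

    /**
     * Returns true if some alignment of the phrase against the tokens has at
     * least minShouldMatch terms at their expected relative positions.
     */
    public static boolean matches(List<String> tokens, List<String> phrase, int minShouldMatch) {
        int n = phrase.size();
        // Let the phrase hang off either end so leading and trailing holes are allowed.
        for (int start = -(n - 1); start < tokens.size(); start++) {
            int hits = 0;
            for (int i = 0; i < n; i++) {
                int pos = start + i;
                if (pos >= 0 && pos < tokens.size() && tokens.get(pos).equals(phrase.get(i))) {
                    hits++;
                }
            }
            if (hits >= minShouldMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> phrase = List.of("a", "b", "c", "d");
        // "a - c d" embedded in a longer text: three terms in order at the right offsets.
        System.out.println(matches(List.of("x", "a", "q", "c", "d", "y"), phrase, 3));
        // All four terms present but reversed: no alignment reaches three hits.
        System.out.println(matches(List.of("d", "c", "b", "a"), phrase, 3));
    }
}
```

The second call is the case that separates the two behaviours: a purely unordered combination of the kind described above would accept the reversed text, while the ordered constraint rejects it.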

> On 17 Sep 2020, at 14:20, Dawid Weiss <dawid.weiss@gmail.com> wrote:
>
> Hmm... Is there any way to express a query for a phrase-like sequence of tokens:
>
> a b c d
>
> but with potential "holes" (one or more terms missing):
>
> - b c d
> a - c d
> a b - d
> ...
>
> I've experimented with ordered(term("a"), term("b"), ...), gaps and
> atLeast but I can't get it to work. I could expand the terms into several
> queries manually, but the number of potential subsets is quite large,
> hence the question. Thanks for any tips.
>
> Dawid

Re: Fuzzy-phrase query with "holes" using intervals?
Thanks Alan. I don't think my foo is strong enough to dive deep into
implementing intervals... yet. :) I'll try to clean up what's active
on my plate and maybe later I'll return to this.

Dawid

On Thu, Sep 17, 2020 at 3:53 PM Alan Woodward <romseygeek@gmail.com> wrote:
>
> I think you need a sort of ‘ordered atLeast’ here. Currently atLeast() is a mixture of a disjunction and an unordered interval; it should be possible to add something that applies additional constraints to the sets that it finds. I think you’d need to write some code though: I can’t see a way of doing it with the current group of interval operators.