Mailing List Archive

ContainingIntervalsSource alternative
Hello everyone,

I am experimenting with Interval Queries to get phrase match count within
parts of an indexed field.

ContainingIntervalsSource seemed to be the way to go, but it considers at
most a single match per region.
Example:
Field value: "[a b c d e a c] e f g h [a c k]" (the square brackets are not
part of the text; they mark the regions of the field I am interested in)

Within the regions of the field, I am trying to find all phrase match
positions for, say, 'a c' with slop=1.
The first region has 2 matches and the second region has 1.
ContainingIntervalsSource produces an iterator that yields only the first
match position for the first region ([a b c d e a c]), even though there are
two matches in this region. This behavior seems to be by design. Is it
possible to accomplish this with the existing interval sources, or should
one write a custom source for this?

On a related note, would it make sense for ContainingIntervalsSource to
produce multiple match positions for the first region?

Thanks,
Elbek.
Re: ContainingIntervalsSource alternative
Hi Elbek,
Maybe go with ContainedByIntervalsSource? ContainingIntervalsSource returns
the intervals of the big source, filtered by the small source, while
ContainedByIntervalsSource does the opposite (the intervals of the small
source, filtered by the big source), so it should give the expected
behavior?
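
To make the difference concrete, here is a minimal sketch in plain Java. It
does not use Lucene; it only models the two filters as they are described in
this thread. The class name, the helper methods, and the region positions
(tokens 0-6 and 11-13 of the example field) are assumptions made for
illustration, not Lucene's API or implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class IntervalFilterSketch {

    /** A closed interval of token positions [start, end]. Hypothetical helper type. */
    public static final class Interval {
        public final int start;
        public final int end;

        public Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(Interval other) {
            return start <= other.start && other.end <= end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + "]";
        }
    }

    /** "Containing": keep each big interval that contains at least one small interval. */
    public static List<Interval> containing(List<Interval> big, List<Interval> small) {
        List<Interval> out = new ArrayList<>();
        for (Interval b : big) {
            if (small.stream().anyMatch(b::contains)) {
                out.add(b); // the region is reported once, however many matches it holds
            }
        }
        return out;
    }

    /** "Contained by": keep each small interval that lies inside some big interval. */
    public static List<Interval> containedBy(List<Interval> small, List<Interval> big) {
        List<Interval> out = new ArrayList<>();
        for (Interval s : small) {
            if (big.stream().anyMatch(b -> b.contains(s))) {
                out.add(s); // every phrase match inside a region is reported
            }
        }
        return out;
    }

    /** Ordered matches of 'first' followed by 'second' with at most maxGaps tokens in between. */
    public static List<Interval> orderedPairs(String[] tokens, String first, String second, int maxGaps) {
        List<Interval> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(first)) {
                continue;
            }
            for (int j = i + 1; j < tokens.length && j - i - 1 <= maxGaps; j++) {
                if (tokens[j].equals(second)) {
                    out.add(new Interval(i, j));
                    break; // keep only the shortest match starting at i
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = "a b c d e a c e f g h a c k".split(" ");
        // Assumed region positions: [a b c d e a c] and [a c k].
        List<Interval> regions = List.of(new Interval(0, 6), new Interval(11, 13));
        List<Interval> phrase = orderedPairs(tokens, "a", "c", 1);

        System.out.println("containing:  " + containing(regions, phrase));
        System.out.println("containedBy: " + containedBy(phrase, regions));
    }
}
```

In this model, the containing filter yields the two regions, while the
contained-by filter yields the three phrase matches (two in the first region,
one in the second), which is the count asked for in the original question.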

Best
Patrick

elbek kamoliddinov <elbek.dev@gmail.com> wrote on June 2, 2021 at 2:55:

> Hello everyone,
>
> I am experimenting with Interval Queries to get phrase match count within
> parts of an indexed field.
>
> ContainingIntervalsSource seemed to be the way to go, but it considers at
> most a single match per region.
Re: ContainingIntervalsSource alternative
Ahh, exactly what I was looking for. Thanks, Patrick.

On Wed, Jun 2, 2021 at 6:16 PM Patrick Zhai <zhai7631@gmail.com> wrote:

> Hi Elbek,
> Maybe go with ContainedByIntervalsSource? ContainingIntervalsSource is
> actually the big source filtered by the small source, and
> ContainedByIntervalsSource is the opposite, so it should give the expected
> behavior?
>
> Best
> Patrick