Mailing List Archive: ContainingIntervalsSource alternative

ContainingIntervalsSource alternative

elbek.dev at gmail

Jun 2, 2021, 2:55 PM

Hello everyone,

I am experimenting with interval queries to count phrase matches within
specific parts (regions) of an indexed field.

ContainingIntervalsSource seemed to be the way to go, but it reports at most
a single match per region.
Example:
Field value: "[a b c d e a c] e f g h [a c k]" (the opening and closing square
brackets are not part of the text; they only mark the regions of the field I
am interested in)

Within these regions I am trying to find all phrase match positions for,
say, 'a c' with slop=1.
The first region has 2 matches and the second region has 1.
ContainingIntervalsSource produces an iterator that yields only the first
match position for the first region ([a b c d e a c]), even though there are
two matches in that region. This behavior seems to be by design. Is it
possible to accomplish this with the existing interval sources, or does one
have to write a custom one?
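
For reference, the counts stated above can be reproduced outside Lucene with a
small Python sketch. It is only a simplified model of an ordered two-term match
with a gap limit, not Lucene's implementation. The function names
`phrase_matches` and `count_per_region`, the 0-based token positions, and the
region boundaries (0-6 and 11-13) are assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) token positions


def phrase_matches(tokens: List[str], first: str, second: str, slop: int) -> List[Interval]:
    """Return intervals where `first` is followed by `second` with at most
    `slop` tokens in between. Simplified model; assumes first != second."""
    matches = []
    last_first = None  # position of the most recent unmatched `first`
    for pos, tok in enumerate(tokens):
        if tok == first:
            last_first = pos
        elif tok == second and last_first is not None:
            if pos - last_first - 1 <= slop:
                matches.append((last_first, pos))
            last_first = None  # each `first` is used for at most one match
    return matches


def count_per_region(matches: List[Interval], regions: List[Interval]) -> List[int]:
    """Count the matches that lie entirely inside each region."""
    return [
        sum(1 for start, end in matches if r_start <= start and end <= r_end)
        for r_start, r_end in regions
    ]


tokens = "a b c d e a c e f g h a c k".split()
regions = [(0, 6), (11, 13)]  # the two bracketed parts of the example (assumed positions)

matches = phrase_matches(tokens, "a", "c", slop=1)
print(matches)                             # [(0, 2), (5, 6), (11, 12)]
print(count_per_region(matches, regions))  # [2, 1]
```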

On a related note, would it make sense for ContainingIntervalsSource to
produce multiple match positions for the first region?

Thanks,
Elbek.

Re: ContainingIntervalsSource alternative

zhai7631 at gmail

Jun 2, 2021, 3:16 PM

Hi Elbek,
Maybe go with ContainedByIntervalsSource? ContainingIntervalsSource actually
returns the big source filtered by the small source, whereas
ContainedByIntervalsSource does the opposite (the small source filtered by
the big source), so it should give the expected behavior.
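
The difference between the two filters can be illustrated with a small Python
model. This is a sketch of the filtering semantics only, not Lucene code.
`covers`, `containing` and `contained_by` are hypothetical helpers that mirror
the argument order of Lucene's `Intervals.containing(big, small)` and
`Intervals.containedBy(small, big)`, and the intervals are the 0-based token
positions assumed for the example field.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) token positions


def covers(outer: Interval, inner: Interval) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def containing(big: List[Interval], small: List[Interval]) -> List[Interval]:
    """Model of 'containing': keep each big interval that covers some small one."""
    return [b for b in big if any(covers(b, s) for s in small)]


def contained_by(small: List[Interval], big: List[Interval]) -> List[Interval]:
    """Model of 'contained by': keep each small interval covered by some big one."""
    return [s for s in small if any(covers(b, s) for b in big)]


# Assumed positions for "[a b c d e a c] e f g h [a c k]"
regions = [(0, 6), (11, 13)]                # the two bracketed regions
phrase_hits = [(0, 2), (5, 6), (11, 12)]    # 'a c' with slop=1

print(containing(regions, phrase_hits))     # [(0, 6), (11, 13)]
print(contained_by(phrase_hits, regions))   # [(0, 2), (5, 6), (11, 12)]
```

In this model the first call yields one interval per region, while the second
yields every phrase match that falls inside a region.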

Best
Patrick

Re: ContainingIntervalsSource alternative

elbek.dev at gmail

Jun 2, 2021, 9:38 PM

Ahh, exactly what I was looking for. Thanks, Patrick.
