Mailing List Archive

Re: Heap Size Space and Span Queries
Developers,
Is it expected for Spans? Can IntervalsQuery help here?

On Wed, Dec 14, 2022 at 5:41 PM Sjoerd Smeets <ssmeets@gmail.com> wrote:

> Hi,
>
> I've implemented a Span Query parser and when running the below query, I'm
> seeing Heap Size Space messages on certain shards:
>
> o.a.s.s.HttpSolrCall null:java.lang.RuntimeException:
> java.lang.OutOfMemoryError: Java heap space
>
> The span query that I'm running is the following:
>
> ((spanNear([unstemmed_text:charge, unstemmed_text:account], 4, false)
> spanNear([unstemmed_text:pledge, unstemmed_text:account], 4, false))
> spanNear([unstemmed_text:pledge, unstemmed_text:deposit], 4, false))
> spanNear([unstemmed_text:charge, unstemmed_text:deposit], 4, false)
>
> The heap size at the moment is set to 48 GB. We are running 4 shards in 1
> JVM and the 4 shards combined have 24M docs evenly distributed across the
> shards. We do use the collapse feature as well.
>
> This is on Solr 8.6.0
>
> What are the considerations for running Span Queries and heap sizes?
>
> Any suggestions are welcome
>
> Sjoerd
>


--
Sincerely yours
Mikhail Khludnev
Re: Heap Size Space and Span Queries
I don't think that nested boolean disjunctions consisting of isolated
spanNear queries at the leaves should have memory issues (as opposed
to nested spanNear queries around disjunctions, which might well do).
Am I misreading the string representation of that query? A little bit
more explicit information about how the query is built, so that we can
be certain of what we're dealing with, would be helpful.

It'd certainly be worth trying IntervalsQuery -- but part of what
makes me think I must be missing something in interpreting the string
representation of the query provided: it seems that simple phrase
queries would suffice here in place of spanNear?

Regarding SpanQuery vs. IntervalsQuery performance and
characteristics, there's some possibly-relevant discussion on
LUCENE-9204:

https://issues.apache.org/jira/browse/LUCENE-9204?focusedCommentId=17352589#comment-17352589

Michael


On Wed, Dec 14, 2022 at 1:27 PM Mikhail Khludnev <mkhl@apache.org> wrote:
>
> Developers,
> Is it expected for Spans? Can IntervalsQuery help here?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Heap Size Space and Span Queries
Michael, thanks for stepping in!

> it seems that simple phrase
> queries would suffice here in place of spanNear?

I think it wouldn't. It seems to me that 4 is the slop and false is inOrder.
Sjoerd, can you comment on the particular span queries you use?
Also, do you have a heap dump summary to confirm the high memory consumption
by spans?
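
For readers following along, here is a toy model of how the two trailing arguments are read above: `4` as the slop (how many other token positions may sit between the two terms) and `false` as inOrder. It is a simplified sketch in plain Java, not Lucene's SpanNearQuery implementation. The class `ToySpanNear`, its method, and the position arrays are made up for illustration, and it only covers two single terms.

```java
// Toy model only: hypothetical class, not part of Lucene or Solr.
final class ToySpanNear {

    // firstPositions / secondPositions: token positions of the two terms
    // within one document field (assumed to be distinct positions).
    static boolean matches(int[] firstPositions, int[] secondPositions,
                           int slop, boolean inOrder) {
        for (int p1 : firstPositions) {
            for (int p2 : secondPositions) {
                // With inOrder=true the second term must follow the first.
                if (inOrder && p2 < p1) {
                    continue;
                }
                // Number of token positions lying between the two terms.
                int gap = Math.abs(p2 - p1) - 1;
                if (gap <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] charge = {3};
        int[] account = {8};
        // Four positions lie between 3 and 8, so slop=4 accepts this pair.
        System.out.println(matches(charge, account, 4, false));
        // Swapping the arguments shows the effect of inOrder.
        System.out.println(matches(account, charge, 4, false));
        System.out.println(matches(account, charge, 4, true));
    }
}
```

Under this reading, the same pair of positions matches in either argument order when inOrder is false, and only in the forward order when inOrder is true.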

On Thu, Dec 15, 2022 at 5:33 PM Michael Gibney <michael@michaelgibney.net>
wrote:

> I don't think that nested boolean disjunctions consisting of isolated
> spanNear queries at the leaves should have memory issues (as opposed
> to nested spanNear queries around disjunctions, which might well do).
> Am I misreading the string representation of that query? A little bit
> more explicit information about how the query is built, so that we can
> be certain of what we're dealing with, would be helpful.
>
> It'd certainly be worth trying IntervalsQuery -- but part of what
> makes me think I must be missing something in interpreting the string
> representation of the query provided: it seems that simple phrase
> queries would suffice here in place of spanNear?
>
> Regarding SpanQuery vs. IntervalsQuery performance and
> characteristics, there's some possibly-relevant discussion on
> LUCENE-9204:
>
>
> https://issues.apache.org/jira/browse/LUCENE-9204?focusedCommentId=17352589#comment-17352589
>
> Michael

--
Sincerely yours
Mikhail Khludnev
Re: Heap Size Space and Span Queries
Hi
I sketched a simple qparser plugin for experimenting with intervals in Solr:
https://github.com/mkhludnev/solr-flexible-qparser
I pushed the jar under releases and described how to use it in README.md.
Sjoerd,
if spans really do exhaust the heap, you can give intervals a try with this
plugin. Note the minimum Solr version required.

On Wed, Dec 14, 2022 at 9:26 PM Mikhail Khludnev <mkhl@apache.org> wrote:

> Developers,
> Is it expected for Spans? Can IntervalsQuery help here?


--
Sincerely yours
Mikhail Khludnev
Re: Heap Size Space and Span Queries
> It seems to me 4 is slop, and false is inOrder
Yes, sorry I misspoke; I was wondering whether it'd be possible to
replace the uses of SpanNear in this case with something like `"term1
term2"~4` -- this should build a standard `PhraseQuery`, which does
support the concept of slop, and I think the default (only) behavior
of PhraseQuery analogous to SpanNear `inOrder` is equivalent to
`inOrder=false`.

But really I was more asking the question because I'm wondering
whether the SpanNears are wrapped in SpanOr query or something (in a
way that's not explicit from the provided string representation of the
query)?
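
A minimal sketch of the rewrite suggested above, assuming the four clauses really are independent optional clauses and that the terms are plain words needing no query-syntax escaping. The helper class `SloppyPhraseQueryString` is hypothetical (it is not part of Solr or Lucene); it only assembles a query string in the `"term1 term2"~4` form, using the field and terms from Sjoerd's query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a Solr or Lucene class.
final class SloppyPhraseQueryString {

    // Builds field:"left right"~slop for a single term pair.
    static String sloppyPhrase(String field, String left, String right, int slop) {
        return field + ":\"" + left + " " + right + "\"~" + slop;
    }

    // ORs together one sloppy phrase per (left, right) combination.
    static String crossProduct(String field, List<String> lefts,
                               List<String> rights, int slop) {
        List<String> clauses = new ArrayList<>();
        for (String left : lefts) {
            for (String right : rights) {
                clauses.add(sloppyPhrase(field, left, right, slop));
            }
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        // Field name and terms are taken from the query in the original post.
        System.out.println(crossProduct("unstemmed_text",
                List.of("charge", "pledge"),
                List.of("account", "deposit"), 4));
    }
}
```

Whether this matches exactly the same documents as the unordered spanNear clauses is worth checking on a sample before switching.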

On Thu, Dec 15, 2022 at 5:01 PM Mikhail Khludnev <mkhl@apache.org> wrote:
>
> Hi
> I sketched a simple qparser plugin for experimenting with intervals in Solr:
> https://github.com/mkhludnev/solr-flexible-qparser
> I pushed the jar under releases and described how to use it in README.md.
> Sjoerd,
> if spans really do exhaust the heap, you can give intervals a try with this plugin. Note the minimum Solr version required.
Re: Heap Size Space and Span Queries
Spans seem to have the problem of creating huge "List<Something>" instances
during query iteration to track state. I never understood the code, but to
me it was always crazy to have Lists populated during execution. We
replaced all SpanQueries with Intervals in patent search; it is much
faster and heap usage is tiny.

A span/phrase with inOrder=false can always be replaced by a phrase with
slop. The slop is always unordered, as it is only an "edit distance"
(see the documentation). If you need in-order matching, an interval is required.

Phrases are only in order for "slop=0". Compare that to "slop=1", which means
"next to each other" and is no longer in order.

Uwe

On 15.12.2022 at 16:44, Mikhail Khludnev wrote:
> Michael, thanks for stepping in!
>
> > it seems that simple phrase
> > queries would suffice here in place of spanNear?
>
> I think it wouldn't. It seems to me that 4 is the slop and false is inOrder.
> Sjoerd, can you comment on the particular span queries you use?
> Also, do you have a heap dump summary to confirm the high memory
> consumption by spans?

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:uwe@thetaphi.de