Mailing List Archive

Reproducible crash matching phrases
I've been able to reproduce a crash we are seeing in our product with
newer Lucene versions.

I'm attaching a small Java program that reproduces this. It might look
weird; it's what is left after removing every custom thing we apply to
the query while still seeing the bug.

This is the crash I see with this code (with assertions disabled it
crashes in a different place):

Exception in thread "main" java.lang.AssertionError
    at org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$EverythingEnum.nextPosition(Lucene84PostingsReader.java:940)
    at org.apache.lucene.search.PhrasePositions.nextPosition(PhrasePositions.java:57)
    at org.apache.lucene.search.PhrasePositions.firstPosition(PhrasePositions.java:46)
    at org.apache.lucene.search.SloppyPhraseMatcher.initSimple(SloppyPhraseMatcher.java:368)
    at org.apache.lucene.search.SloppyPhraseMatcher.initPhrasePositions(SloppyPhraseMatcher.java:356)
    at org.apache.lucene.search.SloppyPhraseMatcher.reset(SloppyPhraseMatcher.java:153)
    at org.apache.lucene.search.PhraseScorer$1.matches(PhraseScorer.java:49)
    at org.apache.lucene.search.DoubleValuesSource$WeightDoubleValuesSource$1.advanceExact(DoubleValuesSource.java:631)
    at org.apache.lucene.queries.function.FunctionScoreQuery$QueryBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:343)
    at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
    at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
    at org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:270)
    at org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:228)
    at org.apache.lucene.search.DoubleValuesSource$2.doubleValue(DoubleValuesSource.java:344)
    at org.apache.lucene.search.DoubleValues$1.doubleValue(DoubleValues.java:48)
    at org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.doubleValue(FunctionScoreQuery.java:265)
    at org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:229)
    at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:76)
    at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:276)
    at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:232)
    at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:661)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:574)
    at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:421)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:432)
    at com.wolfram.textsearch.LuceneCrash.main(LuceneCrash.java:48)
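
Since the failure above is an AssertionError, it only shows up when the JVM runs with assertions enabled (for example with the -ea flag). Anyone trying to reproduce it may want to confirm that first. Below is a minimal sketch of such a check in plain Java; the class and method names (AssertionStatus, assertionsEnabled) are made up for this example and are not part of the attached LuceneCrash code.

```java
public class AssertionStatus {

    /** Returns true when this class runs with assertions enabled (e.g. java -ea). */
    public static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert statement is only evaluated
        // when assertions are turned on for this class.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        // Hypothetical usage: run once with "java -ea AssertionStatus"
        // and once without the flag, then compare the output.
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Passing -ea with no arguments at JVM startup enables assertions for all non-system classes, so library classes on the classpath are covered as well as the test program.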

Interestingly, the bug does not happen if the index is created on a
ByteBuffersDirectory.

I hope this is useful!

Thanks!
Re: Reproducible crash matching phrases
: I've been able to reproduce a crash we are seeing in our product with newer
: Lucene versions.

Can you be specific? What exact versions of Lucene are you using that
reproduce this failure? If you know of other "older" versions where you
can't reproduce the problem, that info would also be helpful...


I tried running your test code against the current branch_8x and was
unable to trigger any sort of failure. I also tried using 8.4.1, based on
the stack trace indicating that you must be using a version of Lucene no
older than 8.4 given the codec in use -- and was also unable to reproduce
any sort of problem.

Also note that, as written, your LuceneCrash code leaves an index on disk
which is re-used the next time the code is run: does the problem reproduce
for you if you manually "rm -r /tmp/xxx" and run it again, or is the
problem specific to having some "cruft" documents left in the index from
previous runs? Can you zip up the contents of /tmp/xxx on your machine
and attach it to a new Jira issue?
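
For anyone who would rather do the "rm -r" step from Java before each reproduction attempt, here is a minimal sketch using only the JDK. CleanIndexDir and deleteRecursively are hypothetical names chosen for this example; nothing here comes from the attached LuceneCrash code, and the directory to clean is passed in explicitly rather than assumed.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class CleanIndexDir {

    /** Deletes dir and everything below it; does nothing if dir does not exist. */
    public static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file); // remove each regular file as it is visited
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc; // surface errors hit while listing the directory
                }
                Files.delete(d); // the directory is empty by now
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CleanIndexDir <index-directory>");
            return;
        }
        Path indexDir = Paths.get(args[0]);
        deleteRecursively(indexDir);
        System.out.println("removed " + indexDir + " (if it existed)");
    }
}
```

Running this against the index path before the test program means every run starts from an empty index, which separates "fails on a fresh index" from "fails only with documents left over from earlier runs".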


: Interestingly, the bug does not happen if the index is created on a
: ByteBuffersDirectory.

That makes it seem like the bug might be filesystem specific -- what impl
does the FSDirectory.open() call in your code return?



-Hoss
http://www.lucidworks.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Reproducible crash matching phrases
This happens on Lucene 8.8. I deleted the index and now I don't see the
problem. =( I'll post an updated version of the code shortly.

Thanks!

On 10/2/21 at 19:01, Chris Hostetter wrote:
> : I've been able to reproduce a crash we are seeing in our product with newer
> : Lucene versions.
>
> Can you be specific? What exact versions of Lucene are you using that
> reproduce this failure? If you know of other "older" versions where you
> can't reproduce the problem, that info would also be helpful...
Re: Reproducible crash matching phrases
The crash happens if you instead add these two fields:

    doc.add(new TextField("ExampleText", "periodic function", Field.Store.NO));
    doc.add(new TextField("ExampleText", "plot of the original function", Field.Store.NO));

I'm also attaching an updated file with these changes.

This happens in Lucene 8.8.0 (and probably since 8.4.0).

Thanks!

On 10/2/21 at 19:11, Nicolás Lichtmaier wrote:
> This happens on Lucene 8.8. I deleted the index and now I don't see
> the problem. =( I'll post an updated version of the code shortly.
Re: Reproducible crash matching phrases
: I'm also attaching an updated file with these changes.
:
: This happens in Lucene 8.8.0 (and probably since 8.4.0).

Ok -- cool ... with the updated code I was able to reproduce on branch_8x,
and with 8.8 & 8.7 (but not 8.4) -- I've distilled your patch into a test
case and attached it to a new Jira issue...

https://issues.apache.org/jira/browse/LUCENE-9762

FYI: with this updated code the error *DOES* reproduce for me regardless
of Directory type -- I suspect your original comment about it not failing
if you used ByteBuffersDirectory was because that would have been a
"clean" index every time, and the old code was only failing with your
existing index on disk.

Let's see if the folks with the low-level expertise can figure out what's
going wrong here.


-Hoss
http://www.lucidworks.com/

Re: Reproducible crash matching phrases
Great! Thanks!

Yes, I realized that this is actually directory-type independent.

On 10/2/21 at 20:19, Chris Hostetter wrote:
> FYI: with this updated code the error *DOES* reproduce for me regardless
> of Directory type -- I suspect your original comment about it not failing
> if you used ByteBuffersDirectory was because that would have been a
> "clean" index every time, and the old code was only failing with your
> existing index on disk.