Mailing List Archive

org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()
Hi,
I profiled my application with jvisualvm on Java
and saw that 75% of the time spent in
org.apache.lucene.search.IndexSearcher.search() goes to
these two methods:
org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()
Is there any study or project to speed these up?

Best regards
Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()
Is your profiler reporting inclusive or exclusive costs for each function?
I.e., does it exclude time spent in functions that are called within a
function? I'm asking because it makes total sense for IndexSearcher#search
to spend most of its time in BulkScorer#score, which coordinates the whole
matching+scoring process.
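
To make the inclusive/exclusive distinction concrete, here is a minimal sketch in plain Java. The `Node` class, the shape of the sample tree, and all timings are made up for illustration; they are not measurements from this thread and not how any profiler is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy call tree: each node carries only its own ("exclusive") time. */
final class CallTreeDemo {
    static final class Node {
        final String name;
        final long selfMillis;                       // exclusive time
        final List<Node> children = new ArrayList<>();

        Node(String name, long selfMillis) {
            this.name = name;
            this.selfMillis = selfMillis;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        /** Inclusive time = own time plus the inclusive time of every callee. */
        long inclusiveMillis() {
            long total = selfMillis;
            for (Node child : children) {
                total += child.inclusiveMillis();
            }
            return total;
        }
    }

    /** Hypothetical numbers: a coordinating method with little self time. */
    static Node sampleTree() {
        Node search = new Node("IndexSearcher#search", 5);
        Node score = new Node("BulkScorer#score", 10);
        score.add(new Node("leaf work", 60));
        search.add(score);
        return search;
    }

    static void print(Node node, String indent) {
        System.out.println(indent + node.name
            + " inclusive=" + node.inclusiveMillis() + "ms"
            + " exclusive=" + node.selfMillis + "ms");
        for (Node child : node.children) {
            print(child, indent + "  ");
        }
    }

    public static void main(String[] args) {
        print(sampleTree(), "");
    }
}
```

In an inclusive view, a coordinating method such as `BulkScorer#score` looks expensive even when nearly all of the time is spent in the methods it calls.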

Having much time spent in BooleanWeight#bulkScorer is a bit surprising
however. This suggests that you have too many segments in your index (since
the bulk scorer needs to be recreated for every segment) or that your
average query matches a very low number of documents (so that Lucene spends
more time figuring out how best to find the matches versus actually finding
these matches).
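
One way to check the segment hypothesis is to count the leaves of a `DirectoryReader`, since each leaf corresponds to one segment. The sketch below assumes Lucene 8.x on the classpath and uses a made-up in-memory index with a hypothetical `body` field; the per-document commits are only there to force several small segments into existence.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

final class SegmentCountDemo {
    /** Number of segments visible to a freshly opened reader. */
    static int countSegments(Directory dir) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.leaves().size();
        }
    }

    /** Returns {segments before forceMerge, segments after forceMerge}. */
    static int[] run() throws IOException {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(
                 dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (int i = 0; i < 5; i++) {
                Document doc = new Document();
                doc.add(new TextField("body", "toy document " + i, Field.Store.NO));
                writer.addDocument(doc);
                writer.commit();          // each commit flushes a new small segment
            }
            int before = countSegments(dir);

            writer.forceMerge(1);         // merge down to a single segment
            writer.commit();
            int after = countSegments(dir);
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = run();
        System.out.println("segments before=" + counts[0] + " after=" + counts[1]);
    }
}
```

`forceMerge` rewrites the index and is expensive, so it is generally suited to indexes that no longer change rather than ones that are still being updated.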

On Sat, Oct 2, 2021 at 5:57 AM Baris Kazar <baris.kazar@oracle.com> wrote:


--
Adrien
Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()
Hi Adrien,
Thanks. Next week I will look at the methods called within BulkScorer#score to see which of them take the most time.

Jvisualvm reports inclusive time for a method, i.e. including the time spent in the methods it calls, and the execution tree can be expanded down to the very last called method.

Regarding the second paragraph above:
At what point does a Lucene index have too many segments? I have 1 text field and 1 stored (non-indexed) field.

Most of the time I get a couple of thousand hits and ask for the top 20 of them. Could this be why
BooleanWeight#bulkScorer takes so much time?

BooleanWeight#bulkScorer and BulkScorer#score take equal amounts of time and together make up
75% of IndexSearcher#search, as I mentioned before.

Thanks for the swift reply.
I appreciate it very much.


Best regards
________________________________
From: Adrien Grand <jpountz@gmail.com>
Sent: Saturday, October 2, 2021 1:44:40 AM
To: Lucene Users Mailing List <java-user@lucene.apache.org>
Cc: Baris Kazar <baris.kazar@oracle.com>
Subject: Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()

Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()
Hi,
I did more experiments, and this time I looked into these methods:
org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()


Let's start with BooleanWeight.bulkScorer(), its call tree, and the time spent:


BooleanWeight.bulkScorer()
-->> Weight.bulkScorer()
-->>-->> BooleanWeight.scorer()
-->>-->>-->> BooleanWeight.scorerSupplier()
-->>-->>-->>-->> Weight.scorerSupplier()
-->>-->>-->>-->>-->> TermQuery$TermWeight.scorer()
-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.blocktree.SegmentTermsEnum.impacts()
-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.lucene84.Lucene84PostingsReader.impacts()
-->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsDocsEnum.init()
-->>-->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.lucene84.Lucene84SkipReader.init()
-->>-->>-->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.MultiLevelSkipListReader.init()
-->>-->>-->>-->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.MultiLevelSkipListReader.loadSkipLevels()
-->>-->>-->>-->>-->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.store.DataInput.readVLong() (constitutes 100% of BooleanWeight.bulkScorer() time here)
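
For context on the leaf of this tree: readVLong() decodes a variable-length integer, here while the skip-list levels are loaded. The following stand-alone sketch shows that style of encoding (7 data bits per byte, high bit as a continuation flag, low-order group first). It is an illustration written for this note, not Lucene's own code, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

final class VLongDemo {
    /** Encodes a non-negative long using 7 data bits per byte. */
    static byte[] writeVLong(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch handles non-negative values only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7FL) | 0x80L)); // more bytes follow
            value >>>= 7;
        }
        out.write((int) value);                          // last byte, high bit clear
        return out.toByteArray();
    }

    /** Decodes one value from the start of the array. */
    static long readVLong(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;                           // continuation flag not set
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated input");
    }

    public static void main(String[] args) {
        long value = 300;
        byte[] encoded = writeVLong(value);
        System.out.println(value + " -> " + encoded.length + " bytes -> " + readVLong(encoded));
    }
}
```

Each decoded value costs one read per byte, so the work in such a method is dominated by how the underlying input is accessed rather than by the arithmetic.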



Next, BulkScorer.score(), its call tree, and the time spent:



BulkScorer.score()
-->> Weight$DefaultBulkScorer.score()
-->>-->> Weight$DefaultBulkScorer.scoreAll()
-->>-->>-->> WANDScorer$1.nextDoc()
-->>-->>-->>-->> WANDScorer$1.advance()
-->>-->>-->>-->>-->> WANDScorer.access$300() (constitutes 65% of BulkScorer.score() time here)
-->>-->>-->>-->>-->> WANDScorer.access$100() (constitutes 30% of BulkScorer.score() time here)
-->>-->>-->>-->>-->> WANDScorer.access$400() (constitutes 5% of BulkScorer.score() time here)

Best regards

________________________________
From: Baris Kazar <baris.kazar@oracle.com>
Sent: Saturday, October 2, 2021 3:14 PM
To: Adrien Grand <jpountz@gmail.com>; Lucene Users Mailing List <java-user@lucene.apache.org>
Cc: Baris Kazar <baris.kazar@oracle.com>
Subject: Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()

Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()
Hmm we should fix these access$ accessors by fixing the visibility of some
fields.
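
For readers unfamiliar with these accessors: when an inner or anonymous class touches a private member of its enclosing class, javac releases before Java 11 generate synthetic `access$NNN` methods on the outer class, which is where names like `WANDScorer.access$300()` come from. Since Java 11 (nest-based access control) they are no longer needed. The sketch below uses a hypothetical `AccessorDemo` class, not Lucene code, to contrast a private field with a package-private one.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class AccessorDemo {
    private int privateCounter = 0; // inner-class access may go through access$NNN
    int packageCounter = 0;         // package-private: accessed directly

    /** Anonymous class, compiled as AccessorDemo$1 (compare WANDScorer$1). */
    Runnable incrementer() {
        return new Runnable() {
            @Override
            public void run() {
                privateCounter++;
                packageCounter++;
            }
        };
    }

    int privateCounter() {
        return privateCounter;
    }

    /** Lists the synthetic access$ methods the compiler generated, if any. */
    static List<String> syntheticAccessors() {
        List<String> names = new ArrayList<>();
        for (Method m : AccessorDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("access$")) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        AccessorDemo demo = new AccessorDemo();
        demo.incrementer().run();
        System.out.println("private=" + demo.privateCounter()
            + " package=" + demo.packageCounter);
        // The list depends on the compiler version used to build this class.
        System.out.println("synthetic accessors: " + syntheticAccessors());
    }
}
```

Widening the visibility of the field, as with `packageCounter` here, removes the need for the generated method on older compilers.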

These breakdowns do not necessarily signal that something is wrong. Is the
query executing fast overall?

On Mon, Oct 4, 2021 at 11:57 PM Baris Kazar <baris.kazar@oracle.com> wrote:


--
Adrien
Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()
Hi Adrien,
Thanks for taking a look at it. Yes, it would be very nice to fix those accessors.
The speed is OK, but I would like it to be faster.
Is there anything else I should look at to help make it faster?
Best regards

________________________________
From: Adrien Grand <jpountz@gmail.com>
Sent: Tuesday, October 5, 2021 3:18 PM
To: Lucene Users Mailing List <java-user@lucene.apache.org>
Cc: Baris Kazar <baris.kazar@oracle.com>
Subject: Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()

Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()
Hi Adrien,
Since you mentioned too many segments: is there a best-practice paper or Lucene document that shows the
benefit of the IndexWriter.forceMerge and merge() methods,
ideally demonstrating the concept on a toy dataset as a best-practice example?
Best regards
baris

________________________________
From: Baris Kazar <baris.kazar@oracle.com>
Sent: Tuesday, October 5, 2021 3:56 PM
To: Adrien Grand <jpountz@gmail.com>; Lucene Users Mailing List <java-user@lucene.apache.org>; Baris Kazar <baris.kazar@oracle.com>
Subject: Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()
