Mailing List Archive

Slower document retrieval in 8.7.0 comparing to 7.5.0
Hi,
We've migrated from 7.5.0 to 8.7.0 and found that index "searching"
is significantly (4-5 times) slower in the latest version.
It seems that
org.apache.lucene.search.IndexSearcher#doc(int)
is slower.

Is it possible to have similar performance with 8.7.0?

Best regards,
Martynas
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
Hello Martynas,

There have indeed been changes related to stored fields in 8.7. What does
your workload look like and how large are your documents on average?

On Thu, Dec 3, 2020 at 3:04 PM Martynas L <martynas.sub@gmail.com> wrote:

> Hi,
> We've migrated from 7.5.0 to 8.7.0 and find out that the index "searching"
> is significantly (4-5 times) slower in the latest version.
> It seems that
> org.apache.lucene.search.IndexSearcher#doc(int)
> is slower.
>
> Is it possible to have similar performance with 8.7.0?
>
> Best regards,
> Martynas
>


--
Adrien
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
Hello,

I am sorry for the delay.

Not sure what you mean by "workload". We have performance tests, which
started failing after upgrading to 8.7.0.
So I just tried to query the index (built from the same source) to get all
documents and compare the performance with 7.5.0.

Document "size" is the sum of all stored string lengths (3402519 documents);
timings are 8.7.0 vs 7.5.0:

doc size 903 - 88s vs 22s

doc size 36 (only one field loaded, used searcher.doc(docID,
Collections.singleton("fieldName"))) - 78s vs 16s

doc size 439 (some fields made not stored) - 46s vs 14.5s

Best regards,
Martynas
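
For reference, the slowdown factors implied by the three measurements above can be worked out directly. This is a small arithmetic sketch, not part of the original message; it assumes the first figure in each pair is the 8.7.0 time and the second the 7.5.0 time, which matches the reported slowdown. The class and method names are made up for illustration.

```java
import java.util.Locale;

// Arithmetic sketch: slowdown factor for each timing pair quoted above.
// Assumption: each pair is (8.7.0 seconds, 7.5.0 seconds).
class SlowdownRatios {
    /** Returns how many times slower the new timing is than the old one. */
    static double ratio(double newSeconds, double oldSeconds) {
        return newSeconds / oldSeconds;
    }

    public static void main(String[] args) {
        String[] labels = {"doc size 903", "doc size 36", "doc size 439"};
        double[][] runs = {{88.0, 22.0}, {78.0, 16.0}, {46.0, 14.5}};
        for (int i = 0; i < runs.length; i++) {
            // Print one line per measurement with the computed factor.
            System.out.printf(Locale.ROOT, "%s: %.2fx slower%n",
                    labels[i], ratio(runs[i][0], runs[i][1]));
        }
    }
}
```

The factors come out at 4.0, roughly 4.9 and roughly 3.2, so the "4-5 times" figure fits the first two cases, while the run with fewer stored fields is closer to 3 times slower.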

On Fri, Dec 4, 2020 at 12:06 AM Adrien Grand <jpountz@gmail.com> wrote:

> Hello Martynas,
>
> There have indeed been changes related to stored fields in 8.7. What does
> your workload look like and how large are your documents on average?
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
I think it would be useful to have an example of a document and, if
possible, an example of query that takes too long.

On Mon, Dec 21, 2020 at 1:47 PM Martynas L <martynas.sub@gmail.com> wrote:

> Hello,
>
> I am sorry for the delay.
>
> Not sure what you mean by "workload". We have a performance tests, which
> started failing after upgrading to 8.7.0.
> So I just tried to query the index (built form the same source) to get all
> documents and compare the performance with 7.5.0.
>
> Document "size" is a sum of all stored string lengths (3402519 documents):
>
> doc size 903 - 88s vs 22s
>
> doc size 36 (only one field loaded, used searcher.doc(docID,
> Collections.singleton("fieldName"))) - 78s vs 16s
>
> doc size 439 (some fields made not stored) - 46s vs 14.5s
>
> Best regards,
> Martynas


--
Vincenzo D'Amore
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
The query is fast, but document retrieval is "slow".
We call:
1) IndexSearcher#search(Query, Collector) to collect docIDs, and then
2) retrieve documents with IndexSearcher#doc(int).

In our case (1) takes less than 0.5s, while (2) almost 1.5 min (4 times
slower than 7.5.0)
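
The second phase described above can be timed in isolation with a small harness along these lines. This is only a sketch: `RetrievalTimer`, `DocLoader` and `Result` are hypothetical names introduced here, and the loader is a stand-in for a per-document call such as IndexSearcher#doc(int), which is why it is allowed to throw IOException. The doc IDs are assumed to have been collected already in phase (1).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class RetrievalTimer {
    /** Hypothetical stand-in for a per-document loader such as IndexSearcher#doc(int). */
    interface DocLoader<T> {
        T load(int docId) throws IOException;
    }

    /** Holds the loaded documents and the time spent loading them. */
    static final class Result<T> {
        final List<T> docs;
        final long elapsedNanos;

        Result(List<T> docs, long elapsedNanos) {
            this.docs = docs;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /** Loads every doc ID in order and measures only the loading phase. */
    static <T> Result<T> timeRetrieval(int[] docIds, DocLoader<T> loader) {
        List<T> docs = new ArrayList<>(docIds.length);
        long start = System.nanoTime();
        try {
            for (int docId : docIds) {
                docs.add(loader.load(docId));
            }
        } catch (IOException e) {
            // Wrap so callers do not need a throws clause.
            throw new UncheckedIOException(e);
        }
        return new Result<>(docs, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Dummy loader for demonstration; a real run would read stored fields.
        int[] docIds = {0, 1, 2, 3};
        Result<String> r = timeRetrieval(docIds, id -> "doc-" + id);
        System.out.println(r.docs.size() + " docs in "
                + r.elapsedNanos / 1_000_000.0 + " ms");
    }
}
```

Running the same harness against indexes built with each version, with the real loader plugged in, keeps query time out of the comparison, so any difference reflects stored-field retrieval only.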

On Tue, Dec 22, 2020 at 3:23 PM Vincenzo D'Amore <v.damore@gmail.com> wrote:

> I think it would be useful to have an example of a document and, if
> possible, an example of query that takes too long.
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
Hi,

Please see attached sample.
IndexGenerator - creates a dummy index.
IndexReader - retrieves documents; this takes ~2s with 7.5.0 and ~6s
with 8.7.0

Regards,
Martynas

On Tue, Dec 22, 2020 at 3:23 PM Vincenzo D'Amore <v.damore@gmail.com> wrote:

> I think it would be useful to have an example of a document and, if
> possible, an example of query that takes too long.
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
Hello,

Are there any comments on this issue?
If there is no workaround, we will be forced to roll back to version
7.5.0.

Best regards,
Martynas

On Tue, Jan 12, 2021 at 12:27 PM Martynas L <martynas.sub@gmail.com> wrote:

> Hi,
>
> Please see attached sample.
> IndexGenerator - creates a dummy index.
> IndexReader - retrieves documents - duration time with 7.5.0 version is
> ~2s, while ~6s with 8.7.0
>
> Regards,
> Martynas
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
There is no attachment in the previous email that I can see? Maybe you can
post it online?

On Thu, Jan 21, 2021 at 4:54 PM Martynas L <martynas.sub@gmail.com> wrote:

> Hello,
>
> Are there any comments on this issue?
> If there is no workaround, we will be forced to rollback to the 7.5.0
> version.
>
> Best regards,
> Martynas
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
Please see the sample at
https://drive.google.com/drive/folders/1ufVZXzkugBAFnuy8HLAY6mbPWzjknrfE

IndexGenerator - creates a dummy index.
IndexReader - retrieves documents; this takes ~2s with 7.5.0 and ~6s
with 8.7.0

Regards,
Martynas


On Thu, Jan 21, 2021 at 8:21 PM Rob Audenaerde <rob.audenaerde@gmail.com>
wrote:

> There is no attachment in the previous email that I can see? Maybe you can
> post it online?
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
Hi Martynas,

In your sample code you are retrieving all (1 million!) documents from the
index; that surely is not a good match for Lucene :)

Is that a good reflection of your use-case?

On Fri, Jan 22, 2021 at 9:52 AM Martynas L <martynas.sub@gmail.com> wrote:

> Please see the sample at
> https://drive.google.com/drive/folders/1ufVZXzkugBAFnuy8HLAY6mbPWzjknrfE
>
> IndexGenerator - creates a dummy index.
> IndexReader - retrieves documents - duration time with 7.5.0 version is
> ~2s, while ~6s with 8.7.0
>
> Regards,
> Martynas
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
The emphasis should not be on the number of retrieved documents, but on the
duration ratio: 8.7.0 is 3 times slower. I think the ratio will be similar
when retrieving any number of documents.

On Fri, Jan 22, 2021 at 1:39 PM Rob Audenaerde <rob.audenaerde@gmail.com>
wrote:

> Hi Martrynas,
>
> In your sample code you are retrieving all (1 million!) documents from the
> index, that surely is not a good match for lucene :)
>
> Is that a good reflection of your use-case?
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
> I think it will be similar ratio retrieving any number of documents.

I'm not sure this is true; if you retrieve a huge number of documents you might cause trouble for the GC.

From: java-user@lucene.apache.org At: 01/22/21 12:11:19 To: java-user@lucene.apache.org
Subject: Re: Slower document retrieval in 8.7.0 comparing to 7.5.0

The accent should not be on retrieved documents number, but on the duration
ratio - 8.7.0 is 3 times slower. I think it will be similar ratio
retrieving any number of documents.

Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
Even when retrieving a single document, 8.7.0 is more than 2x slower.

On Fri, Jan 22, 2021 at 2:28 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
dceccarelli4@bloomberg.net> wrote:

> > I think it will be similar ratio retrieving any number of documents.
>
> I'm not sure this is true, if you retrieve a huge amount of documents you
> might cause troubles to the GC.
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> Vincenzo D'Amore
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
Hi Martynas

How did you measure that?

I ask because writing a good benchmark is not an easy task: many factors
(class loading times, JIT effects, etc.) can distort the numbers. You should
use the Java Microbenchmark Harness (JMH) or similar, and set up a random
document retrieval task with warm-up iterations.

(I'm not aware of any big slowdowns, but since you are seeing them, the best
approach is to build a robust benchmark and then start comparing.)

-Rob


On Fri, Jan 22, 2021 at 3:43 PM Martynas L <martynas.sub@gmail.com> wrote:

> Even retrieving single document 8.7.0 is more than x2 slower
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
I just played with my reading sample. My goal is not to show exact
numbers, but it is a fact that document retrieval via IndexSearcher.doc(int)
is much slower.
All our performance tests showed degradation after switching to
8.7.0; even without measuring, we can "see/feel" that operations involving
document retrieval became slower.



On Fri, Jan 22, 2021 at 4:48 PM Rob Audenaerde <rob.audenaerde@gmail.com>
wrote:

> Hi Martynas
>
> How did you measure that?
>
> I ask, because writing a good benchmark is not an easy task, since there
> are so many factors (class loading times, JIT effects, etc). You should use
> Java Microbenchmark Harness or similar; and set up a random document
> retrieval task, with warm-up etc.etc.
>
> (I'm not aware of any big slowdowns, but as you see them, the best way is
> to build a robust benchmark and then start comparing)
>
> -Rob
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
Hey Martynas,

Can your tests be published? I'm curious!

If you implemented your own document retrieval benchmark, are you running it several times for each version and taking the median time?

As Rob pointed out, benchmarking is a very tricky topic :) If you are retrieving the document content from Lucene, you are also dealing with how the operating system decides to load the content into memory.
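
A minimal sketch of the "run it several times and take the median" idea, using only the JDK. It is not a replacement for JMH, and the class name MedianTimer, the run counts, and the placeholder workload are assumptions for illustration; the real task would be the document retrieval loop.

```java
import java.util.Arrays;

public class MedianTimer {

    /** Returns the median of the samples (mean of the two middle values for even counts). */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /**
     * Runs the task warmupRuns times without timing it, then measuredRuns times
     * with timing, and returns the median duration in nanoseconds.
     * measuredRuns must be at least 1.
     */
    static long medianNanos(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // gives class loading and JIT compilation a chance to settle
        }
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): stands in for the retrieval loop under test.
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i % 10);
            }
            if (sb.length() != 100_000) {
                throw new IllegalStateException("unexpected length");
            }
        };
        System.out.println("median ns: " + medianNanos(task, 5, 11));
    }
}
```

The median is used rather than the mean so that a single slow run (a GC pause, a cold page cache) does not dominate the result. Operating system caching effects still apply, so both Lucene versions should be measured under the same conditions.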

Cheers,
Diego

From: java-user@lucene.apache.org At: 01/22/21 15:22:32 To: java-user@lucene.apache.org
Subject: Re: Slower document retrieval in 8.7.0 comparing to 7.5.0

Just played with my reading sample. I do not have a goal to show the exact
numbers, but it is a fact that document retrieval IndexSearcher.doc(int) is
much slower.
All our performance tests showed performance degradation after changing to
8.7.0, even without measurement we can "see/feel" the operations involving
documents retrieval became slower.


Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
I did some testing for you :)

I modified your code to run in a JMH benchmark; and changed the number of
retrieved docs to 1000 out of 1M in the index. This is what I got:

Lucene 7.5
Benchmark                                 Mode  Cnt   Score   Error  Units
DocRetrievalBenchmark.retrieveDocuments  thrpt    4  37.147 ± 6.218  ops/s

Lucene 8.7
Benchmark                                 Mode  Cnt   Score   Error  Units
DocRetrievalBenchmark.retrieveDocuments  thrpt    4  18.680 ± 5.755  ops/s

This is much in line with your observations (Lucene 8.7 reaches roughly half
the throughput of 7.5), so something is going on when running out of the box.
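
The arithmetic behind that reading can be checked with a small sketch. The scores and error margins are the JMH figures reported above; the class and method names are made up for illustration, and treating "score ± error" as a simple interval is a simplifying assumption.

```java
import java.util.Locale;

public class ThroughputComparison {

    /** How many times slower the candidate is, given throughputs in ops/s. */
    static double slowdown(double baselineOps, double candidateOps) {
        return baselineOps / candidateOps;
    }

    /** True if the intervals [score - err, score + err] of the two results overlap. */
    static boolean intervalsOverlap(double scoreA, double errA, double scoreB, double errB) {
        return scoreA - errA <= scoreB + errB && scoreB - errB <= scoreA + errA;
    }

    public static void main(String[] args) {
        double lucene75 = 37.147, err75 = 6.218; // ops/s, from the 7.5 run
        double lucene87 = 18.680, err87 = 5.755; // ops/s, from the 8.7 run

        // prints "slowdown: 1.99x"
        System.out.println(String.format(Locale.ROOT, "slowdown: %.2fx",
                slowdown(lucene75, lucene87)));
        // prints "intervals overlap: false" (30.929 > 24.435)
        System.out.println("intervals overlap: "
                + intervalsOverlap(lucene75, err75, lucene87, err87));
    }
}
```

Since the two intervals do not overlap, the difference is unlikely to be measurement noise alone, even with only four measurement iterations per version.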

The code can be found here (not really beautiful, but it gets the job done;
if you want to switch Lucene versions, edit the pom and make sure to set the
proper index version):
https://gist.github.com/d2a-raudenaerde/93a490e5b0d17b2fa88862473429aeb3

JMH details:
# JMH version: 1.21
# VM version: JDK 11.0.9.1, OpenJDK 64-Bit Server VM,
11.0.9.1+1-Ubuntu-0ubuntu1.20.04
# VM invoker: /usr/lib/jvm/java-11-openjdk-amd64/bin/java
# VM options: -Xms2G -Xmx2G
# Warmup: 2 iterations, 10 s each
# Measurement: 4 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.audenaerde.lucene.DocRetrievalBenchmark.retrieveDocuments


On Fri, Jan 22, 2021 at 4:22 PM Martynas L <martynas.sub@gmail.com> wrote:

> Just played with my reading sample. I do not have a goal to show the exact
> numbers, but it is a fact that document retrieval IndexSearcher.doc(int) is
> much slower.
> All our performance tests showed performance degradation after changing to
> 8.7.0, even without measurement we can "see/feel" the operations involving
> documents retrieval became slower.
Re: Slower document retrieval in 8.7.0 comparing to 7.5.0 [ In reply to ]
Hi!

This slowdown is expected, see LUCENE-9447
<https://issues.apache.org/jira/browse/LUCENE-9447> & LUCENE-9486
<https://issues.apache.org/jira/browse/LUCENE-9486>. The trade-off here is
index size vs. fetch time: we have introduced a more aggressive compression
strategy for stored fields at the cost of a small increase in fetch
times. In your example, you can see that the index size has been reduced
by around 20%.
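
To illustrate the size vs. fetch time trade-off in isolation, here is a minimal sketch that does not use Lucene at all. It compresses made-up JSON-like documents with java.util.zip.Deflater, grouped into blocks of different sizes. The documents, the block sizes, and all names are assumptions for illustration; this is not how Lucene's stored fields format is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class BlockSizeTradeoff {

    /** Compresses the input with DEFLATE and returns the compressed bytes. */
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Total compressed size when documents are grouped docsPerBlock at a time. */
    static int totalCompressedSize(String[] docs, int docsPerBlock) {
        int total = 0;
        for (int start = 0; start < docs.length; start += docsPerBlock) {
            int end = Math.min(docs.length, start + docsPerBlock);
            StringBuilder block = new StringBuilder();
            for (int i = start; i < end; i++) {
                block.append(docs[i]);
            }
            total += compress(block.toString().getBytes(StandardCharsets.UTF_8)).length;
        }
        return total;
    }

    /** Uncompressed bytes that must be decoded to read one document: its whole block. */
    static int bytesDecompressedPerFetch(String[] docs, int docsPerBlock, int docIndex) {
        int start = (docIndex / docsPerBlock) * docsPerBlock;
        int end = Math.min(docs.length, start + docsPerBlock);
        int bytes = 0;
        for (int i = start; i < end; i++) {
            bytes += docs[i].getBytes(StandardCharsets.UTF_8).length;
        }
        return bytes;
    }

    /** Hypothetical sample documents with a lot of shared structure. */
    static String[] sampleDocs(int count) {
        String[] docs = new String[count];
        for (int i = 0; i < count; i++) {
            docs[i] = "{\"id\":" + i + ",\"title\":\"document number " + i
                    + "\",\"status\":\"active\"}";
        }
        return docs;
    }

    public static void main(String[] args) {
        String[] docs = sampleDocs(1000);
        for (int docsPerBlock : new int[] {1, 10, 100}) {
            System.out.println(docsPerBlock + " docs/block: "
                    + totalCompressedSize(docs, docsPerBlock) + " bytes stored, "
                    + bytesDecompressedPerFetch(docs, docsPerBlock, 42)
                    + " bytes decoded to fetch one doc");
        }
    }
}
```

Larger blocks let the compressor exploit redundancy across documents, so the stored size shrinks, but reading a single document means decoding more data. That is the general shape of the trade-off described above.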

If your workflow depends on those fetch times, you can always override the
stored fields format through a filter codec and supply your own
compression parameters.

Cheers,

Ignacio




On Sat, Jan 23, 2021 at 8:36 AM Rob Audenaerde <rob.audenaerde@gmail.com>
wrote:

> I did some testing for you :)
>
> I modified your code to run as a JMH benchmark and changed the number of
> retrieved docs to 1000 out of the 1M in the index. This is what I got:
>
> Lucene 7.5
> Benchmark                                 Mode  Cnt   Score   Error  Units
> DocRetrievalBenchmark.retrieveDocuments  thrpt    4  37.147 ± 6.218  ops/s
>
> Lucene 8.7
> Benchmark                                 Mode  Cnt   Score   Error  Units
> DocRetrievalBenchmark.retrieveDocuments  thrpt    4  18.680 ± 5.755  ops/s
>
> This is very much in line with your observations (Lucene 8.7 seems almost
> twice as slow), so something is going on when running out of the box.
>
> The code can be found here (not really beautiful, but it gets the job done;
> if you want to switch Lucene versions, edit the pom and make sure to set the
> proper index version):
> https://gist.github.com/d2a-raudenaerde/93a490e5b0d17b2fa88862473429aeb3
>
> JMH details:
> # JMH version: 1.21
> # VM version: JDK 11.0.9.1, OpenJDK 64-Bit Server VM, 11.0.9.1+1-Ubuntu-0ubuntu1.20.04
> # VM invoker: /usr/lib/jvm/java-11-openjdk-amd64/bin/java
> # VM options: -Xms2G -Xmx2G
> # Warmup: 2 iterations, 10 s each
> # Measurement: 4 iterations, 10 s each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: org.audenaerde.lucene.DocRetrievalBenchmark.retrieveDocuments
>