Mailing List Archive

RE: [EXT] Re: Efficient sort on SortedDocValues
Ah, of course. Thanks Mikhail. I realized it was a silly question that only made sense to me since my query was MatchAll docs.


-----Original Message-----
From: Mikhail Khludnev <mkhl@apache.org>
Sent: Monday, November 7, 2022 2:44 AM
To: java-user@lucene.apache.org
Subject: [EXT] Re: Efficient sort on SortedDocValues

External Email: Use caution with links and attachments.

Hello, Andrei.
Docs are scored in-order (see Weight.scoreAll(), scoreRange()), just because underneath postings API is in-order. There are a few shortcuts/optimizations, but they only omit some iterations/segments like checking competitive scores and so one.

On Sun, Nov 6, 2022 at 1:35 AM Solodin, Andrei (TR Technology) <andrei.solodin@thomsonreuters.com.invalid> wrote:

> One more thing. While the test case passes now, it still iterates in
> index order. Which means that it still collects ~6.4K docs out of 10k matches.
> This is an improvement, but I am still wondering why it's not possible
> to iterate in the field older. Seems like that would provide
> substantial improvement.
>
> From: Solodin, Andrei (TR Technology)
> Sent: Saturday, November 5, 2022 5:18 PM
> To: java-user@lucene.apache.org
> Subject: RE: Efficient sort on SortedDocValues
>
> I just realized that the problem is that the field needs to be indexed
> as well. Now it works. But I noticed that this only works in Lucene 9.
> Does not work in Lucene 8 (specifically 8.11.2). This must be new
> functionality in Lucene 9?
>
> Thanks
>
>
> From: Solodin, Andrei (TR Technology)
> Sent: Saturday, November 5, 2022 1:07 PM
> To: java-user@lucene.apache.org<mailto:java-user@lucene.apache.org>
> Subject: Efficient sort on SortedDocValues
>
> Hello Lucene community, while looking into how to efficiently sort on
> a field value, I came across a couple of things that I don't quite
> understand. My assumption was that if I execute a search and sort on a
> SortedDocValues field, lucene would only iterate over the docs in the
> order of the field values or at least collect only competitive docs
> (docs that made it into the topN queue). Neither of those things seems
> to be happening. Instead, the iteration is happening in index order
> and all matched docs are collected. Looking at the code, I see that
> the optimizations are only possible if the index is sorted in the
> field order to begin with, which is not possible for our use case. We
> may have dozens of such fields in our index, thus there isn't any one
> field that can be used to sort the index. So I guess my question if
> what I am trying to achieve is possible? I tried to look though Solr
> codebase, but so far couldn't come up with anything. Code example is here
> https://urldefense.com/v3/__https://pastebin.com/i05E2wZy__;!!GFN0sa3rsbfR8OLyAw!c95-xVshQzkFU6u9GzqxAvTcVnO4j12lbVHEgMELOpR9AM2IQAtbX5NMIKynd2k2IkdjRAhIWUx2uOnWEm-cjg$ . I am using 9.4.1. Thanks in advance.
>
> Andrei
>
>

--
Sincerely yours
Mikhail Khludnev
???????????????????????????????????????????????????????????????????????F?V?7V'67&?&R?R???â?f?W6W"?V?7V'67&?&T?V6V?R?6?R??&p?f?"FF?F????6????G2?R???â?f?W6W"?V??V6V?R?6?R??&p?