Hi,
Long.MIN_VALUE and Long.MAX_VALUE are the correct way for longs to sort.
In fact if you have Long.MIN_VALUE in your collection, empty values are
treated the same, but still empty value will appear at the wanted place.
In contrast to the default "0", it is not somewhere in the middle.
Because there is no long that is smaller than Long.MIN_VALUE, the sort
order will be OK.
BTW, Apache Solr is using exactly those values to support missing values
automatically (see sortMissingFirst, sortMissingLast schema options).
In fact, string/bytes sorting has theoretically the same problem,
because NULL is still different that empty. WARNING: If you really want
to compare by byte[] as suggested in your last mail, keep in mind: When
you sort against the raw bytes (using NumericUtils) with SORTED_SET
docvalues type, there is a large overhead on indexing and sorting
performance, especially for the case where you have many different
values in your index (which is likely for numerics).
Uwe
Am 17.11.2022 um 08:47 schrieb Adrien Grand:
> Hi Petko,
>
> Lucene's comparators for numerics have this limitation indeed. We haven't
> got many questions around that in the past, which I would guess is due to
> the fact that most numeric fields do not use the entire long range,
> specifically Long.MIN_VALUE and Long.MAX_VALUE, so using either of these
> works as a way to sort missing values first or last. If you have a field
> that may use Long.MIN_VALUE and long.MAX_VALUE, we do not have a comparator
> that can easily sort missing values first or last reliably out of the box.
>
> The easier option I can think of would consist of using the comparator for
> longs with MIN_VALUE / MAX_VALUE for missing values depending on whether
> you want missing values sorted first or last, and chain it with another
> comparator (via a FieldComparatorSource) which would sort missing values
> before/after existing values. The benefit of this approach is that you
> would automatically benefit from some not-so-trivial features of Lucene's
> comparator such as dynamic pruning.
>
> On Wed, Nov 16, 2022 at 9:16 PM Petko Minkov <pminkov@gmail.com> wrote:
>
>> Hello,
>>
>> When sorting documents by a NumericDocValuesField, how can documents be
>> ordered such that those with missing values can come before anything else
>> in ascending sorts? SortField allows to set a missing value:
>>
>> var sortField = new SortField("price", SortField.Type.LONG);
>> sortField.setMissingValue(null);
>>
>> This null is however converted into a long 0 and documents with missing
>> values are considered equally ordered with documents with an actual 0
>> value. It's possible to set the missing value to Long.MIN_VALUE, but that
>> will have the same problem, just for a different long value.
>>
>> Besides writing a custom comparator, is there any simpler and still
>> performant way to achieve this sort?
>>
>> --Petko
>>
>
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de eMail: uwe@thetaphi.de
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org