results sorting
Greetings,

New user to the list. Please forgive a perhaps silly question.

Am wondering if there is any facility to sort search hits by fields in the
Document.

Kind Regards,

Chris Opler



--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: results sorting
> From: Chris Opler [mailto:chrisopler@free.fr]
>
> Am wondering if there is any facility to sort search hits by
> fields in the
> Document.

No, there's nothing like this built into Lucene.

Sorting by a field can be very expensive with large collections, since it requires reading
a Document object for every hit. Reading a Document requires a
random-access disk read. And when someone includes a common word in a
query, there can be lots of hits, far more than will ever be viewed by the
user.

An exception is date sorting, which can be easily implemented using a
HitCollector. Documents are delivered to a hit collector in the order they
were added to the index, so returning the oldest or most recent hits can be
done without reading field values. This is discussed more in:
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00228.html
Someday this will be built into Lucene...
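
A minimal sketch of the date-sorting idea, assuming documents were indexed
in chronological order, so that a larger document number means a more recent
document. DocCollector is a hypothetical stand-in for the HitCollector
callback so that the example compiles without Lucene on the classpath;
NewestHitsCollector and maxHits are also names made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the collect(doc, score) callback; not Lucene's class.
interface DocCollector {
    void collect(int doc, float score);
}

// Keeps the last maxHits document numbers seen. Assumes hits arrive in
// increasing document-number order and that index order matches date order.
class NewestHitsCollector implements DocCollector {
    private final int maxHits;
    private final Deque<Integer> newest = new ArrayDeque<>();

    NewestHitsCollector(int maxHits) {
        this.maxHits = maxHits;
    }

    @Override
    public void collect(int doc, float score) {
        newest.addLast(doc);
        if (newest.size() > maxHits) {
            newest.removeFirst(); // drop the oldest hit kept so far
        }
    }

    // Returns the retained document numbers, most recent first.
    List<Integer> newestFirst() {
        List<Integer> result = new ArrayList<>();
        newest.descendingIterator().forEachRemaining(result::add);
        return result;
    }
}
```

No field value is read at any point; only the document number is used. The
oldest hits are simpler still: keep the first maxHits calls and ignore the
rest.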

To implement efficient field sorting for a large collection you could
construct a fast index of a field (e.g., an in-memory array) and then
implement a HitCollector which uses this. For example, you could construct
an array of floats for a "price" field. Then your hit collector could do
something like:
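import java.util.Comparator;
import java.util.TreeSet;
import org.apache.lucene.search.HitCollector;

// Keeps the maxHitCount lowest-priced hits. prices[doc] holds the "price"
// field of each document; maxHitCount is assumed to be at least 1.
class MyCollector extends HitCollector {
  private final float[] prices;
  private final int maxHitCount;
  private final TreeSet<Integer> hits;  // doc numbers, cheapest first
  private float maxPrice = Float.MAX_VALUE;
  MyCollector(float[] prices, int maxHitCount) {
    this.prices = prices;
    this.maxHitCount = maxHitCount;
    this.hits = new TreeSet<>(Comparator.<Integer>comparingDouble(d -> prices[d])
        .thenComparing(Comparator.naturalOrder()));  // ties broken by doc number
  }
  public final void collect(int doc, float score) {
    if (prices[doc] > maxPrice) return;  // too expensive to rank
    hits.add(doc);
    if (hits.size() > maxHitCount) {
      hits.pollLast();                   // evict the most expensive hit
      maxPrice = prices[hits.last()];    // new admission threshold
    }
  }
}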
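
The prices array read by the collector above has to be filled once, before
any search runs. A rough sketch of that step follows. PriceTable, load and
the fieldValue accessor are hypothetical names; in a real index the accessor
would read the stored "price" value of each document, and treating a missing
price as Float.MAX_VALUE is just one possible policy.

```java
import java.util.function.IntFunction;

// Builds a doc-number -> price lookup table in one pass over the index.
class PriceTable {
    // fieldValue is a hypothetical accessor returning the stored "price"
    // string for a document number, or null when the document has none.
    static float[] load(int maxDoc, IntFunction<String> fieldValue) {
        float[] prices = new float[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            String raw = fieldValue.apply(doc);
            // Documents without a price sort last in an ascending ranking.
            prices[doc] = (raw == null) ? Float.MAX_VALUE : Float.parseFloat(raw);
        }
        return prices;
    }
}
```

The table costs four bytes per document, and the slow per-document reads
happen once at startup instead of once per hit.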

Also, if your collection is small, you can probably afford to simply
enumerate all hit documents and sort them as you wish.
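
For the small-collection case, a sketch along these lines may be enough. It
assumes the field value of every hit has already been read into memory. Hit
and HitSorter are made-up names for this example, not Lucene classes, and
"title" is only an example field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// One search hit with its field value already loaded.
final class Hit {
    final int doc;
    final String title;

    Hit(int doc, String title) {
        this.doc = doc;
        this.title = title;
    }
}

class HitSorter {
    // Returns a copy of hits ordered by title, ties broken by doc number.
    static List<Hit> byTitle(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((Hit h) -> h.title)
                .thenComparingInt(h -> h.doc));
        return sorted;
    }
}
```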

Doug

Re: results sorting
Doug,

Thank you very much for your detailed response. I'll give your suggestions a
try. I'm *very* impressed so far with Lucene. Performance is terrific.

Kind Regards,

Chris Opler

--
=======================
http://www.openwine.org



Re: results sorting
Yep, I can confirm this kind of approach works. If you truly have
massive amounts of data and they are in memory, you might also want
to consider making the documents a monolithic byte/char array and
have a computed index (i.e., a roll-your-own array). This way you avoid a
lot of GC work in high-transaction systems. Make the byte[] a static
final and that memory should never get looked at by a GC.
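
One way to picture the monolithic array is sketched below, assuming the
values are strings encoded as UTF-8. PackedStrings is a name invented for
this example. It keeps the arrays in instance fields instead of a static
final so that it is easier to test. The point is the same either way: a
byte[] and an int[] contain no object references, so the collector has
nothing to trace inside them.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Stores many strings in one byte[] plus an int[] of start offsets, so the
// heap holds two large arrays instead of one object per document.
class PackedStrings {
    private final byte[] data;
    private final int[] offsets; // offsets[i]..offsets[i + 1] delimit entry i

    PackedStrings(List<String> values) {
        offsets = new int[values.size() + 1];
        byte[][] encoded = new byte[values.size()][];
        int total = 0;
        for (int i = 0; i < values.size(); i++) {
            encoded[i] = values.get(i).getBytes(StandardCharsets.UTF_8);
            offsets[i] = total;
            total += encoded[i].length;
        }
        offsets[values.size()] = total;
        data = new byte[total];
        for (int i = 0; i < encoded.length; i++) {
            System.arraycopy(encoded[i], 0, data, offsets[i], encoded[i].length);
        }
    }

    int size() {
        return offsets.length - 1;
    }

    // Decodes entry i on demand; the returned String is a temporary object.
    String get(int i) {
        int start = offsets[i];
        return new String(data, start, offsets[i + 1] - start, StandardCharsets.UTF_8);
    }
}
```

Lookups cost two array reads plus a decode, and the per-entry overhead is
one int.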

Winton



--

Winton Davies
Lead Engineer, Overture (NSDQ: OVER)
1820 Gateway Drive, Suite 360
San Mateo, CA 94404
work: (650) 403-2259
cell: (650) 867-1598
http://www.overture.com/

