Mailing List Archive

Restrict number of hits
Is there any way to restrict the number of hits returned by a query?
I would like to get only the last X documents that satisfy a given
query condition. I've seen something about writing a HitCollector
class, but if far more than X documents satisfy the condition,
doesn't that mean they would all still be processed? How can I
avoid that?

Thanks in advance,
Hélder Ribeiro
__________________________________________________________
Queima das Fitas do Porto, May 5 to 12
Book tickets online at http://queima2002.aeiou.pt

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Restrict number of hits
Hi,

The FAQ answer on this one is:

17. How can I restrict the number of hits?

As far as we know, there is no way at this time (Lucene 1.0, May 2001) to
instruct Lucene to collect only a specified number of hits. However, when
you get the hit list back from the search method, you can ignore the ones
you don't need.



Having many hits doesn't seem to be a problem, because internally only
the first 200 (I think that's the right number) are kept in an active stack
to start with. Lucene will grab more if necessary. It is also optimized so
that a hit's document is not read from the index until it is required.
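
To illustrate why ignoring the extra hits is cheap, here is a minimal sketch of a lazily loaded hit list. It does not use Lucene at all: `LazyHits`, `firstN` and `loadCount` are hypothetical names made up for this example, and the loader function stands in for whatever reads a stored document from the index. It only models the behaviour described above, where a document is fetched the first time it is asked for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical stand-in for a lazily loaded hit list; this is NOT Lucene's Hits class.
final class LazyHits {
    private final int[] docIds;                // matching document numbers, in result order
    private final IntFunction<String> loader;  // assumed: fetches a stored document by number
    private final Map<Integer, String> cache = new HashMap<>();
    private int loads = 0;                     // how many documents were actually fetched

    LazyHits(int[] docIds, IntFunction<String> loader) {
        this.docIds = docIds.clone();
        this.loader = loader;
    }

    int length() {
        return docIds.length;
    }

    // Fetch the n-th hit on first access only; later calls are served from the cache.
    String doc(int n) {
        return cache.computeIfAbsent(docIds[n], id -> {
            loads++;
            return loader.apply(id);
        });
    }

    int loadCount() {
        return loads;
    }

    // Read at most `limit` hits and simply ignore the rest.
    static List<String> firstN(LazyHits hits, int limit) {
        int n = Math.min(limit, hits.length());
        List<String> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(hits.doc(i));
        }
        return out;
    }
}
```

With a thousand matches and `firstN(hits, 3)`, only three documents are ever fetched; the remaining matches cost nothing beyond their entry in the id array.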

Why are you trying to restrict the total hits?

--Peter




Re: Restrict number of hits
I didn't know that only the first 200 are returned at the beginning.
I'm trying to use Lucene not only as an indexing engine but also as an
entry tracker, to get the latest entries that satisfy some given
conditions. If all the returned documents have the same score, is it
guaranteed that the first 200 returned are the ones that were
published last?

../
Hélder Ribeiro



Re: Restrict number of hits
I think it's just the opposite.
The first documents added are the first ones you get back: first in,
first out (FIFO).

Lucene does always find all the results; it just defers opening the
Document record for each hit until it is actually needed.

So you can go to the last item returned using doc(indexWanted), and this
will return the document you want.

If the documents you want are at the end, this may not be optimized for
fetching just those. You'd have to look.

Either way, it should still be fast.
What is the number of documents you are looking at, and what are your
performance requirements?
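
For the "last X documents" case, here is a minimal sketch of a collector that keeps only the X highest document numbers it is shown. It is plain Java rather than Lucene code: `LastNCollector` and `result` are hypothetical names, the `collect(int doc, float score)` method is only modeled on the shape of a HitCollector callback (check it against your Lucene version), and it assumes that a higher document number means a more recently added document, in line with the FIFO ordering described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical collector; collect(int, float) only mirrors the shape of a HitCollector callback.
final class LastNCollector {
    private final int limit;
    // Min-heap: the smallest (oldest) retained document number sits at the head.
    private final PriorityQueue<Integer> newest = new PriorityQueue<>();

    LastNCollector(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    // Called once per matching document; the score is ignored in this sketch.
    void collect(int doc, float score) {
        if (newest.size() < limit) {
            newest.add(doc);
        } else if (doc > newest.peek()) {
            newest.poll();   // drop the oldest retained document
            newest.add(doc);
        }
    }

    // Retained document numbers, newest first.
    List<Integer> result() {
        List<Integer> out = new ArrayList<>(newest);
        out.sort(Collections.reverseOrder());
        return out;
    }
}
```

Note that every matching document is still passed to `collect`, so the matches are all visited; what the sketch bounds is memory (at most X numbers are retained) and the number of documents you later need to load.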

--Peter


