Mailing List Archive

Obtaining all results efficiently. Closing a searcher.
Dear Lucene'd,

Suppose I would like to retrieve all docs that are resulting from a query.
I should then use the search() call with the HitCollector argument
which is called back with collect(docNr, score)

Would it be wise to sort by docNr when using IndexReader.doc(docNr) to
get to the stored fields?

And another question. I looked at the source for searcher.close() and
found that it closes its reader. Does that close the index reader used
to perform the search? That would interfere with a strategy to keep
an index reader open as long as possible and share it between threads.

Thanks in advance,
Ype

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Obtaining all results efficiently. Closing a searcher.
> From: Ype Kingma [mailto:ykingma@xs4all.nl]
>
> Suppose I would like to retrieve all docs that are resulting
> from a query.
> I should then use the search() call with the HitCollector argument
> which is called back with collect(docNr, score)
>
> Would it be wise to sort by docNr when using IndexReader.doc(docNr) to
> get to the stored fields?

Yes, that would be the most efficient way. Note that HitCollector.collect()
is called with increasing document ids, so you don't actually have to
sort--they arrive in order.
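
As a rough illustration of the point above, here is a minimal sketch of gathering every matching document number and then walking them in increasing order. The class name AllDocsCollector and its methods are hypothetical stand-ins, not part of Lucene; in real code the collect(docNr, score) method would live in a HitCollector subclass passed to search(). A BitSet is used so that the ids come back in ascending order even if the caller does not want to rely on arrival order.

```java
import java.util.BitSet;

/**
 * Hypothetical stand-in for a HitCollector subclass (not a Lucene class).
 * It only remembers which document numbers matched; scores are ignored.
 */
class AllDocsCollector {
    private final BitSet docs = new BitSet();

    /** Same shape as the collect(docNr, score) callback discussed above. */
    public void collect(int doc, float score) {
        docs.set(doc);
    }

    /** Number of distinct documents collected so far. */
    public int size() {
        return docs.cardinality();
    }

    /** Returns the collected document numbers in increasing order. */
    public int[] docIdsInOrder() {
        int[] ids = new int[docs.cardinality()];
        int i = 0;
        for (int doc = docs.nextSetBit(0); doc >= 0; doc = docs.nextSetBit(doc + 1)) {
            ids[i++] = doc;
        }
        return ids;
    }
}
```

After the search completes, one would loop over docIdsInOrder() and fetch the stored fields for each id through the reader, so that the stored-field reads move forward through the index rather than jumping around.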

> And another question. I looked at the source for searcher.close() and
> found that it closes its reader. Does that close the index reader used
> to perform the search? That would interfere with a strategy to keep
> an index reader open as long as possible and share it between threads.

You shouldn't close the Searcher. Reuse it too.

I may have confused things in an earlier message. One should keep a single
Searcher which uses a single IndexReader. Thus the code I posted earlier
for caching an IndexReader should really be caching a Searcher. The
IndexReader will be cached implicitly by the IndexSearcher. Thus the
correct code for caching the IndexReader in "latest" should be:

    private Searcher searcher;
    private long lastModified;

    public synchronized Searcher getSearcher() throws IOException {
      if (lastModified != IndexReader.lastModified("latest")) {
        // there's a new index: open it
        lastModified = IndexReader.lastModified("latest");
        searcher = new IndexSearcher("latest");
      }
      return searcher;
    }

Doug

RE: Obtaining all results efficiently. Closing a searcher.
Are you implying ( ... public synchronized Searcher getSearcher()....) that
this synchronized method should be used in a servlet/jsp thread as well? Is
that a good idea? Your jhtml example doesn't appear to be synchronized. Maybe
I'm missing something though.




From: Doug Cutting <DCutting@grandcentral.com>
To: "'Lucene Users List'" <lucene-user@jakarta.apache.org>
cc:
Date: 01/31/2002 02:11 PM
Subject: RE: Obtaining all results efficiently. Closing a searcher.
Please respond to "Lucene Users List"






RE: Obtaining all results efficiently. Closing a searcher.
> From: Jonathan_Wasson@dom.com [mailto:Jonathan_Wasson@dom.com]
>
> Are you implying ( ... public synchronized Searcher
> getSearcher()....) to
> use this synchronized method in a servlet/jsp thread as
> well?

Yes.

> Your jhtml example doesn't appear to be
> synchronized. Maybe I'm missing something though.

That's because it uses a Hashtable, which is already synchronized. But in
the method I sent today, when writing the long lastModified, explicit
synchronization is required. (Java does not guarantee that 64-bit stores
are atomic.) It also wouldn't hurt to synchronize in the jhtml example,
although I don't think it's required. It would keep multiple threads from
potentially re-opening the same index several times when it is modified,
improving efficiency.
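
To make the synchronization argument concrete, here is a minimal, self-contained sketch of the same reload-on-change pattern with the Lucene calls replaced by placeholders. ReloadingCache, lastModifiedSource and opener are hypothetical names introduced only for this illustration: the first stands in for the getSearcher() holder, the second for IndexReader.lastModified("latest"), and the third for new IndexSearcher("latest"). Two details differ from the method quoted in this thread and are assumptions of the sketch: the timestamp is read once per call, and a loaded flag forces the first open even when the timestamp happens to be 0.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical generic form of the getSearcher() pattern (not a Lucene class).
 * One shared instance is handed to all threads and replaced only when the
 * modification stamp changes.
 */
class ReloadingCache<T> {
    private final LongSupplier lastModifiedSource; // placeholder for IndexReader.lastModified("latest")
    private final Supplier<T> opener;              // placeholder for new IndexSearcher("latest")
    private T current;
    private long lastModified; // 64-bit field: only touched while holding the lock
    private boolean loaded;

    ReloadingCache(LongSupplier lastModifiedSource, Supplier<T> opener) {
        this.lastModifiedSource = lastModifiedSource;
        this.opener = opener;
    }

    /**
     * synchronized serves two purposes here: reads and writes of the long
     * field cannot be observed half-done, and concurrent callers cannot
     * each re-open the resource for the same modification.
     */
    public synchronized T get() {
        long stamp = lastModifiedSource.getAsLong();
        if (!loaded || stamp != lastModified) {
            current = opener.get();
            lastModified = stamp;
            loaded = true;
        }
        return current;
    }
}
```

Every servlet or JSP thread would call get() on one shared ReloadingCache instance. The lock is held only for the timestamp check except when a reload is actually needed, so the cost per request should be small.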

Doug
