Jul 7, 2005, 10:12 AM
Post #9 of 10
Hi Erik, sorry for all my questions, and thank you very much for your quick
answers. Sorry also for my English: I am Spanish and I don't speak it very
well.
I have one more question.
In the end I am using IndexReader to return all the documents:
    Directory directory = FSDirectory.getDirectory(path, false);
    IndexReader reader = IndexReader.open(directory);
    for (int start = base; start < end; start++) {
        Document doc = reader.document(start);
        String id = doc.get(es.seinet.xtent.searchEngine.lucene.general.Util.ID);
        ides.add(id);
    }
It works fine and it is fast. The only problem is that the results cannot be
sorted by a metadata field (for example, getting all the documents ordered by
title).
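One possible way around the sorting limitation, sketched under assumptions: while walking the reader, collect the ID together with the field to sort on, then sort that list in memory. The `DocEntry` class and the sample values below are hypothetical stand-ins for what `doc.get(...)` would return. No Lucene calls are made, and whether 210.000 entries fit comfortably depends on the available heap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the two stored fields read from each document.
class DocEntry {
    final String id;
    final String title;

    DocEntry(String id, String title) {
        this.id = id;
        this.title = title;
    }
}

class SortWalkedDocs {
    // Returns the IDs ordered by title (case-insensitive), ties broken by ID.
    static List<String> idsSortedByTitle(List<DocEntry> entries) {
        List<DocEntry> copy = new ArrayList<>(entries);
        copy.sort(Comparator
                .comparing((DocEntry e) -> e.title, String.CASE_INSENSITIVE_ORDER)
                .thenComparing(e -> e.id));
        List<String> ids = new ArrayList<>();
        for (DocEntry e : copy) {
            ids.add(e.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<DocEntry> entries = new ArrayList<>();
        entries.add(new DocEntry("0003", "lucene in action"));
        entries.add(new DocEntry("0001", "Zebra"));
        entries.add(new DocEntry("0002", "Apache"));
        System.out.println(idsSortedByTitle(entries));
    }
}
```

Sorting a copy leaves the walked list untouched, so the same entries could be re-sorted by another field without reading the index again.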
My question is about the maxClauseCount parameter. I think the same as you:
it is not a good idea to bump up the limit.
If I use the default value (1024) and I search, I get this error:
[SearchCollection,executeQuery] caught a class
org.apache.lucene.search.BooleanQuery$TooManyClauses
with message: null
Is there any way to search all the documents (210.000 documents) so that
Lucene works internally with only 1024 clauses, returning documents up to that
limit, without throwing the TooManyClauses error? I need to work efficiently
with collections of more than 250.000 records, and the users normally run
complex queries (e.g. DATE:[20050601 to 20050701] AND TITLE:Lucene* ... etc.)
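To make the cause of the error more concrete, here is a rough model of the expansion Erik describes further down the thread: every unique term that matches the prefix becomes one clause, and the query fails once the limit is passed. This is a plain-Java sketch, not Lucene's code. `MAX_CLAUSES`, `expandPrefix`, `uniqueIds`, and the generated IDs are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixExpansionModel {
    // Mirrors the default limit of 1,024 mentioned in the thread.
    static final int MAX_CLAUSES = 1024;

    // Collects every term starting with the prefix, one "clause" per term,
    // and fails as soon as the clause limit would be exceeded.
    static List<String> expandPrefix(SortedSet<String> terms, String prefix) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            if (clauses.size() >= MAX_CLAUSES) {
                throw new IllegalStateException(
                        "too many clauses: more than " + MAX_CLAUSES);
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Builds n unique, zero-padded IDs such as "0000001" (hypothetical data).
    static SortedSet<String> uniqueIds(int n) {
        SortedSet<String> ids = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            ids.add(String.format(Locale.ROOT, "%07d", i));
        }
        return ids;
    }

    public static void main(String[] args) {
        // 500 unique IDs expand without trouble.
        System.out.println(expandPrefix(uniqueIds(500), "0").size());
        // 210,000 unique IDs all start with "0", so the limit is hit.
        try {
            expandPrefix(uniqueIds(210000), "0");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the failure depends on the number of distinct terms under the prefix, not on the number of documents returned, which is why capping the result count alone would not avoid it.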
Ah! I have just seen that you are Erik Hatcher, the author of Lucene in
Action!
I didn't understand what you meant about the filter... well, I will read the
chapter on filtering a search :-D
Thanks in advance
Mari Luz
----- Original Message -----
From: "Erik Hatcher" <erik@ehatchersolutions.com>
To: <general@lucene.apache.org>
Sent: Thursday, July 07, 2005 5:53 PM
Subject: Re: OUTOFMEMORY ERROR
On Jul 7, 2005, at 9:40 AM, MariLuz Elola wrote:
> Thanks Erik,
> I was wrong; the query that actually throws the OutOfMemory error is ==>
> ID:0* -ID:xtent.
> I have tried to reproduce the error with the query ID:0*, but the
> exception doesn't appear.
> Another thing: when the user searches without entering any query,
> internally I create the following query ==> ID:0* OR NOT ID:xtent.
That's a hairy query. I definitely do not recommend doing something
like that with prefix queries. Check out using a Filter for some of
this sort of thing also.
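As a rough illustration of the Filter idea: instead of adding clauses to the query, a filter marks the permitted documents in a bit set (one bit per document number), and only hits whose bit is set are kept. The sketch below models that with `java.util.BitSet` alone. The names `allowedDocs` and `applyFilter` and the sample IDs are hypothetical, and no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class BitSetFilterModel {
    // Sets one bit per document number whose ID is NOT the excluded value.
    static BitSet allowedDocs(String[] idsByDocNumber, String excludedId) {
        BitSet bits = new BitSet(idsByDocNumber.length);
        for (int doc = 0; doc < idsByDocNumber.length; doc++) {
            if (!excludedId.equals(idsByDocNumber[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Keeps only the hits whose document number is marked in the bit set.
    static List<Integer> applyFilter(int[] hits, BitSet allowed) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : hits) {
            if (allowed.get(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] ids = {"0001", "xtent", "0002", "0003"};
        BitSet allowed = allowedDocs(ids, "xtent");
        int[] hits = {0, 1, 3}; // document numbers matched by some other query
        System.out.println(applyFilter(hits, allowed));
    }
}
```

The bit set costs one bit per document however many distinct terms the field holds, so for 210.000 documents it stays at roughly 26 KB.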
> And when QueryParser parses this query I get ID:0* -ID:xtent
> (translated ==> ID:0* AND NOT ID:xtent), is that right? Is QueryParser
> working incorrectly?
It depends. By default, QueryParser uses OR as the default operator.
> About maxClauseCount (by default 1024), I am setting this property:
> org.apache.lucene.search.BooleanQuery.maxClauseCount =
>     es.seinet.xtent.searchEngine.lucene.general.Util.MAX_LUCENE_DOCUMENTS;
Bumping up that limit is not necessarily the best thing to do - I
recommend changing your approach to querying all documents rather
than trying to make BooleanQuery happy with an enormously inefficient
query.
Erik
>
> Mari Luz
>
> ----- Original Message -----
> From: "Erik Hatcher" <erik@ehatchersolutions.com>
> To: <general@lucene.apache.org>
> Sent: Thursday, July 07, 2005 2:46 PM
> Subject: Re: OUTOFMEMORY ERROR
>
>
>
> On Jul 7, 2005, at 6:02 AM, MariLuz Elola wrote:
>
>> The query is ==> ID:0*
>> This query returns all the documents, exactly 210.000 documents.
>> If the user doesn't specify any criteria in the search user
>> interface, the server searches all the documents.
>>
>
> Doing a prefix query (which ID:0* is) internally builds a
> BooleanQuery OR'ing all unique terms in the ID field that begin with
> a "0". The built in limit is 1,024 clauses in a BooleanQuery.
>
> You will need to re-think your approach. If the goal is to return
> all documents, then use IndexReader to walk them. If the goal is to
> have a general user query expression where ID:0* would be entered you
> will need to account for that possibility with more system resources
> and bumping up the BooleanQuery limit or indexing differently so that
> there are not so many terms being put into the BooleanQuery. It is
> difficult to offer specific advice as I'm not sure what your use
> cases are.
>
> Erik
>
>
>
>
>>
>> Mari Luz
>>
>>
>>
>> ----- Original Message -----
>> From: "Erik Hatcher" <erik@ehatchersolutions.com>
>> To: <general@lucene.apache.org>
>> Sent: Wednesday, July 06, 2005 8:12 PM
>> Subject: Re: OUTOFMEMORY ERROR
>>
>>
>> We'll need some more details to help. What query was it?
>>
>> Erik
>>
>> On Jul 6, 2005, at 1:22 PM, MariLuz Elola wrote:
>>
>>
>>
>>> Hi, I have a problem when I run a simple query, without sorting,
>>> against an index with 210.000 documents.
>>> After executing the query several times I get an OutOfMemory error.
>>> I am creating a new IndexSearcher(pathDir) for every search.
>>> I don't know whether I need to create a single IndexSearcher
>>> and cache it.
>>> If I search an index with only 50.000 documents, the OutOfMemory
>>> error doesn't appear.
>>> ------------------------
>>> ENVIRONMENT DESCRIPTION:
>>> ------------------------
>>>
>>> ---SERVER---
>>> MEMORY 2GB
>>> APP SERVER Jboss3.2.3
>>> JAVA_OPTS -Xmx640M -Xms640M
>>>
>>> ----LUCENE 1.4.3-------
>>> INDEX +- 210.000 documents
>>> EACH DOCUMENT +- 20 fields (metadatas)
>>> SIZE TEXT DOCUMENT 1k
>>>
>>> ------------------------
>>> ERROR:
>>> ------------------------
>>> 18:52:18,657 ERROR [LogInterceptor] Unexpected Error:
>>> java.lang.OutOfMemoryError
>>> 18:52:18,657 ERROR [LogInterceptor] Unexpected Error:
>>> java.lang.OutOfMemoryError
>>> 18:52:18,660 ERROR [STDERR] java.rmi.ServerError: Unexpected Error;
>>> nested exception is:
>>> java.lang.OutOfMemoryError
>>> 18:52:18,661 ERROR [STDERR] at
>>> org.jboss.ejb.plugins.LogInterceptor.handleException
>>> (LogInterceptor.java:374)
>>> 18:52:18,661 ERROR [STDERR] at
>>> org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:195)
>>> 18:52:18,661 ERROR [STDERR] at
>>> org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke
>>> (ProxyFactoryFinderInterceptor.java:122)
>>> 18:52:18,662 ERROR [STDERR] at
>>> org.jboss.ejb.StatelessSessionContainer.internalInvoke
>>> (StatelessSessionContainer.java:331)
>>> 18:52:18,662 ERROR [STDERR] at org.jboss.ejb.Container.invoke
>>> (Container.java:700)
>>> 18:52:18,662 ERROR [STDERR] at
>>> sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
>>> 18:52:18,662 ERROR [STDERR] at
>>> sun.reflect.DelegatingMethodAccessorImpl.invok
>>> .
>>> .
>>> Exception java.lang.OutOfMemoryError: requested 4 bytes for CMS: Work
>>> queue overflow; try -XX:-CMSParallelRemarkEnabled. Out of swap space?
>>>
>>>
>>> Could anybody help me???
>>>
>>> Thanks in advance
>>>
>>> Mari Luz