Hello
I have been reading about the "Too many clauses" error. As I understand it, if the maximum clause count is set too high, the inefficiency makes searching unusable.
I am testing Lucene's performance, and searches take too long. I have also hit OutOfMemoryError several times.
The index currently holds about 250,000 documents, but in the future it will need to hold millions.
These are the kinds of queries:
1. Greater-than or less-than requests
RangeQuery with Integer.MAX_VALUE as the upper bound for greater-than, or Integer.MIN_VALUE as the lower bound for less-than
2. RangeQuery
Example:
Field:[minValue TO maxValue]
3. WildcardQuery
Example:
Field:value*
etc.
The problem is that PrefixQuery, WildcardQuery, RangeQuery and FuzzyQuery all expand into a series of OR'ed boolean clauses, one per matching term.
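To make the problem concrete, here is a small plain-Java sketch of what I understand happens. It is not Lucene code: the class, the methods and the sample timestamps are all hypothetical, and the only Lucene-specific value is the default clause limit of 1024. The sorted set stands in for the term dictionary of one field.

```java
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names are Lucene APIs.
class RangeExpansionSketch {
    // Lucene's default maximum number of clauses in a BooleanQuery.
    static final int MAX_CLAUSES = 1024;

    // Counts the terms in [lower, upper], i.e. one OR'ed clause per matching term,
    // and fails the way an over-large expansion would.
    static int clauseCount(NavigableSet<String> terms, String lower, String upper) {
        int n = terms.subSet(lower, true, upper, true).size();
        if (n > MAX_CLAUSES) {
            throw new IllegalStateException("Too many clauses: " + n);
        }
        return n;
    }

    // Assumed sample data: one YYYYMMDDHHMMSS term per minute of 2005-01-01.
    static NavigableSet<String> sampleTerms() {
        NavigableSet<String> terms = new TreeSet<>();
        for (int h = 0; h < 24; h++) {
            for (int m = 0; m < 60; m++) {
                terms.add(String.format(Locale.ROOT, "20050101%02d%02d00", h, m));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = sampleTerms();
        // A one-hour range stays under the limit.
        System.out.println(clauseCount(terms, "20050101000000", "20050101005959")); // prints 60
        // A one-day range matches 1440 distinct terms and exceeds it.
        try {
            clauseCount(terms, "20050101000000", "20050101235959");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints Too many clauses: 1440
        }
    }
}
```

With timestamps stored to the second, even a short range can match thousands of distinct terms, and an open-ended range matches almost all of them.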
I have read about BitSetQuery, FilteringQuery and ConstantScoreQuery, and I am confused.
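As far as I can tell, the idea these approaches share is to walk the matching terms once and mark the matching documents in a bit set, instead of building one boolean clause per term. The following is only my plain-Java sketch of that idea, not the API of any of those classes: the class, the methods and the sample postings are hypothetical, and the sorted map stands in for a field's term-to-documents index.

```java
import java.util.BitSet;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names are Lucene APIs.
class BitSetRangeSketch {
    // Sets one bit for every document whose term lies in [lower, upper].
    // No clauses are created, so there is no clause limit to exceed.
    static BitSet docsInRange(TreeMap<String, int[]> postings,
                              String lower, String upper, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] docs : postings.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Assumed sample data: YYYYMMDDHHMMSS term -> ids of documents containing it.
    static TreeMap<String, int[]> samplePostings() {
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("20050101120000", new int[] {0, 3});
        postings.put("20050102080000", new int[] {1});
        postings.put("20050215093000", new int[] {2, 4});
        postings.put("20050301000000", new int[] {5});
        return postings;
    }

    public static void main(String[] args) {
        BitSet hits = docsInRange(samplePostings(), "20050102000000", "20050228235959", 6);
        System.out.println(hits); // prints {1, 2, 4}
    }
}
```

If this reading is right, the memory cost is one bit per document however many terms match, so about 125,000 bytes for an index of one million documents. The cost is that matching documents are no longer scored by which term matched.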
I can't use a Filter (DateFilter, QueryFilter, etc.) because the client wants to search across all documents, without filtering anything out.
I can't split a field into subfields to make the query more specific. For example, the user wants the date in the format YYYYMMDDHHMMSS, not six fields: one for the year, one for the month, one for the day, one for the hour, etc.
I can't add more system resources.
My environment is as follows:
---- LUCENE 1.4.3 ----
INDEX ==> 200,000 documents, growing to millions of documents
EACH DOCUMENT: about 20 fields (metadata)
TEXT SIZE PER DOCUMENT: 1 KB
---- SERVER (dedicated) ----
Red Hat
2 GB memory
JBoss + Lucene
JAVA_OPTS -Xmx640M -Xms640M
My question is very simple: is it possible to use Lucene as a full-text search engine in the environment and on the server described above, running the queries described above, with good performance and without OutOfMemoryError?
Thanks in advance
Mari Luz
---------------------------------------------------
Mari Luz Elola
Developer Engineer
Caleruega, 67
28033 Madrid (Spain)
Tel.: +34 91 768 46 58
mailto: melola@seinet.es
---------------------------------------------------