Mailing List Archive

how to decide when the index needs to be optimized ?
Hello !

We're building a Document Management System and we're using Lucene to
index the
document contents. Initially when we're populating our database we're
adding the
documents to the index also. We're also Optimizing the index after
adding the
documents to the index. Now over a period of time more doucments will be
added to
the index. So it's understabdable that after a period of time the index
will be
unoptimized. Now is there some way we can detect that the index needs
optimizaion.
Or we'll just have to keep optimizing the index, say for every n
documents being
added to the index, and if so how do we really figure out how many
documents we
can add before optimizing the index.

Can anyone throw some light on this ?

Regards
-goutam-



--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: how to decide when the index needs to be optimized ? [ In reply to ]
My understanding it that you don't even have to optimize the index,
unless you want your searches to be faster.
I don't think Lucene has any internal limitation to the number of files
that comprise an unoptimized index, so you'll hit the wall with Java or
OS first, but even that limit is pretty high.
You could just optimize every X documents or at the end of indexing.

Otis



--- "Biswas, Goutam_Kumar" <Goutam-Kumar-Biswas@deshaw.com> wrote:
> Hello !
>
> We're building a Document Management System and we're using
> Lucene to
> index the
> document contents. Initially when we're populating our database
> we're
> adding the
> documents to the index also. We're also Optimizing the index
> after
> adding the
> documents to the index. Now over a period of time more doucments
> will be
> added to
> the index. So it's understabdable that after a period of time the
> index
> will be
> unoptimized. Now is there some way we can detect that the index
> needs
> optimizaion.
> Or we'll just have to keep optimizing the index, say for every n
> documents being
> added to the index, and if so how do we really figure out how
> many
> documents we
> can add before optimizing the index.
>
> Can anyone throw some light on this ?
>
> Regards
> -goutam-
>
>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>


__________________________________________________
Do You Yahoo!?
Yahoo! Tax Center - online filing with TurboTax
http://taxes.yahoo.com/

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: how to decide when the index needs to be optimized ? [ In reply to ]
Hi,
I was using the following to do analysis on our document management system
that uses lucene-
opimization counter(how often optimize() should be called, this seems to
help to clean up the deletable files even if you are not interested in
speeding up the searches)
Merge factor - decides how often segments should be merged
Max Merge factor- upper limit on number of documents that can be merged
JVM heap size - determines how much heap should be given to the java process
that uses lucene (-Xmx520m)

If there are any others, I would like to know.
Aruna.

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Thursday, April 11, 2002 11:35 AM
To: Lucene Users List
Subject: Re: how to decide when the index needs to be optimized ?


My understanding it that you don't even have to optimize the index,
unless you want your searches to be faster.
I don't think Lucene has any internal limitation to the number of files
that comprise an unoptimized index, so you'll hit the wall with Java or
OS first, but even that limit is pretty high.
You could just optimize every X documents or at the end of indexing.

Otis



--- "Biswas, Goutam_Kumar" <Goutam-Kumar-Biswas@deshaw.com> wrote:
> Hello !
>
> We're building a Document Management System and we're using
> Lucene to
> index the
> document contents. Initially when we're populating our database
> we're
> adding the
> documents to the index also. We're also Optimizing the index
> after
> adding the
> documents to the index. Now over a period of time more doucments
> will be
> added to
> the index. So it's understabdable that after a period of time the
> index
> will be
> unoptimized. Now is there some way we can detect that the index
> needs
> optimizaion.
> Or we'll just have to keep optimizing the index, say for every n
> documents being
> added to the index, and if so how do we really figure out how
> many
> documents we
> can add before optimizing the index.
>
> Can anyone throw some light on this ?
>
> Regards
> -goutam-
>
>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>


__________________________________________________
Do You Yahoo!?
Yahoo! Tax Center - online filing with TurboTax
http://taxes.yahoo.com/

--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>