Mailing List Archive

Lucene index integrity... or lack of :-(
Hello,

I'm starting to wonder how "bullet proof" Lucene indexes are. Do they
get corrupted easily? If so, is there a way to rebuild them?

I've started getting the following exception left and right...

"04/25 18:34:39 (Warning) Indexer.indexObjectWithValues:
java.io.IOException: _91.fnm already exists"

I built a little app (http://homepage.mac.com/zoe_info/) that uses
Lucene quite extensively, and I would like to keep it that way. However,
I'm starting to have second thoughts about Lucene's reliability... :-(

I'm sure I'm doing something wrong somewhere, but I really cannot see
what...

Any help or insight greatly appreciated.

Thanks.

PA.


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Lucene index integrity... or lack of :-(
There are some strange problems with FSDirectory. I have found that building
chunks in a RAMDirectory and then merging them into an FSDirectory is more
stable than indexing directly into the FSDirectory. I ran into your problem,
and the dreaded "too many open files" problem, when indexing large documents
with many fields...

using a RAMDir as a middle man solved my problems...

mvh karl øie

On Friday 26 April 2002 13:54, petite_abeille wrote:
> Hello,
>
> I'm starting to wonder how "bullet proof" Lucene indexes are. Do they
> get corrupted easily? If so, is there a way to rebuild them?
>
> I've started getting the following exception left and right...
>
> "04/25 18:34:39 (Warning) Indexer.indexObjectWithValues:
> java.io.IOException: _91.fnm already exists"
>
> I built a little app (http://homepage.mac.com/zoe_info/) that uses
> Lucene quite extensively, and I would like to keep it that way. However,
> I'm starting to have second thoughts about Lucene's reliability... :-(
>
> I'm sure I'm doing something wrong somewhere, but I really cannot see
> what...
>
> Any help or insight greatly appreciated.
>
> Thanks.
>
> PA.


Re: Lucene index integrity... or lack of :-(
> using a RAMDir as a middle man solved my problems...

Thanks. What is your heuristic for flushing the RAMDirectory? Also, how do
you deal with System.exit() or application death? E.g., you are indexing
something and the application dies or is killed.

Thanks for any input.

R.


Re: Lucene index integrity... or lack of :-(
That is a big problem with Lucene: because it uses an FSDir for storage, it has
no sense of transaction handling. For critical indexes I serialize a RAMDir to a
database blob so I can perform a rollback if needed, but this is an enormous
overhead...

> Thanks. What is your heuristic for flushing the RAMDirectory?
Please explain this, because I don't understand English that well :-(

mvh karl øie
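
A minimal sketch of the snapshot-and-rollback idea described in the message
above, in plain Java. It does not use Lucene: InMemoryIndex is a hypothetical
stand-in for a RAMDir, and the byte[] returned by snapshot() stands in for the
database blob (the code that would store it in a database is left out). Whether
the RAMDirectory of a given Lucene version can be serialized like this is an
assumption that would need checking.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

/** Hypothetical stand-in for an in-memory index: file name -> file contents. */
final class InMemoryIndex implements Serializable {
    private static final long serialVersionUID = 1L;
    final HashMap<String, byte[]> files = new HashMap<>();
}

final class Snapshots {
    /** Serializes the whole state into one blob that could be stored elsewhere. */
    static byte[] snapshot(Serializable state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the state from a blob, i.e. the "rollback" step. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T restore(byte[] blob) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return (T) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The cost mentioned in the message shows up directly here: every snapshot copies
the entire in-memory state, so the overhead grows with the size of the index.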

On Friday 26 April 2002 14:23, petite_abeille wrote:
> > using a RAMDir as a middle man solved my problems...
>
> Thanks. What is your heuristic for flushing the RAMDirectory? Also, how do
> you deal with System.exit() or application death? E.g., you are indexing
> something and the application dies or is killed.
>
> Thanks for any input.
>
> R.
Re: Lucene index integrity... or lack of :-(
>> Thanks. What is your heuristic for flushing the RAMDirectory?
> Please explain this, because I don't understand English that well :-(

That's ok, I don't really understand English either :-)

Simply put, when do you "flush" the RAMDirectory into the FSDirectory?
Every five documents? Ten? A thousand? What is a good balance between
RAM and FS?

Thanks.

PA.


Re: Lucene index integrity... or lack of :-(
Ah, now I see. What I have is a server with 512 MB of RAM, so I have used two
different approaches, and both work OK:

1 - I index a fixed number of documents into a RAMDir, like 10 (each of the
docs is an XML doc of about 1.5-2 MB), and then I optimize the RAMDir, merge it
into the FSDir, and then optimize the FSDir...

2 - I use Runtime.freeMemory() and Runtime.totalMemory() to see whether I have
reached more than 80% of the available memory. If so, I optimize the RAMDir,
merge it, and optimize the FSDir. If not, I just add more documents to the
RAMDir...

As far as I have tested, I have never experienced a failure while merging a
RAMDir into an FSDir, regardless of size, so it's my system's memory that is the
problem...
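
The two flush heuristics above could be sketched roughly as follows. This is
plain Java, not Lucene code: BatchingBuffer is a hypothetical name, and the sink
callback stands in for "optimize the RAMDir, merge it into the FSDir, optimize
the FSDir". The sketch divides the used heap by Runtime.maxMemory() (the ceiling
the JVM may grow to) rather than by totalMemory(); that choice is an assumption,
and 10 documents and 80% are simply the values mentioned in the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Buffers items in memory and hands them to a sink in batches. */
final class BatchingBuffer<T> {
    private final int maxItems;            // heuristic 1: fixed number of documents
    private final double maxHeapFraction;  // heuristic 2: heap usage threshold, e.g. 0.80
    private final Consumer<List<T>> sink;  // hypothetical stand-in for the merge to disk
    private final List<T> pending = new ArrayList<>();

    BatchingBuffer(int maxItems, double maxHeapFraction, Consumer<List<T>> sink) {
        this.maxItems = maxItems;
        this.maxHeapFraction = maxHeapFraction;
        this.sink = sink;
    }

    /** Fraction of the maximum heap that is currently in use. */
    static double usedHeapFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    /** Adds one item and flushes when either heuristic says so. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= maxItems || usedHeapFraction() >= maxHeapFraction) {
            flush();
        }
    }

    /** Hands the buffered items to the sink; call once more when indexing ends. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

With the values from the message this would be created as
new BatchingBuffer<>(10, 0.80, batch -> mergeIntoDiskIndex(batch)), where
mergeIntoDiskIndex is a hypothetical method that does the actual Lucene work.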

mvh karl øie


On Friday 26 April 2002 15:33, petite_abeille wrote:
> >> Thanks. What is your heuristic for flushing the RAMDirectory?
> >
> > Please explain this, because I don't understand English that well :-(
>
> That's ok, I don't really understand English either :-)
>
> Simply put, when do you "flush" the RAMDirectory into the FSDirectory?
> Every five documents? Ten? A thousand? What is a good balance between
> RAM and FS?
>
> Thanks.
>
> PA.


Re: Lucene index integrity... or lack of :-(
Forgot this:

It's a bit hard to determine a good balance while indexing XML documents,
because the internal relations of a DOM can make an XML document nearly 21
times as big in memory as it is on disk (I am not lying, I have seen it
myself)...

Also, the RAMDir must be kept in memory while indexing and merging, so checking
the system's free memory is easier than trying to calculate memory usage...

mvh karl øie



Re: Lucene index integrity... or lack of :-(
> Ah, now I see. What I have is a server with 512 MB of RAM, so I have used two
> different approaches, and both work OK:

Thanks a lot! I will give it a try...

PA.


Re: Lucene index integrity... or lack of :-(
> Also, the RAMDir must be kept in memory while indexing and merging, so
> checking the system's free memory is easier than trying to calculate
> memory usage...

I see... I don't deal with XML, so I guess I have a better grasp of the
memory requirements of my objects. In any case, I'm afraid I might be
abusing Lucene a bit, as I'm building a kind of OODBMS on top of it... Oh,
well...

Thanks for your help.

PA.


Re: Lucene index integrity... or lack of :-(
Morning,

> I'm starting to wonder how "bullet proof" Lucene indexes are. Do they
> get corrupted easily? If so, is there a way to rebuild them?

There is no tool for detecting index corruption, fixing an index, or
rebuilding one.
Rebuilding is something everyone can, and has to, do on their own.

> I've started getting the following exception left and right...
>
> "04/25 18:34:39 (Warning) Indexer.indexObjectWithValues:
> java.io.IOException: _91.fnm already exists"

I've seen people asking about this on the list, but I never encountered
this particular exception.

> I built a little app (http://homepage.mac.com/zoe_info/) that uses
> Lucene quite extensively, and I would like to keep it that way. However,
> I'm starting to have second thoughts about Lucene's reliability... :-(
>
> I'm sure I'm doing something wrong somewhere, but I really cannot see
> what...

Maybe it's not a Lucene issue then, although I've seen this mentioned
so often, which means that documentation could be improved to prevent
people from making the same mistakes that others have already made.

Otis


Re: Lucene index integrity... or lack of :-(
Hello again,

> There is no tool for detecting index corruption, fixing an index, or
> rebuilding one.
> Rebuilding is something everyone can, and has to, do on their own.

:-( Well, that's *very* sad, to say the least... How do I know that my
indexes are not corrupted even if everything seems to be working fine?
Don't tell me I'm the first one to run into this kind of issue?!? How
can I "trust" an index if there is *no* way of checking its integrity?
And even if you happen to notice that something is fishy, there is no
way to rebuild the index, short of re-indexing everything from scratch?
That does not sound like a very "healthy" situation to me. "Fragile"
would be a kind way of describing it...
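
One do-it-yourself way to get at least a file-level answer to the "how can I
trust an index" question is to record a checksum per index file after a clean
close and compare the checksums before the next open. The sketch below is an
assumption, not a Lucene feature: IndexManifest is a hypothetical helper, it
only detects that file contents changed since the manifest was taken, and it
cannot tell whether an index is logically consistent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/** Hypothetical helper: SHA-256 checksum for every regular file in a directory. */
final class IndexManifest {
    /** Returns file name -> hex checksum for the files directly inside indexDir. */
    static Map<String, String> checksums(Path indexDir) {
        Map<String, String> result = new TreeMap<>();
        try (Stream<Path> files = Files.list(indexDir)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(file)) {
                    result.put(file.getFileName().toString(),
                               sha256Hex(Files.readAllBytes(file)));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Hex-encoded SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A manifest saved after the writer is closed can be compared with
saved.equals(IndexManifest.checksums(indexDir)) before the index is opened
again; a mismatch means a file was added, removed, or modified in between.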

> I've seen people asking about this on the list, but I never encountered
> this particular exception.

Lucky you...

> Maybe it's not a Lucene issue then, although I've seen this mentioned
> so often, which means that documentation could be improved to prevent
> people from making the same mistakes that others have already made.

Maybe, maybe not. And most likely I'm doing something odd. In any case,
could you point me to the "mistakes that others have already made"? Or
did I miss something obvious here?

Thanks.

PA


Re: Lucene index integrity... or lack of :-(
Hello,

> > There is no tool for detecting index corruption, fixing an index, or
> > rebuilding one.
> > Rebuilding is something everyone can, and has to, do on their own.
>
> :-( Well, that's *very* sad, to say the least... How do I know that my
> indexes are not corrupted even if everything seems to be working fine?
> Don't tell me I'm the first one to run into this kind of issue?!? How
> can I "trust" an index if there is *no* way of checking its integrity?
> And even if you happen to notice that something is fishy, there is no
> way to rebuild the index, short of re-indexing everything from scratch?
> That does not sound like a very "healthy" situation to me. "Fragile"
> would be a kind way of describing it...

Yes, that's all unfortunate. If you come up with anything, please
share it. Or, you can use Lucene Sandbox and develop stuff there.

> > I've seen people asking about this on the list, but I never encountered
> > this particular exception.
>
> Lucky you...

:)

> > Maybe it's not a Lucene issue then, although I've seen this mentioned
> > so often, which means that documentation could be improved to prevent
> > people from making the same mistakes that others have already made.
>
> Maybe, maybe not. And most likely I'm doing something odd. In any case,
> could you point me to the "mistakes that others have already made"? Or
> did I miss something obvious here?

Nah, the only thing I can suggest is to check the list's archives; that is
where the mistakes of others would be recorded.

Otis

