Mailing List Archive

RE: corrupted index
I changed the recipient from -user to -dev list, as that seems more
appropriate.
I think this would not be a bad idea, if we do it right.
Things like IndexLockedException, etc. sound alright to me.
I think Doug once welcomed such a change on one of the lists, too.

Perhaps a list of suggested exceptions, new exception classes and
appropriate patches would be the best contribution.

Thanks,
Otis

--- Matt Tucker <matt@jivesoftware.com> wrote:
> Hey all,
>
> Actually, using shutdown hooks might not be the best idea since Lucene is
> very often used in server-side Java environments. Many app-servers throw
> security errors when trying to add shutdown hooks, and I've seen Weblogic
> crash before when having them in a webapp. Has anyone else run into this?
>
> This all brings up a key issue with Lucene, which is that there is little
> way to recover from errors gracefully. I'd love to see a number of checked
> exceptions added. For example:
>
> IndexNotFoundException -- when trying to open an index that doesn't exist
> IndexLockedException -- when a lock file prevents you from getting an index
> IndexCorruptException -- maybe this would be thrown when an index appears
> to be broken?
>
> At the moment, Lucene throws many undocumented IOExceptions and even
> NullPointerExceptions when an error case comes up. I catch these in my app,
> but there's really not an intelligent way to recover from them. Adding
> checked exceptions would be a change of the API, but it seems worth it. I'd
> be happy to make a more specific proposal if other people feel like this
> would be a worthwhile direction to go in.
>
> Regards,
> Matt
>
> Quoting "Spencer, Dave" <dave@lumos.com>:
>
> > Runtime.addShutdownHook:
> >
> > http://java.sun.com/j2se/1.3/docs/api/java/lang/Runtime.html#addShutdownHook(java.lang.Thread)
> >
> > -----Original Message-----
> > From: Otis Gospodnetic [ mailto:otis_gospodnetic@yahoo.com]
> > Sent: Sunday, March 17, 2002 12:06 AM
> > To: Lucene Users List
> > Subject: Re: corrupted index
> >
> >
> > Oh, I just thought of something (wine does body good).
> > Perhaps one could use Runtime (the class) to catch the JVM shutdown and
> > do whatever is needed to prevent index corruption. I believe there are
> > some shutdown hook methods in there that may let you do that. I'm too
> > lazy to look up the API docs now, but I remember reading about that
> > once, and perhaps it was even mentioned on one of the 2 Lucene mailing
> > lists.
> >
> > On the other hand, it would be great to have a tool that can verify an
> > existing index. I don't know enough about the actual file structure
> > yet to write something like that, but maybe somebody else has done that
> > already or would like to contribute.
> >
> > Otis
> >
> >
> > --- "Steven J. Owens" <puffmail@darksleep.com> wrote:
> > > Otis,
> > >
> > > > You can remove the .lock file and try re-indexing or continuing
> > > > indexing where you left off.
> > > > I am not sure about the corrupt index. I have never seen it happen,
> > > > and I believe I recall reading some messages from Doug Cutting saying
> > > > that index should never be left in an inconsistent state.
> > >
> > > Obviously never "should" be, but if something's pulling the rug
> > > out from under his JRE, changes could be only partially written,
> > > right?
> > >
> > > Or is the writing format in some sense transactionally safe?
> > > I've never worked directly on something like this, but I worked at a
> > > database software company where they used transaction semantics and a
> > > journaling scheme to fake a "bulletproof" file system. Is this how
> > > the index-writing code is implemented?
> > >
> > > In general, I can guess Doug's response - just torch the old
> > > index directory and rebuild it; Lucene's indexing is fast enough that
> > > you don't need to get clever. This seems to be Doug's stance in
> > > general (i.e. "don't get fancy, I already put all the fanciness you'll
> > > need into extremely fast indexing and searching"). So far, it seems
> > > to work :-).
> > >
> > > > I could be making this up, though, so I suggest you search through
> > > > lucene-user and lucene-dev archives on www.mail-archive.com.
> > > > A search for "corrupt" should do it.
> > > > Once you figure things out maybe you can post a summary here.
> > >
> > > I got a little curious, so I went and did the searches. There is
> > > exactly one message in each list archive (dev and users) with the
> > > keyword "corrupt" in it. The lucene-users instance is irrelevant:
> > >
> > > http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00557.html
> > >
> > > The lucene-dev instance is more useful:
> > >
> > > http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00157.html
> > >
> > > It's a post from Doug, dated Sept 27, 2001, about adding not just
> > > thread-safety but process-safety:
> > >
> > > It should be impossible to corrupt an index through the Lucene API.
> > > However if a Lucene process exits unexpectedly it can leave the index
> > > locked. The remedy is simply to, at a time when it is certain that no
> > > processes are accessing the index, remove all lock files.
> > >
> > > So it sounds like it's worth trying just removing the lock files.
> > > Hm, is there a way to come up with a "sanity check" you can run on an
> > > index to make sure it's not corrupted? This might be an excellent
> > > thing to reassure yourself with: something went wrong? Run a sanity
> > > check, if it fails just reindex.
> > >
> > > Steven J. Owens
> > > puff@darksleep.com
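
For reference, the shutdown-hook idea discussed in the quoted messages could
look roughly like the sketch below. It is only an illustration: the class
ShutdownHookSketch, the cleanedUp flag and the cleanup() method are
hypothetical stand-ins for whatever an application would do to release its
index (they are not Lucene API). Runtime.addShutdownHook can throw a
SecurityException when a security manager refuses the registration, which is
the failure Matt describes for app-servers, so the sketch reports that case
instead of letting it propagate.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownHookSketch {
    // Hypothetical flag standing in for real cleanup work
    // (for example, closing whatever object holds the index open).
    static final AtomicBoolean cleanedUp = new AtomicBoolean(false);

    // The work to run when the JVM shuts down.
    static Runnable cleanup() {
        return () -> cleanedUp.set(true);
    }

    // Tries to register the hook. Returns false when the environment
    // refuses it (SecurityException) or the JVM is already shutting
    // down (IllegalStateException), so the caller can fall back to
    // some other cleanup strategy.
    static boolean tryRegister(Thread hook) {
        try {
            Runtime.getRuntime().addShutdownHook(hook);
            return true;
        } catch (SecurityException | IllegalStateException e) {
            return false;
        }
    }
}
```

A caller would do something like
ShutdownHookSketch.tryRegister(new Thread(ShutdownHookSketch.cleanup())) once
at startup. A hook only runs on an orderly JVM shutdown; it does not help if
the process is killed outright.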




--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
RE: corrupted index
Matt,

I'd welcome a concrete proposal in this area. Probably we should wait until
we have a final 1.2 release out there before making such changes. Note that
this could be done compatibly if the new exceptions subclass
java.io.IOException.

Doug
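
To illustrate the compatibility point, here is a minimal sketch of what
exceptions subclassing java.io.IOException might look like. The three
exception names come from Matt's message; everything else (the
ExceptionSketch class, openIndex and the two caller methods) is hypothetical
and is not taken from any Lucene release. The idea is that a method already
declared as "throws IOException" can start throwing the more specific types
without breaking existing callers.

```java
import java.io.IOException;

// Hypothetical exception classes, named after the proposal in this thread.
class IndexNotFoundException extends IOException {
    IndexNotFoundException(String message) { super(message); }
}

class IndexLockedException extends IOException {
    IndexLockedException(String message) { super(message); }
}

class IndexCorruptException extends IOException {
    IndexCorruptException(String message) { super(message); }
}

class ExceptionSketch {
    // Stand-in for an index-opening call. Its signature still declares
    // only IOException, so existing callers keep compiling unchanged.
    static void openIndex(boolean exists, boolean locked) throws IOException {
        if (!exists) throw new IndexNotFoundException("no index found");
        if (locked) throw new IndexLockedException("lock file present");
    }

    // Existing code that only knows about IOException keeps working.
    static String legacyCaller(boolean exists, boolean locked) {
        try {
            openIndex(exists, locked);
            return "opened";
        } catch (IOException e) {
            return "io-error";
        }
    }

    // New code can tell the cases apart and recover differently.
    static String awareCaller(boolean exists, boolean locked) {
        try {
            openIndex(exists, locked);
            return "opened";
        } catch (IndexLockedException e) {
            return "locked";
        } catch (IndexNotFoundException e) {
            return "not-found";
        } catch (IOException e) {
            return "io-error";
        }
    }
}
```

The more specific catch clauses have to come before the IOException one,
otherwise the compiler rejects them as unreachable.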

> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Monday, April 01, 2002 9:06 PM
> To: lucene-dev@jakarta.apache.org
> Cc: matt@jivesoftware.com
> Subject: RE: corrupted index
>
>
> I changed the recipient from -user to -dev list, as that seems more
> appropriate.
> I think this would not be a bad idea, if we do it right.
> Things like IndexLockedException, etc. sound alright to me.
> I think Doug once welcomed such a change on one of the lists, too.
>
> Perhaps a list of suggested exceptions, new exception classes and
> appropriate patches would be the best contribution.
>
> Thanks,
> Otis

RE: corrupted index
Doug,

Yep, I think waiting until after 1.2 would be a good idea. As I find
time over the next couple of weeks, I'll try to start putting together a
proposal.

A good short-term improvement would be to document the usage of
IOException in the Javadocs and explain when it might occur.

In terms of subclassing IOException -- sounds like it could be a good
approach.

Regards,
Matt

> -----Original Message-----
> From: Doug Cutting [mailto:DCutting@grandcentral.com]
> Sent: Tuesday, April 02, 2002 11:24 AM
> To: 'Lucene Developers List'
> Subject: RE: corrupted index
>
>
> Matt,
>
> I'd welcome a concrete proposal in this area. Probably we
> should wait until we have a final 1.2 release out there
> before making such changes. Note that this could be done
> compatibly if the new exceptions subclass java.io.IOException.
>
> Doug