CorruptIndexException after failed segment merge caused by No space left on device
Hello everyone,

Recently we had a failed segment merge caused by "No space left on device".
After the restart, Lucene failed with a CorruptIndexException.
We expected Lucene to recover automatically in such a case, because
there was no successful commit. Is that a correct assumption, or
am I missing something?
It would be great to know any recommendations for avoiding such situations
in the future and for recovering automatically after a restart.

Lucene version is 8.5.0

Failed merge stacktrace:

2021-02-02T08:51:51.679+0000

org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
Caused by: java.io.IOException: No space left on device
    at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
    at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
    at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
    at java.base/java.nio.channels.Channels.writeFullyImpl(Channels.java:74)
    at java.base/java.nio.channels.Channels.writeFully(Channels.java:97)
    at java.base/java.nio.channels.Channels$1.write(Channels.java:172)
    at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:416)
    at java.base/java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:74)
    at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
    at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
    at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
    at org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:73)
    at org.apache.lucene.util.compress.LZ4.encodeLiterals(LZ4.java:159)
    at org.apache.lucene.util.compress.LZ4.encodeSequence(LZ4.java:172)
    at org.apache.lucene.util.compress.LZ4.compress(LZ4.java:441)
    at org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:165)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:229)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:159)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:636)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:229)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4463)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4057)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)


Followed by failed startup:

2021-02-02T08:52:07.926+0000

org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/data/5f91aa0b07ce4d5e7beffaa2/segments_578fu")))
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:291)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
Caused by: java.nio.file.NoSuchFileException: /data/5f91aa0b07ce4d5e7beffaa2/_6lfem.si
    at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
    at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182)
    at java.base/java.nio.channels.FileChannel.open(FileChannel.java:292)
    at java.base/java.nio.channels.FileChannel.open(FileChannel.java:345)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
    at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
    at org.apache.lucene.codecs.lucene70.Lucene70SegmentInfoFormat.read(Lucene70SegmentInfoFormat.java:91)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:353)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
    ... 33 common frames omitted


Thank you!

--
Regards,
Alexander L

Re: CorruptIndexException after failed segment merge caused by No space left on device
On Wed, Mar 24, 2021 at 1:41 AM Alexander Lukyanchikov <
alexanderlukyanchikov@gmail.com> wrote:

> Hello everyone,
>
> Recently we had a failed segment merge caused by "No space left on device".
> After the restart, Lucene failed with a CorruptIndexException.
> We expected Lucene to recover automatically in such a case, because
> there was no successful commit. Is that a correct assumption, or
> am I missing something?
> It would be great to know any recommendations for avoiding such situations
> in the future and for recovering automatically after a restart.
>

I don't think you are missing anything. This should not happen.

Can you please open an issue: https://issues.apache.org/jira/projects/LUCENE

If you don't mind, please include all the relevant info you can provide
on the issue: OS, filesystem, JDK version, and any hints as to how you are
using Lucene (e.g. when you commit and how you index). There
are a lot of tests in Lucene's codebase designed to simulate the disk-full
condition and guarantee that things like this never happen, but maybe some
case is missing, or some other unknown bug is causing the missing files.
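
One possible way to gather the OS, JDK, and filesystem details requested
above is a small stdlib-only Java program like the sketch below. The class
name IndexEnvReport and the report keys are made up for illustration, and
the index directory is an assumption that you would pass on the command
line. It does not use any Lucene API.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene): prints environment details
// that are useful to attach to a bug report.
class IndexEnvReport {

    /** Collects OS, JDK, and filesystem details for the volume holding indexDir. */
    static Map<String, String> collect(Path indexDir) throws IOException {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("os", System.getProperty("os.name") + " " + System.getProperty("os.version"));
        report.put("jdk", System.getProperty("java.vendor") + " " + System.getProperty("java.version"));

        // The FileStore is the volume/partition on which the directory lives.
        FileStore store = Files.getFileStore(indexDir);
        report.put("filesystem", store.type());
        report.put("totalBytes", Long.toString(store.getTotalSpace()));
        report.put("usableBytes", Long.toString(store.getUsableSpace()));
        return report;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: the first argument is the index directory; defaults to ".".
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        collect(indexDir).forEach((key, value) -> System.out.println(key + ": " + value));
    }
}
```

The usableBytes value could also be checked periodically by the application
to raise an alert before the volume fills up; what threshold makes sense
depends on the index size and merge activity.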

Thanks

Re: CorruptIndexException after failed segment merge caused by No space left on device
+1, this sounds like a bad bug in Lucene! We try hard to test for and
prevent such bugs!

As long as you had at least one successful commit between creating the
index and hitting the disk-full condition, restarting Lucene on the index
should have recovered from that last successful commit.
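
As background for the issue report: each commit point is stored in a file
named segments_N, where N is the commit generation written in base 36
(for example segments_578fu in the startup error). A minimal stdlib-only
sketch for listing which commit files are present in an index directory
might look like this. The class CommitPointLister is hypothetical, it only
inspects file names, and it does not verify that a commit is readable;
Lucene's own CheckIndex tool is the place to look for that.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Lucene): lists commit generations by file name only.
class CommitPointLister {

    static final String PREFIX = "segments_";

    /** Decodes the base-36 generation from a commit file name such as "segments_578fu". */
    static long generationOf(String fileName) {
        if (!fileName.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a commit file: " + fileName);
        }
        return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
    }

    /** Returns the generations of all segments_N files in indexDir, in ascending order. */
    static List<Long> listGenerations(Path indexDir) throws IOException {
        List<Long> generations = new ArrayList<>();
        // The glob is matched against the file name, so only names starting with the prefix match.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, PREFIX + "*")) {
            for (Path file : files) {
                generations.add(generationOf(file.getFileName().toString()));
            }
        }
        Collections.sort(generations);
        return generations;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: the first argument is the index directory; defaults to ".".
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        for (long generation : listGenerations(indexDir)) {
            System.out.println(PREFIX + Long.toString(generation, Character.MAX_RADIX)
                    + " (generation " + generation + ")");
        }
    }
}
```

Attaching this listing, together with a plain directory listing, to the
issue would show which commit files existed at the time of the failed
startup.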

How often do you commit? Did you have a successful commit before the
disk-full event?

Please open an issue and include as much detail about your context as you can.

Thanks,

Mike McCandless

http://blog.mikemccandless.com


On Wed, Mar 24, 2021 at 12:55 PM Robert Muir <rcmuir@gmail.com> wrote:

> I don't think you are missing anything. This should not happen.
>
> Can you please open an issue:
> https://issues.apache.org/jira/projects/LUCENE
>
> If you don't mind, please include all the relevant info you can provide
> on the issue: OS, filesystem, JDK version, and any hints as to how you are
> using Lucene (e.g. when you commit and how you index). There
> are a lot of tests in Lucene's codebase designed to simulate the disk-full
> condition and guarantee that things like this never happen, but maybe some
> case is missing, or some other unknown bug is causing the missing files.
>
> Thanks

Re: CorruptIndexException after failed segment merge caused by No space left on device
Thank you very much for the responses!
I've created a bug report and added all the relevant details there:
https://issues.apache.org/jira/browse/LUCENE-9867
Please let me know if you have any questions, or if any other information
would be helpful.

--
Regards,
Alexander L


On Wed, Mar 24, 2021 at 10:09 AM Michael McCandless <
lucene@mikemccandless.com> wrote:

> +1, this sounds like a bad bug in Lucene! We try hard to test for and
> prevent such bugs!
>
> As long as you had at least one successful commit between creating the
> index and hitting the disk-full condition, restarting Lucene on the index
> should have recovered from that last successful commit.
>
> How often do you commit? Did you have a successful commit before the
> disk-full event?
>
> Please open an issue and include as much detail about your context as you can.
>
> Thanks,
>
> Mike McCandless
>
> http://blog.mikemccandless.com