Mailing List Archive

Index corruption and repair
Hello,

We are facing a strange situation in our application as described below:

*Using*:

- Python 3.8.10
- PyLucene 6.5.0
- Java 8 (1.8.0_181)
- Runs on Linux and Windows (error seen on Windows)

We suddenly get the following *error*:

2022-02-10 09:58:09.253215: ERROR : writer | Failed to get index (D:\i\202202) writer, Exception:
org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202202\segments_fo")))


After this, no further indexing happens: trying to open the index for
writing throws the above error, and the IndexWriter cannot be created.

FYI, our code contains the following *settings*:

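import lucene
from java.nio.file import Paths
from org.apache.lucene.index import IndexWriter, IndexWriterConfig
from org.apache.lucene.store import FSDirectory

lucene.initVM()
index_path = r"D:\i\202202"  # raw string: in a normal literal, \202 would be read as an octal escape
index_directory = FSDirectory.open(Paths.get(index_path))
iconfig = IndexWriterConfig(wrapper_analyzer)  # wrapper_analyzer is defined elsewhere in our app
iconfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND)
iconfig.setRAMBufferSizeMB(16.0)
writer = IndexWriter(index_directory, iconfig)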


*Repairing*
We tried 'repairing' the index with the following command / tool:

java -cp "lucene-core-6.5.0.jar;lucene-backward-codecs-6.5.0.jar" org.apache.lucene.index.CheckIndex "D:\i\202202" -exorcise

However, this reports "No problems found with the index."
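A minimal sketch of running the same check from Python, assuming `java` is on the PATH and the two jar files sit in the working directory. It joins the classpath with `os.pathsep`, which is `;` on Windows and `:` on Linux, so the separator matches the platform. `build_checkindex_cmd` and `run_checkindex` are hypothetical helper names, not part of PyLucene or Lucene.

```python
import os
import subprocess


def build_checkindex_cmd(index_path, jars, exorcise=False):
    """Build the argv list for Lucene's CheckIndex command-line tool.

    os.pathsep is ";" on Windows and ":" on Linux/macOS, so the
    classpath is joined with the right separator for this platform.
    """
    cmd = ["java", "-cp", os.pathsep.join(jars),
           "org.apache.lucene.index.CheckIndex", index_path]
    if exorcise:
        # -exorcise writes a new segments_N without the segments that fail
        # the check; documents in those segments are lost, so back up first.
        cmd.append("-exorcise")
    return cmd


def run_checkindex(index_path, jars):
    """Run a read-only check (no -exorcise); return (exit code, output)."""
    proc = subprocess.run(build_checkindex_cmd(index_path, jars),
                          stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          universal_newlines=True)
    return proc.returncode, proc.stdout
```

For example, `code, output = run_checkindex(r"D:\i\202202", ["lucene-core-6.5.0.jar", "lucene-backward-codecs-6.5.0.jar"])`. Passing an argument list instead of a shell string sidesteps cmd.exe and PowerShell quoting rules (PowerShell treats an unquoted `;` as a statement separator). Running without `-exorcise` first keeps the check read-only.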


*Work around*
We have to manually delete the problematic segments file:
D:\i\202202\segments_fo
after which the application starts working again... until the next
corruption. We can't spot a specific pattern.
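The workaround is consistent with Lucene opening the `segments_N` file with the highest generation, where N is the generation written in base 36. Below is a stdlib-only sketch, with hypothetical helper names, that takes a directory listing and reports the newest commit point and the older one that would presumably be picked up once the newest is deleted. It does not read file contents, so it cannot tell whether the older commit's segment files are still intact.

```python
import re

# A commit point is a file named "segments_<gen>", where <gen> is the
# commit generation written in base 36 (so "segments_fo" is generation 564).
# The ^ anchor skips pending_segments_N files (commits still in progress).
_SEGMENTS_RE = re.compile(r"^segments_([0-9a-z]+)$")


def commit_generations(file_names):
    """Map each segments_N name in a directory listing to its generation."""
    gens = {}
    for name in file_names:
        match = _SEGMENTS_RE.match(name)
        if match:
            gens[name] = int(match.group(1), 36)
    return gens


def latest_and_fallback(file_names):
    """Return (newest, next-newest) segments_N names; None where absent."""
    ordered = sorted(commit_generations(file_names).items(),
                     key=lambda item: item[1], reverse=True)
    newest = ordered[0][0] if ordered else None
    fallback = ordered[1][0] if len(ordered) > 1 else None
    return newest, fallback
```

For example, `latest_and_fallback(os.listdir(r"D:\i\202202"))`. Deleting the newest `segments_N` rolls the index back to the older commit, so changes committed after that point would be lost; this is a diagnostic, not a repair.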


*Two questions:*

1. Can we handle this situation programmatically, so that no manual
intervention is needed?
2. Any reason why we are facing the corruption issue in the first place?


Before this we were using PyLucene 4.10 and we didn't face this problem;
the application logic is the same.

Also, while the application runs on both Linux and Windows, so far we have
observed this situation only on various Windows platforms.

Would really appreciate some assistance. Thanks in advance.

Regards,
Antony
Re: Index corruption and repair
Hi Antony,

This isn't something that you should try to fix programmatically;
corruptions indicate that something is wrong with the environment,
like a broken disk or corrupt RAM. I would suggest running a memtest
to check your RAM and looking at the system logs in case they have
anything to say about your disks.

Can you also share the full stack trace of the exception?

On Thu, Apr 28, 2022 at 10:26 AM Antony Joseph
<antony.dev.webmail@gmail.com> wrote:



--
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Index corruption and repair
Thank you for your reply.

This isn't happening in a single environment. Our application is used by
various clients, and this has been reported by multiple users, all of whom
had been running the earlier PyLucene (v4.10) without issues.

One thing to mention is that our earlier version used Python 2.7.15 (with
PyLucene 4.10), and now we are using Python 3.8.10 with PyLucene 6.5.0; the
indexing logic is the same.

One other thing to note is that the issue described has (so far!) only
occurred on MS Windows; none of our Linux customers have complained about
this.

Any ideas?

Regards,
Antony

On Thu, 28 Apr 2022 at 17:00, Adrien Grand <jpountz@gmail.com> wrote:

Re: Index corruption and repair
The most helpful thing would be the full stack trace of the exception.
This exception should be chaining the original exception and call
site, and may tell us more about the error you hit.

To me, it looks like a Windows-specific issue where the filesystem is
returning an unexpected error. So it would be helpful to see exactly
which error that is, and the full trace of where it comes from, to chase
it further.

On Thu, Apr 28, 2022 at 12:10 PM Antony Joseph
<antony.dev.webmail@gmail.com> wrote:
Re: Index corruption and repair
Thank you for your reply.

*The full stack trace is included:*

Traceback (most recent call last):
  File "index.py", line 112, in start
    writer = IndexWriter(index_directory, iconfig)
lucene.JavaError: <super: <class 'JavaError'>, <JavaError object>>
Java stacktrace:
org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202204\segments_10fj")))
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:290)
    at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:165)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:972)
Caused by: java.nio.file.NoSuchFileException: D:\i\202204\_14gb.si
    at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
    at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
    at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
    at sun.nio.fs.WindowsFileSystemProvider.newFileChannel(Unknown Source)
    at java.nio.channels.FileChannel.open(Unknown Source)
    at java.nio.channels.FileChannel.open(Unknown Source)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
    at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:137)
    at org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:89)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:357)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
    ... 2 more


Regards,
Antony

On Sat, 30 Apr 2022 at 10:59, Robert Muir <rcmuir@gmail.com> wrote:

Re: Index corruption and repair
Hi Antony,

Hmm, it looks like the root cause is this:

Caused by: java.nio.file.NoSuchFileException: D:\i\202204\_14gb.si

Can you list all the files in the index directory at the time this
exception happens, and reply here? We need to figure out whether the file
is really missing or whether something else is going on.

Do you run any virus scanner / disk file tree utilities / etc.? In the
distant past, such programs sometimes caused strange transient errors on
Windows if they opened a file for reading in exclusive mode, or similar.

What is the actual drive you are storing the index on (D:)? Is it a local
disk or remote SMBFS mount?
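A small stdlib sketch for capturing the requested listing, ideally right where the exception is caught so the snapshot reflects the directory at that moment. `snapshot_index_dir` and `missing_files` are hypothetical names; the expected file name (here `_14gb.si`, from the `Caused by:` line) has to be supplied by hand, because this sketch does not parse `segments_N`.

```python
import os


def snapshot_index_dir(index_path):
    """Return a sorted list of (file name, size in bytes) for regular files."""
    entries = []
    with os.scandir(index_path) as it:
        for entry in it:
            if entry.is_file():
                entries.append((entry.name, entry.stat().st_size))
    return sorted(entries)


def missing_files(index_path, expected_names):
    """Return the names from expected_names that are absent from index_path."""
    present = {name for name, _size in snapshot_index_dir(index_path)}
    return [name for name in expected_names if name not in present]
```

For example, log every `(name, size)` pair from `snapshot_index_dir(r"D:\i\202204")` and check `missing_files(r"D:\i\202204", ["_14gb.si"])`. Recording sizes as well as names helps spot zero-length or truncated files, not just absent ones.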

Mike McCandless

http://blog.mikemccandless.com


On Sat, Apr 30, 2022 at 8:39 AM Antony Joseph <antony.dev.webmail@gmail.com>
wrote:

Re: Index corruption and repair
Hi Michael,

Thank you for your reply. Please find responses to your questions below.

Regards,
Antony

On Sat, 30 Apr 2022 at 18:59, Michael McCandless <lucene@mikemccandless.com>
wrote:

> Hi Antony,
>
> Hmm it looks like the root cause is this:
>
> Caused by: java.nio.file.NoSuchFileException: D:\i\202204\_14gb.si
>
> Can you list all the files in the index directory at the time this
> exception happens, and reply here? We need to figure out whether the file
> is really missing or what.
>
Below is the index directory file listing. Yes, the file is missing
(D:\i\202204\_14gb.si).

>
> Do you run any virus scanner / disk file tree utilities / etc.? In the
> distant past sometimes such programs might cause strange transient errors
> if they open a file for read exclusively or so, on windows.
>
There is no virus scanner running.

>
> What is the actual drive you are storing the index on (D:)? Is it a local
> disk or remote SMBFS mount?
>
It's a local disk (D:).

>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Sat, Apr 30, 2022 at 8:39 AM Antony Joseph <
> antony.dev.webmail@gmail.com> wrote:
>
>> Thank you for your reply.
>>
>> *The full stack trace is included:*
>>
>> Traceback (most recent call last):
>>   File "index.py", line 112, in start
>>     writer = IndexWriter(index_directory, iconfig)
>> lucene.JavaError: <super: <class 'JavaError'>, <JavaError object>>
>> Java stacktrace:
>> org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202204\segments_10fj")))
>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:290)
>>     at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:165)
>>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:972)
>> Caused by: java.nio.file.NoSuchFileException: D:\i\202204\_14gb.si
>>     at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>>     at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>>     at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>>     at sun.nio.fs.WindowsFileSystemProvider.newFileChannel(Unknown Source)
>>     at java.nio.channels.FileChannel.open(Unknown Source)
>>     at java.nio.channels.FileChannel.open(Unknown Source)
>>     at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
>>     at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:137)
>>     at org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:89)
>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:357)
>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
>>     ... 2 more
>>
>>
>> Regards,
>> Antony
>>
>> On Sat, 30 Apr 2022 at 10:59, Robert Muir <rcmuir@gmail.com> wrote:
>>
>> > The most helpful thing would be the full stacktrace of the exception.
>> > This exception should be chaining the original exception and call
>> > site, and maybe tell us more about this error you hit.
>> >
>> > To me, it looks like a Windows-specific issue where the filesystem is
>> > returning an unexpected error. So it would be helpful to see exactly
>> > which error that is, and the full trace of where it comes from, to chase
>> > it further.
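On Robert's request: the log line in the original report kept only the top-level CorruptIndexException message and dropped the `Caused by:` part that later pointed at the missing file. Below is a minimal sketch of logging the failure together with its full traceback from Python. It uses the standard `logging` module's `Logger.exception`, which must be called from inside an `except` block. The helper `open_writer_logged` and the logger name `indexer` are hypothetical. Whether the Java `Caused by:` section appears in the log depends on how PyLucene renders `JavaError` as text; the trace Antony posted suggests its string form includes the Java stacktrace.

```python
import logging

logger = logging.getLogger("indexer")

def open_writer_logged(open_writer):
    """Call open_writer() and return its result. If it raises, log the
    exception with its full traceback (including any chained causes) at
    ERROR level, then re-raise it unchanged."""
    try:
        return open_writer()
    except Exception:
        # logger.exception() logs at ERROR level and appends the formatted
        # traceback of the exception currently being handled
        logger.exception("Failed to open index writer")
        raise
```

For example: `writer = open_writer_logged(lambda: IndexWriter(index_directory, iconfig))`.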
>> >
>> > On Thu, Apr 28, 2022 at 12:10 PM Antony Joseph
>> > <antony.dev.webmail@gmail.com> wrote:
>> > >
>> > > Thank you for your reply.
>> > >
>> > > This isn't happening in a single environment. Our application is used
>> > > by various clients, and this has been reported by multiple users, all of
>> > > whom had previously run the earlier PyLucene (v4.10) without issues.
>> > >
>> > > One thing to mention is that our earlier version used Python 2.7.15 (with
>> > > PyLucene 4.10), and now we are using Python 3.8.10 with PyLucene 6.5.0;
>> > > the indexing logic is the same...
>> > >
>> > > One other thing to note is that the issue described has (so far!) only
>> > > occurred on MS Windows; none of our Linux customers have complained
>> > > about this.
>> > >
>> > > Any ideas?
>> > >
>> > > Regards,
>> > > Antony
>> > >
>> > > On Thu, 28 Apr 2022 at 17:00, Adrien Grand <jpountz@gmail.com> wrote:
>> > >
>> > > > Hi Antony,
>> > > >
>> > > > This isn't something that you should try to fix programmatically;
>> > > > corruptions indicate that something is wrong with the environment,
>> > > > like a broken disk or corrupt RAM. I would suggest running a memtest
>> > > > to check your RAM and looking at system logs in case they have
>> > > > anything to say about your disks.
>> > > >
>> > > > Can you also share the full stack trace of the exception?
>> > > >
>> > > > On Thu, Apr 28, 2022 at 10:26 AM Antony Joseph
>> > > > <antony.dev.webmail@gmail.com> wrote:
>> > > > >
>> > > > > Hello,
>> > > > >
>> > > > > We are facing a strange situation in our application, as described
>> > > > > below:
>> > > > >
>> > > > > *Using*:
>> > > > >
>> > > > > - Python 3.8.10
>> > > > > - PyLucene 6.5.0
>> > > > > - Java 8 (1.8.0_181)
>> > > > > - Runs on Linux and Windows (error seen on Windows)
>> > > > >
>> > > > > We suddenly get the following *error*:
>> > > > >
>> > > > > 2022-02-10 09:58:09.253215: ERROR : writer | Failed to get index (D:\i\202202) writer, Exception:
>> > > > > org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202202\segments_fo")))
>> > > > >
>> > > > >
>> > > > > After this, no further indexing happens: trying to open the index for
>> > > > > writing throws the above error, and the index writer does not open.
>> > > > >
>> > > > > FYI, our code contains the following *settings*:
>> > > > >
>> > > > > import lucene
>> > > > > from java.nio.file import Paths
>> > > > > from org.apache.lucene.index import IndexWriter, IndexWriterConfig
>> > > > > from org.apache.lucene.store import FSDirectory
>> > > > > lucene.initVM()  # start the JVM once per process, before using Lucene classes
>> > > > >
>> > > > > index_path = r"D:\i\202202"  # raw string: in a plain string, "\202" is an octal escape
>> > > > > index_directory = FSDirectory.open(Paths.get(index_path))
>> > > > > iconfig = IndexWriterConfig(wrapper_analyzer)  # wrapper_analyzer: the application's analyzer
>> > > > > iconfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND)
>> > > > > iconfig.setRAMBufferSizeMB(16.0)
>> > > > > writer = IndexWriter(index_directory, iconfig)
>> > > > >
>> > > > >
>> > > > > *Repairing*
>> > > > > We tried 'repairing' the index with the following command / tool:
>> > > > >
>> > > > > java -cp lucene-core-6.5.0.jar:lucene-backward-codecs-6.5.0.jar
>> > > > > org.apache.lucene.index.CheckIndex "D:\i\202202" -exorcise
>> > > > >
>> > > > > However, this returns "No problems found with the index."
>> > > > >
>> > > > >
>> > > > > *Workaround*
>> > > > > We have to manually delete the problematic segments file
>> > > > > D:\i\202202\segments_fo, after which the application starts again...
>> > > > > until the next corruption. We can't spot a specific pattern.
>> > > > >
>> > > > >
>> > > > > *Two questions:*
>> > > > >
>> > > > > 1. Can we handle this situation programmatically, so that no manual
>> > > > > intervention is needed?
>> > > > > 2. Is there any reason why we are facing the corruption issue in the
>> > > > > first place?
>> > > > >
>> > > > >
>> > > > > Before this we were using PyLucene 4.10 and we didn't face this
>> > > > > problem; the application logic is the same.
>> > > > >
>> > > > > Also, while the application runs on both Linux and Windows, so far we
>> > > > > have observed this situation only on various Windows platforms.
>> > > > >
>> > > > > Would really appreciate some assistance. Thanks in advance.
>> > > > >
>> > > > > Regards,
>> > > > > Antony
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Adrien
>> > > >
>> > > >
>> > > > ---------------------------------------------------------------------
>> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
>> > > >
>> > > >
>>
>
Re: Index corruption and repair
Hi Michael,

Any update?

Regards,
Antony

Re: Index corruption and repair
Hi Antony,

Sorry for the late reply.

Indeed the file _14gb.si is missing, yet _14gb.cfs is present (interesting:
its deletion must have failed because an IndexReader still has it open). And
yet when you run CheckIndex on this directory (without -exorcise), the index
is fine? No errors reported? Can you post the full CheckIndex output?
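The situation described here, where a segment's files are still on disk but its `.si` is gone, can be checked mechanically against a directory listing. This rough sketch assumes Lucene's naming convention: a per-segment file starts with the segment name (`_` plus a base-36 counter, such as `_14gb`), and any generation suffix follows a second `_` (as in a hypothetical `_14gb_1.liv`). The helpers are hypothetical and only inspect file names. They do not read segments_N to see which segments a commit actually references.

```python
from collections import defaultdict

def segment_name(file_name):
    """Return the segment name ("_14gb") for a per-segment file such as
    "_14gb.cfs" or "_14gb_1.liv", or None for other files (segments_N, write.lock)."""
    if not file_name.startswith("_"):
        return None
    # the segment name ends at the next '_' (generation suffix) or at the extension
    end = file_name.find("_", 1)
    if end == -1:
        end = file_name.find(".")
    return file_name if end == -1 else file_name[:end]

def find_segments_without_si(file_names):
    """Return the sorted names of segments that have files in the listing
    but no <segment>.si file."""
    files_by_segment = defaultdict(set)
    for name in file_names:
        seg = segment_name(name)
        if seg is not None:
            files_by_segment[seg].add(name)
    return sorted(seg for seg, files in files_by_segment.items()
                  if seg + ".si" not in files)
```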

There are two segments_N files present, which is interesting. Are
you using the default IndexDeletionPolicy (which deletes the old segments_N
file as soon as the new segments_N+1 is done being committed)?
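On the two segments_N files: the N suffix is the commit generation written in base 36, and by default Lucene opens the commit with the highest generation, so `segments_10fj` is a much later commit than `segments_fo`. The small sketch below orders the commits found in a listing. The helper names are hypothetical, and the pattern deliberately skips `pending_segments_N` (written while a commit is in progress) and the legacy `segments.gen`.

```python
import re

_SEGMENTS_RE = re.compile(r"^segments_([0-9a-z]+)$")

def commit_generation(file_name):
    """Return the commit generation encoded in a segments_N file name, or None."""
    m = _SEGMENTS_RE.match(file_name)
    # N is the generation written in base 36 (digits 0-9, then a-z)
    return int(m.group(1), 36) if m else None

def commits_newest_first(file_names):
    """Return the segments_N file names from the listing, newest commit first."""
    commits = [(commit_generation(n), n) for n in file_names
               if commit_generation(n) is not None]
    return [name for _, name in sorted(commits, reverse=True)]
```

This also explains the workaround from the original report. Once the newest segments_N is deleted, the previous commit becomes the latest one, so the writer opens that commit instead, and whatever the deleted commit added is no longer part of the index.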

Do you open near-real-time readers (passing IndexWriter to
DirectoryReader.open)? Or filesystem-based readers only (passing Directory
to DirectoryReader.open)?

How do you reopen/refresh those IndexReaders? Is it "every N seconds"? Or
is it timed to after the IndexWriter.commit() has finished? How often are
you calling IndexWriter.commit()?

6.5.0 is quite old by now, and I poked around in our issue history
<https://jirasearch.mikemccandless.com/search.py?index=jira> to see if this
might be a known issue. The only interesting issue I found was LUCENE-6835
<https://issues.apache.org/jira/browse/LUCENE-6835> which shifted
responsibility for retrying file deletions down into Directory (instead of
IndexWriter), but that landed in 6.0 and hopefully any bugs were ironed out
by 6.5.0.
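LUCENE-6835 moved the "retry this deletion later" bookkeeping for Windows into Directory. The idea can be modelled in a few lines of Python: a delete that fails because another handle still has the file open is remembered and retried later. This is a simplified illustration, not Lucene's implementation. `PendingDeleter` is hypothetical, and the sketch assumes the failure surfaces as `PermissionError`, which is how Python usually reports a file that is still in use on Windows.

```python
import os

class PendingDeleter:
    """Delete files, remembering the ones that cannot be deleted yet
    (e.g. still held open by a reader on Windows) so they can be retried."""

    def __init__(self, remove=os.remove):
        self._remove = remove  # injectable so the behaviour can be tested
        self.pending = set()

    def delete(self, path):
        try:
            self._remove(path)
        except FileNotFoundError:
            pass  # already gone: nothing left to do
        except PermissionError:
            self.pending.add(path)  # still in use: try again later

    def retry_pending(self):
        """Retry every pending deletion; return the paths still pending."""
        for path in list(self.pending):
            self.pending.discard(path)
            self.delete(path)
        return set(self.pending)
```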

Mike McCandless

http://blog.mikemccandless.com


Re: Index corruption and repair
Hi,

To find all errors in an index, you should pass -ea to the java command line to enable assertions.

Uwe
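Combining Mike's and Uwe's suggestions (run CheckIndex without -exorcise, with assertions enabled), the command line can be built from Python. Java's -cp separator is ';' on Windows and ':' elsewhere, so the ':'-joined classpath in the original report is the Unix form; `os.pathsep` picks the correct one for the current OS. This is a sketch: `checkindex_command` is hypothetical, and the jar names are taken from the original report.

```python
import os

def checkindex_command(index_path, jars, exorcise=False, java="java"):
    """Build the argv for running Lucene's CheckIndex tool on index_path.

    Assertions are enabled with -ea. -exorcise, which rewrites segments_N
    without the broken segments and therefore loses their documents, is
    only added when explicitly requested."""
    cmd = [java, "-ea", "-cp", os.pathsep.join(jars),
           "org.apache.lucene.index.CheckIndex", index_path]
    if exorcise:
        cmd.append("-exorcise")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`, and the full output posted to the list as Mike asked.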

On 5 May 2022 14:25:03 UTC, Michael McCandless <lucene@mikemccandless.com> wrote:
>Hi Antony,
>
>Sorry for the late reply.
>
>Indeed the file _14gb.si is missing, yet _14gb.cfs is present (interesting
>-- must have failed deletion because an IndexReader has it open). And yet
>when you run CheckIndex on this directory (without -exorcise), the index is
>fine? No errors reported? Can you post the full CheckIndex output?
>
>There are two segments_N files present, which is interesting. Are
>you using the default IndexDeletionPolicy (which deletes the old segments_N
>file as soon as the new segments_N+1 is done being committed)?
>
>Do you open near-real-time readers (passing IndexWriter to
>DirectoryReader.open)? Or filesystem based readers only (passing Directory
>to DirectoryReader.open)?
>
>How do you reopen/refresh those IndexReaders? Is it "every N seconds"? Or
>is it timed to after the IndexWriter.commit() has finished? How often are
>you calling IndexWriter.commit()?
>
>6.5.0 is quite old by now, and I poked around in our issue history
><https://jirasearch.mikemccandless.com/search.py?index=jira> to see if this
>might be a known issue. The only interesting issue I found was LUCENE-6835
><https://issues.apache.org/jira/browse/LUCENE-6835> which shifted
>responsibility of retrying file deletions down into Directory (instead of
>IndexWriter), but that landed in 6.0 and hopefully any bugs were ironed out
>by 6.5.0.
>
>Mike McCandless
>
>http://blog.mikemccandless.com
>
>
>On Wed, May 4, 2022 at 3:44 PM Antony Joseph <antony.dev.webmail@gmail.com>
>wrote:
>
>> Hi Michael,
>>
>> Any update?
>>
>> Regards,
>> Antony
>>
>> On Sun, 1 May 2022 at 19:35, Antony Joseph <antony.dev.webmail@gmail.com>
>> wrote:
>>
>>> Hi Michael,
>>>
>>> Thank you for your reply. Please find responses to your questions below.
>>>
>>> Regards,
>>> Antony
>>>
>>> On Sat, 30 Apr 2022 at 18:59, Michael McCandless <
>>> lucene@mikemccandless.com> wrote:
>>>
>>>> Hi Antony,
>>>>
>>>> Hmm it looks like the root cause is this:
>>>>
>>>> Caused by: java.nio.file.NoSuchFileException: D:\i\202204\_14gb.si
>>>>
>>>> Can you list all the files in the index directory at the time this
>>>> exception happens, and reply here? We need to figure out whether the file
>>>> is really missing or what.
>>>>
>>> Below the index directory file listing. Yes, file is missing
>>> (D:\i\202204\_14gb.si)
>>>
>>>>
>>>> Do you run any virus scanner / disk file tree utilities / etc.? In the
>>>> distant past sometimes such programs might cause strange transient errors
>>>> if they open a file for read exclusively or so, on windows.
>>>>
>>> There is no virus scanner running.
>>>
>>>>
>>>> What is the actual drive you are storing the index on (D:)? Is it a
>>>> local disk or remote SMBFS mount?
>>>>
>>> It's a local disk (D:).
>>>
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>>
>>>> On Sat, Apr 30, 2022 at 8:39 AM Antony Joseph <
>>>> antony.dev.webmail@gmail.com> wrote:
>>>>
>>>>> Thank you for your reply.
>>>>>
>>>>> *The full stack trace is included:*
>>>>>
>>>>> <super: <class 'JavaError'>, <JavaError object>>
>>>>> Java stacktrace:
>>>>> org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202204\segments_10fj")))
>>>>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:290)
>>>>>     at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:165)
>>>>>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:972)
>>>>> Caused by: java.nio.file.NoSuchFileException: D:\i\202204\_14gb.si
>>>>>     at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>>>>>     at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>>>>>     at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>>>>>     at sun.nio.fs.WindowsFileSystemProvider.newFileChannel(Unknown Source)
>>>>>     at java.nio.channels.FileChannel.open(Unknown Source)
>>>>>     at java.nio.channels.FileChannel.open(Unknown Source)
>>>>>     at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
>>>>>     at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:137)
>>>>>     at org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:89)
>>>>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:357)
>>>>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
>>>>>     ... 2 more
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "index.py", line 112, in start
>>>>>     writer = IndexWriter(index_directory, iconfig)
>>>>> lucene.JavaError: <super: <class 'JavaError'>, <JavaError object>>
>>>>>
>>>>>
>>>>> Regards,
>>>>> Antony
>>>>>
>>>>> On Sat, 30 Apr 2022 at 10:59, Robert Muir <rcmuir@gmail.com> wrote:
>>>>>
>>>>> > The most helpful thing would be the full stacktrace of the exception.
>>>>> > This exception should be chaining the original exception and call
>>>>> > site, and maybe tell us more about this error you hit.
>>>>> >
>>>>> > To me, it looks like a Windows-specific issue where the filesystem is
>>>>> > returning an unexpected error. So it would be helpful to see exactly
>>>>> > which one that is, and the full trace of where it comes from, to chase
>>>>> > it further.
>>>>> >
>>>>> > On Thu, Apr 28, 2022 at 12:10 PM Antony Joseph
>>>>> > <antony.dev.webmail@gmail.com> wrote:
>>>>> > >
>>>>> > > Thank you for your reply.
>>>>> > >
>>>>> > > This isn't happening in a single environment. Our application is used
>>>>> > > by various clients and this has been reported by multiple users, all
>>>>> > > of whom had been running the earlier PyLucene (v4.10) without issues.
>>>>> > >
>>>>> > > One thing to mention is that our earlier version used Python 2.7.15
>>>>> > > (with PyLucene 4.10), whereas we now use Python 3.8.10 with PyLucene
>>>>> > > 6.5.0. The indexing logic is the same.
>>>>> > >
>>>>> > > One other thing to note is that the issue described has (so far!) only
>>>>> > > occurred on MS Windows; none of our Linux customers have complained
>>>>> > > about this.
>>>>> > >
>>>>> > > Any ideas?
>>>>> > >
>>>>> > > Regards,
>>>>> > > Antony
>>>>> > >
>>>>> > > On Thu, 28 Apr 2022 at 17:00, Adrien Grand <jpountz@gmail.com> wrote:
>>>>> > >
>>>>> > > > Hi Antony,
>>>>> > > >
>>>>> > > > This isn't something that you should try to fix programmatically;
>>>>> > > > corruptions indicate that something is wrong with the environment,
>>>>> > > > like a broken disk or corrupt RAM. I would suggest running a memtest
>>>>> > > > to check your RAM and looking at system logs in case they have
>>>>> > > > anything to tell about your disks.
>>>>> > > >
>>>>> > > > Can you also share the full stack trace of the exception?
>>>>> > > >
>>>>> > > > On Thu, Apr 28, 2022 at 10:26 AM Antony Joseph
>>>>> > > > <antony.dev.webmail@gmail.com> wrote:
>>>>> > > > >
>>>>> > > > > Hello,
>>>>> > > > >
>>>>> > > > > We are facing a strange situation in our application as described
>>>>> > > > > below:
>>>>> > > > >
>>>>> > > > > *Using*:
>>>>> > > > >
>>>>> > > > > - Python 3.8.10
>>>>> > > > > - Pylucene 6.5.0
>>>>> > > > > - Java 8 (1.8.0_181)
>>>>> > > > > - Runs on Linux and Windows (error seen on Windows)
>>>>> > > > >
>>>>> > > > > We suddenly get the following *error*:
>>>>> > > > >
>>>>> > > > > 2022-02-10 09:58:09.253215: ERROR : writer | Failed to get index
>>>>> > > > > (D:\i\202202) writer, Exception:
>>>>> > > > > org.apache.lucene.index.CorruptIndexException: Unexpected file read
>>>>> > > > > error while reading index.
>>>>> > > > > (resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202202\segments_fo")))
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > After this, no further indexing happens - trying to open the index
>>>>> > > > > for writing throws the above error - and the index writer does not
>>>>> > > > > open.
>>>>> > > > >
>>>>> > > > > FYI, our code contains the following *settings*:
>>>>> > > > >
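>>>>> > > > > import lucene
>>>>> > > > > from java.nio.file import Paths
>>>>> > > > > from org.apache.lucene.index import IndexWriter, IndexWriterConfig
>>>>> > > > > from org.apache.lucene.store import FSDirectory
>>>>> > > > > lucene.initVM()  # start the JVM before touching any Java class
>>>>> > > > > # raw string: in a plain literal "\202" is an octal escape, not a path part
>>>>> > > > > index_path = r"D:\i\202202"
>>>>> > > > > index_directory = FSDirectory.open(Paths.get(index_path))
>>>>> > > > > iconfig = IndexWriterConfig(wrapper_analyzer)  # application-defined analyzer
>>>>> > > > > iconfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND)
>>>>> > > > > iconfig.setRAMBufferSizeMB(16.0)
>>>>> > > > > writer = IndexWriter(index_directory, iconfig)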
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > *Repairing*
>>>>> > > > > We tried 'repairing' the index with the following command / tool:
>>>>> > > > >
>>>>> > > > > java -cp lucene-core-6.5.0.jar:lucene-backward-codecs-6.5.0.jar
>>>>> > > > > org.apache.lucene.index.CheckIndex "D:\i\202202" -exorcise
>>>>> > > > >
>>>>> > > > > This however returns saying "No problems found with the index."
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > *Work around*
>>>>> > > > > We have to manually delete the problematic segment file:
>>>>> > > > > D:\i\202202\segments_fo
>>>>> > > > > after which the application starts again... until the next
>>>>> > > > > corruption. We can't spot a specific pattern.
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > *Two questions:*
>>>>> > > > >
>>>>> > > > > 1. Can we handle this situation programmatically, so that no
>>>>> > > > > manual intervention is needed?
>>>>> > > > > 2. Any reason why we are facing the corruption issue in the first
>>>>> > > > > place?
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > Before this we were using PyLucene 4.10 and we didn't face this
>>>>> > > > > problem - the application logic is the same.
>>>>> > > > >
>>>>> > > > > Also, while the application runs on both Linux and Windows, so far
>>>>> > > > > we have observed this situation only on various Windows platforms.
>>>>> > > > >
>>>>> > > > > Would really appreciate some assistance. Thanks in advance.
>>>>> > > > >
>>>>> > > > > Regards,
>>>>> > > > > Antony
>>>>> > > >
>>>>> > > >
>>>>> > > >
>>>>> > > > --
>>>>> > > > Adrien
>>>>> > > >
>>>>> > > >
>>>>> ---------------------------------------------------------------------
>>>>> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>> > > >
>>>>> > > >
>>>>>
>>>>

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
Re: Index corruption and repair
On Thu, May 5, 2022 at 10:30 AM Uwe Schindler <uwe@thetaphi.de> wrote:

> To find all errors in an index, you should pass -ea to the java command
> line to enable assertions.
>

+1

Tempting to make CheckIndex demand that :) Or at least, slow you down and
make it clear why, if assertions are disabled.
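Following Uwe's suggestion, here is a sketch that builds the CheckIndex command with `-ea`. It assumes that `java` is on PATH and that the jar names from the original post are in the working directory. The helper names are hypothetical. The classpath is joined with `os.pathsep` because the separator is `;` on Windows, not the `:` used in the original command. The sketch also leaves out `-exorcise`, which writes a new commit that drops any broken segments, so their documents are lost.

```python
import os
import subprocess


def checkindex_cmd(index_path, jars, sep=os.pathsep, java="java"):
    """Build a read-only CheckIndex command line with Java assertions enabled.

    sep is the classpath separator: os.pathsep is ";" on Windows, ":" elsewhere.
    """
    return [
        java,
        "-ea",  # enable Java assertions, as suggested in this thread
        "-cp",
        sep.join(jars),
        "org.apache.lucene.index.CheckIndex",
        index_path,  # no -exorcise: check only, never rewrite the index
    ]


def run_checkindex(index_path, jars):
    """Run CheckIndex and return (exit_code, combined stdout/stderr text)."""
    proc = subprocess.run(
        checkindex_cmd(index_path, jars),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout
```

For example, `run_checkindex(r"D:\i\202204", ["lucene-core-6.5.0.jar", "lucene-backward-codecs-6.5.0.jar"])` runs the check against the index from the stack trace above.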

Mike McCandless

http://blog.mikemccandless.com
Re: Index corruption and repair
Antony, do you maybe have Microsoft Defender turned on, which might
quarantine files that it suspects are malicious? I'm not sure if it is on
by default these days on modern Windows boxes ...

Mike McCandless

http://blog.mikemccandless.com


On Thu, May 5, 2022 at 10:34 AM Michael McCandless <
lucene@mikemccandless.com> wrote:

Re: Index corruption and repair
Hello Mike,

1. As requested, the full CheckIndex log is attached.

2. We haven't made any changes to the IndexDeletionPolicy, so we assume the
default policy is in use.

3. No, we are not using near-real-time readers. We use filesystem-based
readers only (passing a Directory to DirectoryReader.open), as below:

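from java.nio.file import Paths
from org.apache.lucene.index import DirectoryReader
from org.apache.lucene.store import FSDirectory
index_directory = FSDirectory.open(Paths.get(index_path))
if DirectoryReader.indexExists(index_directory):  # valid index or not
    reader = DirectoryReader.open(index_directory)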

4. Every 10 seconds we check whether the index has changed and, if it has,
reopen the reader, like this:

(reader is the DirectoryReader opened above)

if not reader.isCurrent():
    i_reader = DirectoryReader.openIfChanged(reader)
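The item 4 snippet assigns the reopened reader to `i_reader` but does not show the old reader being closed or replaced. If the real code also skips that step, each refresh leaves an older reader holding index files open. Below is a minimal sketch of the usual swap-and-close pattern. `refresh` is a hypothetical helper. It assumes that `open_if_changed` behaves like `DirectoryReader.openIfChanged` and returns Java null when the index has not changed. To our understanding, PyLucene surfaces that null as `None`.

```python
def refresh(reader, open_if_changed):
    """Return an up-to-date reader, closing the previous one if it was replaced.

    open_if_changed is assumed to behave like DirectoryReader.openIfChanged:
    a new reader when the index changed, None (Java null) when it did not.
    """
    new_reader = open_if_changed(reader)
    if new_reader is None:
        return reader  # index unchanged: keep using the current reader
    reader.close()  # close the old reader so it no longer holds index files open
    return new_reader
```

In the application loop this would read `reader = refresh(reader, DirectoryReader.openIfChanged)`, and the `isCurrent()` pre-check becomes optional. Closing the old reader immediately is only safe if no search is still using it. If concurrent searches are possible, Lucene's `SearcherManager` handles that reference counting.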

Yes, 6.5.0 is old, but for now we are constrained to use it until we can come
up with an upgrade plan (upgrading will involve reindexing a lot of data).

If it would help, I can also share the simplified flow of our application,
covering indexing, updating and deleting documents.

By the way, the users say they are not running any antivirus or Microsoft
Defender. Besides, before upgrading to Lucene 6.5.0, the same application with
Lucene 4.10 ran fine on the same system.

Thanks for your assistance.

Regards,
Antony

On Thu, 5 May 2022 at 20:06, Michael McCandless <lucene@mikemccandless.com>
wrote:

Re: Index corruption and repair
Hi Mike,

Any updates?

Regards,
Antony

On Wed, 11 May 2022 at 01:02, Antony Joseph <antony.dev.webmail@gmail.com>
wrote:
