Mailing List Archive

Bug in new Index Lock File code ?
First, I would like to apologise for this message being so long, but I have
tried to provide enough information for someone to potentially help me
diagnose my problem: using the latest build of Lucene corrupts my indexes,
whereas earlier releases of Lucene from SourceForge have worked fine.

I'm afraid I can only describe the symptoms of my problem (see below), as I
haven't been able to pinpoint the actual cause; however, I suspect the problem
lies in the thread-safety/index-locking changes introduced recently (and
included in the Jakarta release lucene-1.2-rc1).

In particular, I have noticed that the method FSDirectory.getDirectory(File
file, boolean create) does not take the 'create' parameter into account when
deciding whether to erase existing contents, as its Javadoc comment says it
should. I have attached the relevant code extract below.

I am surprised that no-one else has reported this problem. Perhaps no-one else
has upgraded their version of Lucene or carried out detailed tests since the
upgrade? Or perhaps everyone else restarts their Java server (JVM) when
rebuilding/updating their indexes?

I am able to re-create the same problems each time I follow the 3 steps
described below. I am a bit lost at the moment as to what to investigate to
track down the cause of the problematic files in my index directory.

Does anyone else keep their JVM running continuously between creating and
updating indexes? If so, have you encountered problems updating indexes with
the latest code? (Scott Ganyo: perhaps you have a similar setup to mine,
since you reported the exception "java.io.IOException: /index/_1x7f.fnm
already exists" last month?)

If anyone has a good understanding of how and when the various index files are
generated/updated, could you give me any tips on what to look into to identify
the cause of my problem?


Any help gratefully received.



Joanne

================================================================


Symptoms
---------

Step 1 - Rebuild index from scratch
-----------------------------------
If I rebuild an index from scratch, the first time around all files in my
index directory are deleted and new ones are generated, and I can search and
retrieve documents successfully. The index directory contains all the expected
files, e.g. .f1, .f2, .fdt, .fnm etc.

Step 2 - modify data & update Index
------------------------------------
Without restarting the Java server, I then update a single document and choose
to 'update' (not rebuild) the index. My application 'appears' to complete the
task successfully, i.e. no exceptions are reported (or perhaps I am just not
trapping them?). However, when I search for a new word added to the updated
document, no 'hit' is reported. If I search for a word that was removed from
the document, the 'hit' is still reported.

When I take a closer look in the index directory, I see several anomalies:

I have TWO copies of the following files: .f1, .f2, .f4, .f5, .f7, .fdt, .fdx,
.frq, .prx, .tis (with the segment prefixes _i and _k). However, I have only
one copy of deletable, segments, .fnm and .tii.

Hence it's as though the index data has not been replaced, although the .fnm
and .tii files have been updated.
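A quick way to see which files belong to which segment is to group the directory listing by the prefix before the first dot. The following is a minimal sketch for a modern JDK; the class SegmentFileGrouper and its methods are hypothetical, not part of Lucene, and treating 'segments' and 'deletable' as index-wide files follows the reply later in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper (not part of Lucene): groups index file names
// by segment prefix, e.g. "_i.f1" and "_i.fdt" both go under "_i".
class SegmentFileGrouper {

    // 'segments' and 'deletable' are index-wide files, not per-segment ones.
    static boolean isGlobal(String name) {
        return name.equals("segments") || name.equals("deletable");
    }

    static Map<String, List<String>> groupBySegment(String[] fileNames) {
        // TreeMap keeps the prefixes sorted, which makes the result easy to read.
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            int dot = name.indexOf('.');
            if (isGlobal(name) || !name.startsWith("_") || dot < 0) {
                continue; // skip global files and any other non-segment files
            }
            String segment = name.substring(0, dot);
            groups.computeIfAbsent(segment, k -> new ArrayList<>())
                  .add(name.substring(dot)); // the extension, e.g. ".f1"
        }
        return groups;
    }
}
```

For example, SegmentFileGrouper.groupBySegment(new java.io.File("/path/to/index").list()) maps each segment prefix to its extensions; note that File.list() returns null if the path is not a readable directory.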

Step 3 - Attempt to rebuild after the update
--------------------------------------------
Again without stopping the server, I tried to rebuild the index from scratch.
This time the existing files are not deleted (this is due to the code included
below: the cached FSDirectory instance dir is not null, so nothing is erased).
Again, no exceptions are reported (or perhaps trapped), so the task 'appears'
to have completed successfully. The index directory still contains the old _k
segment files (.f1, .f2, .f4, .f5, .f7, .fdt, .fdx, .fnm, .frq, .prx, .tii,
.tis) and now contains just 2 new segment files, _i.tii and _i.fnm (along with
the updated files deletable and segments).

When I attempt to search this index, an exception is raised from the call to
IndexReader.open(dir) with the message "<index_dir>\_i.fdt (The system cannot
find the file specified)".
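The exception suggests that the 'segments' file refers to a segment whose files are not all present. Below is a sketch of a check for the core per-segment files, assuming the Lucene 1.2 file set .fnm, .fdx, .fdt, .tis, .tii, .frq and .prx (the norms files .f0, .f1, ... are per field and are left out). The class SegmentCheck is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical diagnostic helper: lists which core files of a segment are
// missing from a directory listing. Assumes the Lucene 1.2 per-segment file
// set; per-field norms files (.f0, .f1, ...) are deliberately not checked.
class SegmentCheck {
    static final String[] CORE_EXTENSIONS =
        {".fnm", ".fdx", ".fdt", ".tis", ".tii", ".frq", ".prx"};

    static List<String> missingFiles(String segment, String[] fileNames) {
        Set<String> present = new HashSet<>(Arrays.asList(fileNames));
        List<String> missing = new ArrayList<>();
        for (String ext : CORE_EXTENSIONS) {
            String expected = segment + ext;
            if (!present.contains(expected)) {
                missing.add(expected);
            }
        }
        return missing;
    }
}
```

Fed the Step 3 listing, missingFiles("_i", ...) reports _i.fdx, _i.fdt, _i.tis, _i.frq and _i.prx, which includes the _i.fdt named in the exception, while segment _k comes back complete.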



Additional Info
---------------
If I restart the JVM between each of the above steps, no problems are
encountered; I guess this is because the problematic classes are re-initialized
each time.
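One explanation consistent with this is that the directory cache is a static field, so it lives until the class is unloaded, which normally means until the JVM exits. The sketch below mimics the refcounting scheme mentioned in the FSDirectory comment quoted below; the class RefCountedCache is hypothetical, and the close() behaviour (dropping the entry when the count reaches zero) is an assumption of this sketch, not a quote of Lucene's code.

```java
import java.util.Hashtable;

// Hypothetical sketch of a static, reference-counted instance cache, modelled
// on the comment in the FSDirectory extract; not Lucene's actual code.
class RefCountedCache {
    // A static field: it survives until the class is unloaded, which in
    // practice usually means until the JVM exits.
    private static final Hashtable<String, RefCountedCache> CACHE = new Hashtable<>();

    private final String path;
    private int refCount; // guarded by the CACHE lock

    private RefCountedCache(String path) {
        this.path = path;
    }

    // Return the single shared instance for this path, creating it if needed.
    static RefCountedCache get(String path) {
        synchronized (CACHE) {
            RefCountedCache entry = CACHE.get(path);
            if (entry == null) {
                entry = new RefCountedCache(path);
                CACHE.put(path, entry);
            }
            entry.refCount++;
            return entry;
        }
    }

    // Every get() must be balanced by a close(); the entry is dropped only
    // when the last user closes it (an assumption of this sketch).
    void close() {
        synchronized (CACHE) {
            if (--refCount <= 0) {
                CACHE.remove(path, this); // only remove this exact instance
            }
        }
    }

    static boolean isCached(String path) {
        return CACHE.containsKey(path);
    }
}
```

Under this scheme, any reader or writer that is never closed keeps its reference alive, so the old instance stays cached until the JVM is restarted.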

Before I upgraded to the latest code, which introduced the .lock files, I
experienced a different error (also reported by Scott Ganyo on 27 Sep): the
exception "java.io.IOException: /<index dir>/_i.fnm already exists" was thrown
when attempting to rebuild or update. This exception is no longer thrown with
the most recent changes, as this check has been removed from the constructor
FSOutputStream(File path).
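For reference, a check of the kind that was removed can be written as below. This is a hypothetical sketch, not Lucene's FSOutputStream; it relies on File.createNewFile(), which creates the file atomically and returns false if a file with that name already exists.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch of an "already exists" check before writing a new
// index file; not Lucene's FSOutputStream.
class CheckedOutput {
    static RandomAccessFile openNew(File path) throws IOException {
        // createNewFile() atomically creates an empty file and returns false
        // if the file already exists, so two writers cannot both succeed.
        if (!path.createNewFile()) {
            throw new IOException(path + " already exists");
        }
        return new RandomAccessFile(path, "rw");
    }
}
```

Opening a RandomAccessFile in "rw" mode on its own does not truncate an existing file, so without a check like this (or an explicit setLength(0)) bytes beyond the new end of the file could survive from an older file of the same name.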




***** Extract from the class FSDirectory ********

  /** This cache of directories ensures that there is a unique Directory
   * instance per path, so that synchronization on the Directory can be used to
   * synchronize access between readers and writers.
   *
   * This should be a WeakHashMap, so that entries can be GC'd, but that would
   * require Java 1.2. Instead we use refcounts... */
  private static final Hashtable DIRECTORIES = new Hashtable();

  :
  :

  /** Returns the directory instance for the named location.
   *
   * <p>Directories are cached, so that, for a given canonical path, the same
   * FSDirectory instance will always be returned. This permits
   * synchronization on directories.
   *
   * @param file the path to the directory.
   * @param create if true, create, or erase any existing contents.
   * @returns the FSDirectory for the named file. */
  public static FSDirectory getDirectory(File file, boolean create)
      throws IOException {
    file = new File(file.getCanonicalPath());
    FSDirectory dir;
    synchronized (DIRECTORIES) {
      dir = (FSDirectory)DIRECTORIES.get(file);

      /* JSproston : a second rebuild will not create a *new* FSDirectory
         since dir will NOT be null, thus even if create is true, existing
         contents will not be erased. */

      if (dir == null) {
        dir = new FSDirectory(file, create);
        DIRECTORIES.put(file, dir);
      }
    }
    synchronized (dir) {
      dir.refCount++;
    }
    return dir;
  }

***** End of Extract ********
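A minimal sketch of one way to honour the create flag when an instance is already cached is to erase the existing files in an else branch. The class SketchDirectory and its create() helper are hypothetical stand-ins for FSDirectory, written for a modern JDK; this is not necessarily the fix that was committed to CVS.

```java
import java.io.File;
import java.io.IOException;
import java.util.Hashtable;

// Hypothetical stand-in for FSDirectory showing the missing "else if (create)"
// branch; a sketch only, not Lucene's code.
class SketchDirectory {
    private static final Hashtable<File, SketchDirectory> DIRECTORIES = new Hashtable<>();

    private final File directory;
    private int refCount;

    private SketchDirectory(File path, boolean create) throws IOException {
        directory = path;
        if (create) {
            create();
        }
    }

    // Make sure the directory exists, then delete every regular file in it.
    private synchronized void create() throws IOException {
        if (!directory.exists() && !directory.mkdirs()) {
            throw new IOException("Cannot create directory: " + directory);
        }
        String[] names = directory.list();
        if (names == null) {
            throw new IOException("Cannot list directory: " + directory);
        }
        for (String name : names) {
            File file = new File(directory, name);
            if (file.isFile() && !file.delete()) {
                throw new IOException("Cannot delete " + file);
            }
        }
    }

    static SketchDirectory getDirectory(File file, boolean create) throws IOException {
        file = new File(file.getCanonicalPath());
        SketchDirectory dir;
        synchronized (DIRECTORIES) {
            dir = DIRECTORIES.get(file);
            if (dir == null) {
                dir = new SketchDirectory(file, create);
                DIRECTORIES.put(file, dir);
            } else if (create) {
                dir.create(); // the step missing from the extract above
            }
        }
        synchronized (dir) {
            dir.refCount++;
        }
        return dir;
    }
}
```

With this change, a second getDirectory(path, true) deletes stale segment files such as _k.fdt. The sketch does not protect files that another open reader is still using; on Win32 such deletions fail, and here that surfaces as an IOException.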

RE: Bug in new Index Lock File code ?
Joanne,

It looks like you have spotted a bug in FSDirectory. However, I am not sure it
is what is causing the problems you are seeing. I just checked a fix for this
into CVS. Please try this new version and tell me how things go.

One thing that might be confusing you is the file names in your index. A
few facts: First, there is only ever one version of 'segments' and
'deletable'. These are global files, not per-segment. Second, segment names
are integers in base 36 prefixed by an underscore, and are created in order,
so segment _i precedes segment _k. Third, on Win32 it is possible for files
belonging to old segments to remain, which can look like incomplete segments.
On Win32 an open file cannot be deleted, so Lucene keeps a list of the names
of files that it would like to delete in the 'deletable' file.
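The naming and deletion rules above can be sketched as follows. Integer.toString(int, int) and Integer.parseInt(String, int) with Character.MAX_RADIX (36) are standard Java; the class SegmentNaming and its deleteOrDefer helper are hypothetical illustrations, not Lucene's code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of segment naming and deferred deletion;
// not Lucene's code.
class SegmentNaming {
    // "_" followed by the counter in base 36 (digits 0-9, then a-z).
    static String segmentName(int counter) {
        return "_" + Integer.toString(counter, Character.MAX_RADIX);
    }

    // Inverse of segmentName: "_i" -> 18, "_k" -> 20.
    static int segmentNumber(String name) {
        return Integer.parseInt(name.substring(1), Character.MAX_RADIX);
    }

    // Try to delete each obsolete file. Names that cannot be deleted yet
    // (e.g. still open on Win32) are returned so the caller can record them,
    // for example in a 'deletable' list, and retry later.
    static List<String> deleteOrDefer(File dir, List<String> names) {
        List<String> deferred = new ArrayList<>();
        for (String name : names) {
            File f = new File(dir, name);
            if (f.exists() && !f.delete()) {
                deferred.add(name);
            }
        }
        return deferred;
    }
}
```

For example, segmentName(18) is "_i" and segmentName(20) is "_k", and the segment in Scott Ganyo's error, _1x7f, is number 89691. Comparing decoded numbers is safer than comparing names as strings, since "_10" sorts before "_z" even though it was created later.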

Doug