Mailing List Archive

Indexes: locks, files, file contents
When creating a new empty index:

IndexWriter writer = new IndexWriter("myindex", null, true);
writer.close();

Both a write.lock and a commit.lock are obtained.

1. What is the purpose of this double locking?
2. When else are write.lock and commit.lock used?
3. Any other locks used?

What I am looking for are some simple statements about locks. For example:
a. When creating an index or adding documents to it, a write.lock is always obtained.
b. The commit.lock is used when ...


After the creation of an empty index, the directory contains a single "segments" file which is 8 bytes long. The first four bytes contain a counter used in naming segments; the second four bytes contain an integer indicating how many segments have their summary information stored in the segments file.

4. Why keep separate counter and size variables?
5. Is there a max number of segment infos whose summary (name and document count) is stored in the segments file?

When I then add a document to the index...

String filePath = "C:\\mydocs\\fox.txt";

IndexWriter writer = new IndexWriter("myindex", new StopAnalyzer(), false);
InputStream is = new FileInputStream(filePath);

Document doc = new Document();
doc.add(Field.UnIndexed("path", filePath));
doc.add(Field.Text("body", (Reader) new InputStreamReader(is)));

writer.addDocument(doc);
is.close();

writer.close();


I get the following new files:

deletable
_1.f1
_1.fdt
_1.fdx
_1.fnm
_1.frq
_1.prx
_1.tii
_1.tis


6. What is the content and format of each of these files?
7. What if I open another IndexWriter and write a second document to the index and then close the IndexWriter? What files will be added or modified?
8. What happens when I then open an IndexWriter and optimize the index and then close the writer? What files will be added or modified?


Thanks for your help.

Mark

--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
RE: Indexes: locks, files, file contents
> From: Mark Tucker [mailto:MTucker@infoimage.com]
>
> When creating a new empty index:
>
> IndexWriter writer = new IndexWriter("myindex", null, true);
> writer.close();
>
> Both a write.lock and a commit.lock are obtained.
>
> 1. What is the purpose of this double locking?
> 2. When else are write.lock and commit.lock used?

The write.lock is used to keep processes from concurrently attempting to
modify an index. It is obtained by an IndexWriter while it is open, and by
an IndexReader once documents have been deleted and until it is closed.

The commit.lock is used to coordinate the contents of the 'segments' file
with the files in the index. An IndexReader obtains it before reading the
'segments' file, which names all of the other files in the index, and holds
it until it has opened all of these other files. An IndexWriter obtains the
commit.lock when it is about to write the segments file and holds it until
it has finished trying to delete obsolete index files. The commit.lock
should thus never be held for long: while it is held, files are only opened
or deleted, and one small file is read or written.
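
The obtain/release behaviour described above can be illustrated with a
plain lock file. This is only a sketch: the class LockFileSketch and its
methods are hypothetical names, not Lucene's own lock code, and it assumes
that a lock is represented by a file (such as write.lock) created inside
the index directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not Lucene's lock class: whoever manages to
// create the lock file holds the lock.
final class LockFileSketch {
    private final Path lockFile;
    private boolean held = false;

    LockFileSketch(Path indexDir, String lockName) {
        this.lockFile = indexDir.resolve(lockName);
    }

    // Returns true if this call created the lock file,
    // false if the file already existed (someone else holds the lock).
    boolean obtain() {
        try {
            Files.createFile(lockFile); // existence check and creation are atomic
            held = true;
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes the lock file, but only if this instance obtained it.
    void release() {
        if (!held) {
            return;
        }
        try {
            Files.deleteIfExists(lockFile);
            held = false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Two would-be writers competing for write.lock in a temporary directory.
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("myindex");
            LockFileSketch writerA = new LockFileSketch(dir, "write.lock");
            LockFileSketch writerB = new LockFileSketch(dir, "write.lock");
            boolean first = writerA.obtain();  // lock is free
            boolean second = writerB.obtain(); // lock is held by writerA
            writerA.release();
            boolean third = writerB.obtain();  // free again after the release
            writerB.release();
            Files.delete(dir);
            return new boolean[] { first, second, third };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a commit.lock would be used the same way, only held for a
much shorter time: obtain it, read or write the segments file, open or
delete the other files, then release it.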

> 3. Any other locks used?

No.

> What I am looking for are some simple statements about locks.
> For example:
> a. When creating an index or adding documents to it, a
> write.lock is always obtained.
> b. The commit.lock is used when ...

Do the above suffice?

> After the creation of an empty index, the directory contains
> a single "segments" file which is 8 bytes long. The first
> four bytes contain a counter used in naming segments; the
> second four bytes contain an integer indicating how many
> segments have their summary information stored in the
> segments file.
>
> 4. Why keep separate counter and size variables?

Every segment has a unique name. The counter is used to generate new names.
The size indicates how many segments currently exist.
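
To make the two fields concrete, here is a minimal sketch that decodes the
first eight bytes of a segments file as described in the question: a
four-byte name counter followed by a four-byte segment count. The class
name SegmentsHeaderSketch is hypothetical, the byte order is assumed to be
big-endian, and the sample bytes in the test are made up for illustration
rather than taken from a real index.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for the 8-byte header described in this thread.
final class SegmentsHeaderSketch {
    final int counter;      // used to generate new, unique segment names
    final int segmentCount; // number of segments currently in the index

    private SegmentsHeaderSketch(int counter, int segmentCount) {
        this.counter = counter;
        this.segmentCount = segmentCount;
    }

    // Decodes two consecutive 32-bit integers from the start of the array.
    // Assumption: integers are stored high byte first (big-endian),
    // which is also ByteBuffer's default byte order.
    static SegmentsHeaderSketch parse(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("need at least 8 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header);
        int counter = buf.getInt();
        int segmentCount = buf.getInt();
        return new SegmentsHeaderSketch(counter, segmentCount);
    }
}
```

The two values move independently: the counter only ever grows as names
are handed out, while the count goes up and down as segments are added and
merged away.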

> 5. Is there a max number of segment infos whose summary
> (name and document count) is stored in the segments file?

All segments in the current index are listed in the segments file. There is
no hard limit. The number of segments in an index has been discussed many
times before. For an un-optimized index it is proportional to the log of
the number of documents in the index. An optimized index contains a single
segment.
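
The logarithmic growth can be seen in a toy model. This is not Lucene's
merge code: it assumes each added document starts as its own one-document
segment and that whenever the newest mergeFactor segments have the same
size they are merged into one. The class name MergeModelSketch and the
mergeFactor parameter are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of segment merging, under the assumptions stated above.
final class MergeModelSketch {

    // Returns how many segments remain after adding `docs` documents.
    static int segmentsAfter(int docs, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> sizes = new ArrayList<>(); // document count per segment
        for (int d = 0; d < docs; d++) {
            sizes.add(1); // each new document begins as a one-document segment
            // Merging may cascade: ten 1s become a 10, ten 10s become a 100, ...
            while (tailIsUniform(sizes, mergeFactor)) {
                int merged = 0;
                for (int i = 0; i < mergeFactor; i++) {
                    merged += sizes.remove(sizes.size() - 1);
                }
                sizes.add(merged);
            }
        }
        return sizes.size();
    }

    // True if the newest n segments exist and all have the same size.
    private static boolean tailIsUniform(List<Integer> sizes, int n) {
        if (sizes.size() < n) {
            return false;
        }
        int last = sizes.get(sizes.size() - 1);
        for (int i = sizes.size() - n; i < sizes.size(); i++) {
            if (sizes.get(i) != last) {
                return false;
            }
        }
        return true;
    }
}
```

With a mergeFactor of 10 the model leaves as many segments as the sum of
the decimal digits of the document count, so the total never exceeds nine
times the number of digits. That is one way to picture why the segment
count tracks the log of the index size rather than the size itself.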

> When I then add a document to the index...
> I get the following new files:
>
> deletable
> _1.f1
> _1.fdt
> _1.fdx
> _1.fnm
> _1.frq
> _1.prx
> _1.tii
> _1.tis
>
> 6. What is the content and format of each of these files?

That's not currently documented outside of the code, although it has been
discussed previously on the mailing lists. These files together comprise
segment 1.

> 7. What if I open another IndexWriter and write a second
> document to the index and then close the IndexWriter? What
> files will be added or modified?

No files are modified. Files are added, the segments file is re-written,
and files are deleted. In particular you would probably see segment 3 added
and 1 deleted.

> 8. What happens when I then open an IndexWriter and optimize
> the index and then close the writer? What files will be
> added or modified?

All of the segments are merged into a single new segment.

Why not try and see?
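
One way to try it is to compare the index directory's file listing before
and after each IndexWriter session. The sketch below uses only the
standard library; IndexDirDiffSketch and its method names are hypothetical
helpers, not part of Lucene.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for observing which index files come and go.
final class IndexDirDiffSketch {

    // Sorted set of the file names currently in the directory
    // (empty if the directory does not exist or cannot be listed).
    static Set<String> snapshot(File dir) {
        Set<String> result = new TreeSet<String>();
        String[] names = dir.list();
        if (names != null) {
            result.addAll(Arrays.asList(names));
        }
        return result;
    }

    // Names present afterwards that were not there before.
    static Set<String> added(Set<String> before, Set<String> after) {
        Set<String> result = new TreeSet<String>(after);
        result.removeAll(before);
        return result;
    }

    // Names that were there before and are gone afterwards.
    static Set<String> deleted(Set<String> before, Set<String> after) {
        Set<String> result = new TreeSet<String>(before);
        result.removeAll(after);
        return result;
    }
}
```

Call snapshot(new File("myindex")) before opening the IndexWriter and
again after writer.close(), then print added(...) and deleted(...). A file
that is re-written under the same name, such as segments, appears in
neither set, so comparing File.lastModified() values would be needed to
notice that.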

Doug
