
Using Lucene in a production environment
I have a couple of questions on using Lucene in a production environment:

1) Are updates to an index "transactional" in nature? In other words, can
an index ever get into an inconsistent/corrupt state by killing a writing
process? If so, is there a way to detect and fix this condition?

2) Can you describe where the synchronization points/locks are applied and
released? I would like to make sure I avoid any situations that might cause
performance degradation or deadlock in my application.

Thanks!
Scott
RE: Using Lucene in a production environment
> From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
>
> I have a couple of questions on using Lucene in a production
> environment:
>
> 1) Are updates to an index "transactional" in nature? In other words,
> can an index ever get into an inconsistent/corrupt state by killing a
> writing process?

No, it should not be possible to put an index in an inconsistent or corrupt
state. The only problem might be that, after a crash, lock files may need
to be manually removed.
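
The manual lock cleanup mentioned above could be scripted along these lines. This is a minimal sketch, not part of Lucene: it assumes the lock files are named `write.lock` and `commit.lock` and sit in the index directory, as in early Lucene releases, so check which files your version actually creates. It is only safe when you are certain no process has the index open for writing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: removes leftover lock files after a crashed writer.
// Run it only when no other process can be writing to the index.
final class StaleLockCleaner {
    // ASSUMPTION: lock file names used by early Lucene versions; verify for yours.
    static final List<String> LOCK_NAMES = List.of("write.lock", "commit.lock");

    /** Deletes any assumed lock files in indexDir and returns how many were removed. */
    static int removeStaleLocks(Path indexDir) throws IOException {
        int removed = 0;
        for (String name : LOCK_NAMES) {
            // deleteIfExists returns true only if a file was actually deleted
            if (Files.deleteIfExists(indexDir.resolve(name))) {
                removed++;
            }
        }
        return removed;
    }
}
```

Index data files are left untouched; only the named lock files are deleted.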

> 2) Can you describe where the synchronization points/locks are applied
> and released? I would like to make sure I avoid any situations that
> might cause performance degradation or deadlock in my application.

The primary synchronization point is during calls to IndexReader.open() and
IndexWriter.close(). Only one thread may be in either of these at a time. For
that reason, among others, you should re-use IndexReader instances. Since
IndexReader is thread-safe, this is not hard: one IndexReader per index is all
you should need at a time. When the index changes, create a new IndexReader;
the IndexReader.lastModified() method is designed to make this easy. The
typical approach is to cache a single IndexReader per index, check whether it
is out of date each time the cache is accessed, and replace it when it is.
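
The caching pattern just described might look roughly like the sketch below. To keep it self-contained it does not depend on Lucene: the type parameter `R` stands in for IndexReader, and the two hooks are hypothetical, with `lastModified` standing in for a call such as IndexReader.lastModified(path) and `opener` for IndexReader.open(path).

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: one cached reader per index, reopened only when the
// index's last-modified value changes.
final class CachedReader<R> {
    private final LongSupplier lastModified; // reports the index's current version
    private final Supplier<R> opener;        // opens a fresh reader
    private R reader;                        // the single shared reader
    private long openedVersion;              // version seen when reader was opened

    CachedReader(LongSupplier lastModified, Supplier<R> opener) {
        this.lastModified = lastModified;
        this.opener = opener;
    }

    /** Returns the cached reader, replacing it first if the index has changed. */
    synchronized R get() {
        // Read the version BEFORE opening: if the index changes in between,
        // the next get() simply reopens once more, which is harmless.
        long current = lastModified.getAsLong();
        if (reader == null || current != openedVersion) {
            // The old reader is not closed here because other threads may
            // still be searching it; deciding when to close it is left to the caller.
            reader = opener.get();
            openedVersion = current;
        }
        return reader;
    }
}
```

Because get() is synchronized, only one thread at a time can decide to reopen, so threads do not pile up opening duplicate readers at the synchronization point described above.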

IndexWriter.close() commits changes. Aborting without closing will leave
the index locked but otherwise consistent.
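
One way to avoid leaving the lock behind is to call close() in a finally block. The sketch below uses a hypothetical `WriterLike` interface in place of IndexWriter so that it runs on its own; only the name close() mirrors the real class.

```java
// Hypothetical stand-in for IndexWriter; only close() mirrors a real method name.
interface WriterLike {
    void add(String doc) throws Exception;
    void close() throws Exception;
}

final class SafeIndexing {
    /** Adds all docs, always closing the writer so its lock is released. */
    static void addAll(WriterLike writer, Iterable<String> docs) throws Exception {
        try {
            for (String doc : docs) {
                writer.add(doc);
            }
        } finally {
            // close() commits changes; skipping it would leave the index locked.
            writer.close();
        }
    }
}
```

A design caveat: since close() commits, closing after a failure also commits the documents added before the error. If a partially applied batch is unacceptable, the application needs its own way to undo or redo the batch.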

Doug
RE: Using Lucene in a production environment
I found Doug's explanation of the synchronization points and of re-using
IndexReader instances very useful, and I suggest it be put somewhere in the
documentation.

Regards
Anders Nielsen


-----Original Message-----
From: Doug Cutting [mailto:DCutting@grandcentral.com]
Sent: 2 October 2001 23:49
To: 'lucene-user@jakarta.apache.org'
Subject: RE: Using Lucene in a production environment

[..]
RE: Using Lucene in a production environment
Thanks, Doug. That's exactly what I was looking for.

Scott

> -----Original Message-----
> From: Doug Cutting [mailto:DCutting@grandcentral.com]
> Sent: Tuesday, October 02, 2001 4:49 PM
> To: 'lucene-user@jakarta.apache.org'
> Subject: RE: Using Lucene in a production environment
> [..]