Mailing List Archive

Making Lucene Transactional
That's interesting. So it would be a very small change to add transactional
(and even 2-phase commit) capabilities to the writer? What about deletes?
Since they use the reader, would it still be possible to allow a 2-phase
commit/abort on that?

I would very much like to have a 2-phase commit in Lucene in order to ensure
that it is always in sync with my database. I always thought that I'd end
up having to write custom code to store the Lucene index in the database,
but maybe that wouldn't be necessary...?

Scott

> -----Original Message-----
> From: Doug Cutting [mailto:cutting@lucene.com]
> Sent: Thursday, June 27, 2002 10:36 AM
> To: Lucene Users List
> Subject: Re: Stress Testing Lucene
>
>
> It's very hard to leave an index in a bad state. Updating the
> "segments" file atomically updates the index. So the only way to
> corrupt things is to only partly update the segments file. But that
> too is hard, since it's first written to a temporary file, which is
> then renamed "segments". The only vulnerability I know of is that in
> Java on Win32 you can't atomically rename a file to something that
> already exists, so Lucene has to first remove the old version. So if
> you were to crash between the time that the old version of "segments"
> is removed and the new version is moved into place, then the index
> would be corrupt, because it would have no "segments" file.
>
> Doug
Re: Making Lucene Transactional
> That's interesting. So it would be a very small change to add transactional
> (and even 2-phase commit) capabilities to the writer? What about deletes?
> Since they use the reader, would it still be possible to allow a 2-phase
> commit/abort on that?

I think you're not using "transactional" in the same sense as Doug is.

Very few file systems are transactional, although some offer a small
number of atomic operations, such as rename. This doesn't make them
transactional, but it allows application writers (that's us) to write
apps that are _less likely_ to be victimized by system failure. But
Lucene still writes blocks to disk via the file system, without a
transaction log, and since disk drivers do things like defer or
reorder disk writes, we could still lose if the system crashed at the
wrong time. Still, we do a lot to reduce this risk beyond that of
most file-based applications.
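
The write-to-a-temporary-file-then-rename pattern discussed here can be sketched roughly as follows. This is only an illustration, not Lucene's code: it uses the java.nio.file API, which did not exist when this thread was written, and the class name AtomicPublish and the temporary file name "segments.tmp" are assumptions made for the example. Forcing the channel before the rename narrows, but does not eliminate, the deferred-write window described above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene.
final class AtomicPublish {

    /** Writes data to a temporary file, forces it to storage, then renames it to "segments". */
    static void publish(Path indexDir, byte[] data) throws IOException {
        Path tmp = indexDir.resolve("segments.tmp");   // assumed temporary name
        Path live = indexDir.resolve("segments");

        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Ask the OS to flush content and metadata before the rename.
            ch.force(true);
        }

        // Whether an existing target is replaced by an atomic move is
        // implementation specific; on POSIX file systems it maps to rename().
        Files.move(tmp, live, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

A reader that opens "segments" sees either the old or the new contents, never a half-written file, provided the rename itself is atomic on the platform in use.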

> I would very much like to have a 2-phase commit in Lucene in order to ensure
> that it is always in sync with my database. I always thought that I'd end
> up having to write custom code to store the Lucene index in the database,
> but maybe that wouldn't be necessary...?

Two phase commit is a whole different beast; this involves
coordinating multiple transactional resource managers (which Lucene
isn't) with a separate transaction monitor, using a protocol such as
XA or OTS. We're nowhere near that.

Storing the index in a database would be a good start, although the
Directory interface is really designed around the assumptions of a file
system. Still, that would not get us all the way there -- you'd need
to introduce transaction demarcation methods into the Lucene API, so
that these could be passed to the DBDirectory, so we would know what
groups of updates should be considered atomic.

And that still doesn't get us close to 2PC; we'd still have to support
XA for that, and I don't see any good reason to undertake that level
of effort at this point.

However, I think revisiting Directory with an eye towards making it
something that can be efficiently implemented on either a DB or a file
system would be worthwhile.

--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
Re: Making Lucene Transactional
>>>-----Original Message-----
>>>From: Doug Cutting [mailto:cutting@lucene.com]
>>>Sent: Thursday, June 27, 2002 10:36 AM
>>>To: Lucene Users List
>>>Subject: Re: Stress Testing Lucene
>>>
>>>
>>>
>>>It's very hard to leave an index in a bad state. Updating the
>>>"segments" file atomically updates the index. So the only way to
>>>corrupt things is to only partly update the segments file.
>>>But that too
>>>is hard, since it's first written to a temporary file, which is then
>>>renamed "segments".
>>>
We could further protect against this one by writing a checksum of some
sort at the end of the segments file and then re-reading it and
verifying the checksum before renaming the temporary segments file to
"segments". This way we'll know that only fully written segments files
are made active.
The checksum can also be used to verify integrity of the other index
segment components. I guess there is always a chance that the disk
driver is caching the writes.
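
A minimal sketch of the checksum idea, assuming a CRC32 trailer appended to the file contents. The class name ChecksummedSegments and the 8-byte trailer layout are illustrative assumptions, not an existing Lucene format. As noted above, re-reading right after writing may be served from a cache, so a successful check shows the file was fully written, not that it has reached the disk.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical helper, not part of Lucene.
final class ChecksummedSegments {

    /** Returns the payload followed by an 8-byte big-endian CRC32 of the payload. */
    static byte[] withTrailer(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + Long.BYTES)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    /** Re-reads the file and checks that the trailer matches the bytes before it. */
    static boolean verify(Path file) throws IOException {
        byte[] all = Files.readAllBytes(file);
        if (all.length < Long.BYTES) {
            return false; // too short to even hold a trailer
        }
        int payloadLen = all.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(all, 0, payloadLen);
        long stored = ByteBuffer.wrap(all, payloadLen, Long.BYTES).getLong();
        return stored == crc.getValue();
    }
}
```

The writer would call verify() on the temporary file and rename it to "segments" only when the check passes.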

>>>The only vulnerability I know of is that in Java on Win32 you can't
>>>atomically rename a file to something that already exists, so Lucene
>>>has to first remove the old version. So if you were to crash between
>>>the time that the old version of "segments" is removed and the new
>>>version is moved into place, then the index would be corrupt, because
>>>it would have no "segments" file.
>>>
Perhaps we could also protect against this one by simply removing the
old segments file (is that atomic by itself?) and then letting the next
IndexReader look for the temporary file when it sees that there is no
"segments" file and rename it. There might be a case where two competing
IndexReaders do the "segments" file check at the same time, find that it
is not there, go after the "segments.tmp" and try to rename it. But in
this case only the first one will succeed and the following one will
find that the "segments.tmp" is no longer there (or that another
"segments" file already exists), in which case it should look for the
"segments" file again and proceed.

Would these two changes make the index at least as reliable as the disk
driver?
Dmitry.
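
The reader-side recovery described in this message (promote "segments.tmp" when "segments" is missing, and tolerate a competing reader doing the same) might look something like the sketch below. The class and method names are hypothetical, and the code relies on java.nio.file, which postdates this thread. A plain Files.move without options fails if the target already exists, but whether that check is atomic with the rename depends on the platform, so this is an illustration of the control flow rather than a guarantee.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene.
final class SegmentsRecovery {

    /** Returns the path of a usable "segments" file, promoting "segments.tmp" if needed. */
    static Path locateSegments(Path indexDir) throws IOException {
        Path live = indexDir.resolve("segments");
        Path tmp = indexDir.resolve("segments.tmp");

        if (Files.exists(live)) {
            return live;
        }
        try {
            // The writer crashed after removing "segments": finish its rename.
            Files.move(tmp, live);
        } catch (NoSuchFileException | FileAlreadyExistsException raced) {
            // Another reader renamed the file first (or there was no temp
            // file at all); fall through and look for "segments" again.
        }
        if (Files.exists(live)) {
            return live;
        }
        throw new NoSuchFileException(live.toString());
    }
}
```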

RE: Making Lucene Transactional
I think that much of the goal can be accomplished with a much smaller effort
than you are suggesting by making a couple of simplifying assumptions:

1) Assume the filesystem is stable. There are ways to accomplish that
outside of Lucene anyway.

2) Assume write transactions will be serialized. This removes any need for
complex write locking strategies.

3) Assume that any transaction monitoring would be outside of Lucene.

What that leaves us with, then, is merely adding to Lucene the ability to
execute the following: begin(), commit(), abort()... and possibly prepare().

I don't see much of a problem implementing these semantics for adding
documents, as the IndexWriter already pretty much follows a transactional
pattern with a very low probability of failure. Therefore, the commit
semantics are merely a rename/no rename decision on the new segments file.
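
To make the "rename/no rename" idea concrete, here is a rough sketch of what begin(), commit() and abort() could look like for additions only. Everything in it is hypothetical: the class SegmentsTransaction, the one-name-per-line file layout and the "segments.new" staging name are assumptions for illustration, and it says nothing about deletions or prepare().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Lucene.
final class SegmentsTransaction {
    private final Path live;
    private final Path pending;
    private List<String> working; // null when no transaction is open

    SegmentsTransaction(Path indexDir) {
        this.live = indexDir.resolve("segments");
        this.pending = indexDir.resolve("segments.new"); // assumed staging name
    }

    /** Starts from the currently published segment list. */
    void begin() throws IOException {
        if (working != null) {
            throw new IllegalStateException("transaction already open");
        }
        working = new ArrayList<>();
        if (Files.exists(live)) {
            working.addAll(Files.readAllLines(live));
        }
    }

    /** Records a newly written segment; nothing is visible to readers yet. */
    void addSegment(String name) {
        if (working == null) {
            throw new IllegalStateException("no open transaction");
        }
        working.add(name);
    }

    /** Commit: write the new list aside, then rename it into place. */
    void commit() throws IOException {
        if (working == null) {
            throw new IllegalStateException("no open transaction");
        }
        Files.write(pending, working);
        Files.move(pending, live, StandardCopyOption.ATOMIC_MOVE);
        working = null;
    }

    /** Abort: no rename, so the published "segments" file is never touched. */
    void abort() throws IOException {
        if (working == null) {
            throw new IllegalStateException("no open transaction");
        }
        Files.deleteIfExists(pending);
        working = null;
    }
}
```

Readers only ever open "segments", so an aborted transaction leaves them looking at exactly the state they saw before begin().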

Deletions, on the other hand, seem more problematic. First of all there is
the asymmetry of having the delete on the IndexReader. In fact, the need for
serialized control of write/delete access caused me to write my
application's interface to Lucene to go through only two access points
(IndexSearcher and IndexWriter) and force access to the delete() method
through the IndexWriter. Even doing that, though, I don't think the
document deletion process currently has the capability to batch up its
changes and commit them. This would need to be added.

Finally, the additions and deletions would need to be coordinated to allow
both types of changes under a transaction.

So, yes, there's some work that would have to be done, but I'm not at all
convinced that it would be prohibitively challenging. Did I miss anything?

Thanks,
Scott

Re: Making Lucene Transactional
> I think that much of the goal can be accomplished with a much smaller effort
> than you are suggesting by making a couple of simplifying assumptions:
>
> 1) Assume the filesystem is stable. There are ways to accomplish that
> outside of Lucene anyway.
>
> 2) Assume write transactions will be serialized. This removes any need for
> complex write locking strategies.

But these assumptions are not valid.

Now, if you want to talk about introducing a concept of "batched updates"
into Lucene, where a batch is applied atomically, that could be a useful
improvement. But to pretend we offer transactional semantics when we
don't just seems silly.

> So, yes, there's some work that would have to be done, but I'm not at all
> convinced that it would be prohibitively challenging. Did I miss anything?

It could just be terminology, but I dislike describing something as if
it has transactional semantics when it doesn't. And given that the
file system is simply not transactional, anything built on top of it
will not be, either.

There's nothing wrong with trying to make Lucene _more_ stable, _less_
likely to get corrupted if something bad happens, etc. But this is
not making it transactional. And talking about two-phase commit implies
that it works with an outside transaction monitor.

We're already doing much, much better than most search engines because
all additions are done by creating new segments, so as long as rename
is atomic, users will not see an inconsistent state. However, in the
case of disk failure, we're going to be subject to the same risks as
any other file-based application unless we implement a transaction
log.

I do like the idea of grouping updates and making them visible
atomically as a group, if that's not a lot of additional work.

RE: Making Lucene Transactional
How about this? I'll admit it punts a little, but I still think it could be
a working model:

A-tomicity - A single call to the file system would commit the transaction.
In the case of an IndexWriter, calling close() already does this with a
simple rename operation (at least on Unix). Adding an abort() would throw
away the new files. Not yet sure of how to achieve this once Document
deletes are thrown in the mix...

C-onsistency - Document deletes must somehow be tracked and applied in a
single operation along with any Document adds. Again, though, I'm not sure
of how the deletes could be accomplished with the current file format.

I-solation - Just force write transactions to be serialized. Lucene does
this with IndexWriters anyway. We could enforce a one-to-one relationship
between transactions and IndexWriters...

D-urability - Lucene would attempt to do its best. Once it is written to
the disk, however, it is outside of Lucene's domain. Wouldn't a journaled
filesystem take care of this?

Scott

Re: Making Lucene Transactional
Scott Ganyo wrote:
> How about this? I'll admit it punts a little, but I still think it could be
> a working model:
>
> A-tomicity - A single call to the file system would commit the transaction.
> In the case of an IndexWriter, calling close() already does this with a
> simple rename operation (at least on Unix). Adding an abort() would throw
> away the new files. Not yet sure of how to achieve this once Document
> deletes are thrown in the mix...

The new files will be silently overwritten, so, strictly speaking, abort
does not need to delete them. The only thing abort would need to do is
remove the lock file: removing the new files would be a courtesy.

> C-onsistency - Document deletes must somehow be tracked and applied in a
> single operation along with any Document adds. Again, though, I'm not sure
> of how the deletes could be accomplished with the current file format.

Currently deletions are represented as an (optional) bit-vector file in
each segment index indicating which documents are deleted. When segments
are merged, data for deleted documents is dropped, and the new index
created has no deletions file.

To implement your proposal I think I would move deletions to a global
bit vector file that is named in the "segments" file. That way the
atomic action of installing a new "segments" file would also update the
deletions. This would be a little tricky, since this vector must be
updated whenever segments are merged. In particular, when a segment
with deletions is merged, the deletions vector is shortened, and bits in
the vector must be shifted down. This adds a factor proportional to the
size of the index to every merge, which is bad, but the bit shifting is
probably fast enough that this would not be an issue.
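
The bit shifting described above can be illustrated with java.util.BitSet. This is only a sketch under stated assumptions: documents are numbered globally and contiguously, the merged segment covers the range [segStart, segStart + segLen), and the merged result stays at the same position in the numbering. The class and method names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch, not part of Lucene.
final class DeletionsCompaction {

    /**
     * Returns the global deletions vector after the documents in
     * [segStart, segStart + segLen) have been merged and their deleted
     * documents dropped. Bits after the merged range shift down by the
     * number of documents that were dropped.
     */
    static BitSet compactAfterMerge(BitSet deleted, int totalDocs, int segStart, int segLen) {
        int segEnd = segStart + segLen;
        int removed = deleted.get(segStart, segEnd).cardinality();
        BitSet result = new BitSet(totalDocs - removed);

        // Documents before the merged range keep their numbers.
        for (int i = deleted.nextSetBit(0); i >= 0 && i < segStart; i = deleted.nextSetBit(i + 1)) {
            result.set(i);
        }
        // The merged range contains only surviving documents, so it
        // contributes no set bits.
        // Documents after the merged range are renumbered downward.
        for (int i = deleted.nextSetBit(segEnd); i >= 0 && i < totalDocs; i = deleted.nextSetBit(i + 1)) {
            result.set(i - removed);
        }
        return result;
    }
}
```

For example, with ten documents, deletions at 1, 4, 5 and 9, and a merge of documents 3 through 6, the two deletions inside the merged range disappear and the deletion at 9 moves to 7, giving deletions at 1 and 7. The loops visit only set bits, but the renumbering still touches every deletion after the merged range, which is the index-size-proportional cost mentioned above.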

Alternately, one could construct a file of "links" to the current
deletions file for each segment, and point to this "links" file from the
"segments" file. That would enable atomic updates of deletions along
with everything else, but also keep deletions files per segment.

> I-solation - Just force write transactions to be serialized. Lucene does
> this with IndexWriters anyway. We could enforce a one-to-one relationship
> between transactions and IndexWriters...

Doesn't the lock file do this already?
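
For readers unfamiliar with the lock-file approach, the sketch below shows how an atomically created file can serialize writers, and why an abort only strictly needs to remove it. It is not Lucene's implementation: the class name WriteLock and the file name "write.lock" are assumptions for the example, and it uses java.nio.file, which postdates this thread.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not part of Lucene.
final class WriteLock implements AutoCloseable {
    private final Path lockFile;

    private WriteLock(Path lockFile) {
        this.lockFile = lockFile;
    }

    /** Returns a held lock, or null if another writer already holds it. */
    static WriteLock tryAcquire(Path indexDir) throws IOException {
        Path lockFile = indexDir.resolve("write.lock"); // assumed lock name
        try {
            // createFile checks for existence and creates the file as a
            // single atomic operation, so only one caller can succeed.
            Files.createFile(lockFile);
            return new WriteLock(lockFile);
        } catch (FileAlreadyExistsException held) {
            return null;
        }
    }

    /** Releasing the lock is all that commit or abort strictly requires here. */
    @Override
    public void close() throws IOException {
        Files.deleteIfExists(lockFile);
    }
}
```

One known weakness of this style of lock is that a crashed writer leaves the file behind, so some out-of-band cleanup is needed before the next writer can start.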

> D-urability - Lucene would attempt to do its best. Once it is written to
> the disk, however, it is outside of Lucene's domain. Wouldn't a journaled
> filesystem take care of this?

I don't think this is Lucene's responsibility. Lucene should be able to
assume a non-corrupt filesystem.

Doug

RE: Making Lucene Transactional
>D-urability - Lucene would attempt to do its best. Once it is written to
>the disk, however, it is outside of Lucene's domain. Wouldn't a journaled
>filesystem take care of this?

Not necessarily. Most journaled file systems only handle updates to file
system metadata, not file contents.


--
Brian Goetz
Quiotix Corporation
brian@quiotix.com Tel: 650-843-1300 Fax: 650-324-8032

http://www.quiotix.com