Mailing List Archive

[jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
[ https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved LUCENE-1044.
------------------------------

Resolution: Invalid
Fix Version/s: (was: 1.9)

> Behavior on hard power shutdown
> -------------------------------
>
> Key: LUCENE-1044
> URL: https://issues.apache.org/jira/browse/LUCENE-1044
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Environment: Windows Server 2003, Standard Edition, Sun Hotspot Java 1.5
> Reporter: venkat rangan
>
> When indexing a large number of documents, upon a hard power failure (e.g. pulling the power cord), the index seems to get corrupted. We start a Java application as a Windows Service and feed it documents. In some cases (after an index size of 1.7GB, with 30-40 index segment .cfs files), the following is observed.
> The 'segments' file contains only zeros. Its size is 265 bytes - all bytes are zeros.
> The 'deleted' file also contains only zeros. Its size is 85 bytes - all bytes are zeros.
> Before corruption, the segments file and deleted file appear to be correct. After this corruption, the index is corrupted and lost.
> This is a problem observed in Lucene 1.4.3. We are not able to upgrade our customer deployments to 1.9 or a later version, but would be happy to back-port a patch, if the patch is small enough and if this problem is already solved.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
Is the new autocommit=false code atomic (either the new checkpoint is
successfully made and moved to, or it's not)? If not, I imagine it could be
made atomic without too much work, right?

Hoss Man (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Hoss Man resolved LUCENE-1044.
> ------------------------------
>
> Resolution: Invalid
> Fix Version/s: (was: 1.9)
>
>
>> Behavior on hard power shutdown
>> -------------------------------
>>
>> Key: LUCENE-1044
>> URL: https://issues.apache.org/jira/browse/LUCENE-1044
>> Project: Lucene - Java
>> Issue Type: Bug
>> Components: Index
>> Environment: Windows Server 2003, Standard Edition, Sun Hotspot Java 1.5
>> Reporter: venkat rangan
>>
>> When indexing a large number of documents, upon a hard power failure (e.g. pulling the power cord), the index seems to get corrupted. We start a Java application as a Windows Service and feed it documents. In some cases (after an index size of 1.7GB, with 30-40 index segment .cfs files), the following is observed.
>> The 'segments' file contains only zeros. Its size is 265 bytes - all bytes are zeros.
>> The 'deleted' file also contains only zeros. Its size is 85 bytes - all bytes are zeros.
>> Before corruption, the segments file and deleted file appear to be correct. After this corruption, the index is corrupted and lost.
>> This is a problem observed in Lucene 1.4.3. We are not able to upgrade our customer deployments to 1.9 or a later version, but would be happy to back-port a patch, if the patch is small enough and if this problem is already solved.
>>
>
>

Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
: Is the new autocommit=false code atomic (the new check point is successfully
: made and moved to, or it's not)? If not I imagine it could be made to be without
: too much work right?

No matter what work we do in Java code to try and guarantee atomicity, the
JVM can't guarantee that File IO buffers are flushed unless the JVM is
shutdown cleanly, so i don't see how we could possibly make any claims
of atomicity in the event of hard process (or OS) termination.

i'd be happy to be proven wrong by someone who knows more about IO,
filesystems, and the JVM Specification.



-Hoss


Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
Derby does not guarantee atomicity? Other Java DBs? I thought they did,
but perhaps not. Couldn't you rig a simple system with some sort of rename
call?

If a hard power shutdown interrupts indexing, on the next start the system
could ideally either be left in a condition that uses the previous commit
point, or recognize that the latest commit failed, clean it up, and go back
to the previous commit point. There must be ways to do this that don't rely
on a buffer being fully flushed, but rather on atomic file system operations
(of which there must be some that can be counted on in Java?).
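A minimal sketch of the rename-based scheme described above (the class, file names, and layout here are hypothetical for illustration, not Lucene's actual commit code):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical sketch of a rename-based commit. Write the new "segments"
// data to a temp file, force it to disk, then rename it over the live file.
// A crash before the rename leaves the old commit point intact; a crash
// after it leaves the new one.
public class RenameCommit {
    public static void commit(File dir, byte[] segmentsData) throws IOException {
        File tmp = new File(dir, "segments.new");
        FileOutputStream out = new FileOutputStream(tmp);
        try {
            out.write(segmentsData);
            out.flush();            // hand buffered bytes to the OS
            out.getFD().sync();     // ask the OS to push them to the device
        } finally {
            out.close();
        }
        File live = new File(dir, "segments");
        // Caveat: on POSIX, rename over an existing file is atomic; on the
        // Windows of this era, File.renameTo() fails if the target exists,
        // so the delete+rename below opens a window where neither exists.
        if (live.exists() && !live.delete()) {
            throw new IOException("could not remove old segments file");
        }
        if (!tmp.renameTo(live)) {
            throw new IOException("commit rename failed");
        }
    }
}
```

Note the Windows caveat in the comments: the delete+rename step is exactly where the "atomic filesystem operation" assumption gets shaky across platforms.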

Chris Hostetter wrote:
> : Is the new autocommit=false code atomic (the new check point is successfully
> : made and moved to, or it's not)? If not I imagine it could be made to be without
> : too much work right?
>
> No matter what work we do in Java code to try and guarantee atomicity, the
> JVM can't guarantee that File IO buffers are flushed unless the JVM is
> shutdown cleanly, so i don't see how we could possibly make any claims
> of atomicity in the event of hard process (or OS) termination.
>
> i'd be happy to be proven wrong by someone who knows more about IO,
> filesystems, and the JVM Specification.
>
>
>
> -Hoss
>
>

Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
This is simply not true. See FileDescriptor.sync().

There are several options, but normally it is used so that when close()
completes, all data is guaranteed to be on disk. This is a much slower way
to write data. It is very common in database systems when committing the
log file.

By using this, together with proper file naming techniques, full transaction
integrity can be GUARANTEED in Java.

See Berkeley DB or Derby for other examples.
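The primitive being pointed at, in a minimal (hypothetical) commit-log shape: flush() only hands bytes to the OS, while getFD().sync() blocks until the OS reports them on the device:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.SyncFailedException;

// Hypothetical sketch: append a record to a log file and do not return
// until the OS claims the bytes are on stable storage.
public class DurableLog {
    public static void append(File log, byte[] record) throws IOException {
        FileOutputStream out = new FileOutputStream(log, true); // append mode
        try {
            out.write(record);
            out.flush();           // JVM buffers -> OS
            out.getFD().sync();    // OS buffers -> device (modulo drive caches)
        } catch (SyncFailedException e) {
            // the platform could not guarantee the flush: treat the commit as failed
            throw new IOException("commit not durable: " + e.getMessage());
        } finally {
            out.close();
        }
    }
}
```

This is the same pattern databases use for their transaction log: only the log needs to be sync'd, everything else can be rebuilt from it.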

On Nov 3, 2007, at 6:56 PM, Chris Hostetter wrote:

>
> : Is the new autocommit=false code atomic (the new check point is
> : successfully made and moved to, or it's not)? If not I imagine it
> : could be made to be without too much work right?
>
> No matter what work we do in Java code to try and guarantee atomicity,
> the JVM can't guarantee that File IO buffers are flushed unless the JVM
> is shutdown cleanly, so i don't see how we could possibly make any
> claims of atomicity in the event of hard process (or OS) termination.
>
> i'd be happy to be proven wrong by someone who knows more about IO,
> filesystems, and the JVM Specification.
>
>
>
> -Hoss
>
>


Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
See my previous email.

I have written several 'atomic' ACID systems in Java. It is VERY EASY
TO DO. Read any introductory (academic) book on database systems.

On Nov 3, 2007, at 7:08 PM, Mark Miller wrote:

> Derby does not guarantee atomicity? Other java DB's? I thought they
> did, but perhaps not. You cannot rig a simple system with some sort
> of rename call?
>
> If a hard power shutdown interrupted, on the next start the system
> could ideally be left in a condition that uses the previous commit
> point, or recognize the latest commit failed, clean it up, and go
> back to the previous commit point. There must be ways to do it that
> don't rely on a buffer being fully flushed, but rather on atomic
> file system operations (of which there must be some that can be
> counted on in java?).
>
> Chris Hostetter wrote:
>> : Is the new autocommit=false code atomic (the new check point is
>> : successfully made and moved to, or it's not)? If not I imagine it
>> : could be made to be without too much work right?
>>
>> No matter what work we do in Java code to try and guarantee
>> atomicity, the JVM can't guarantee that File IO buffers are flushed
>> unless the JVM is shutdown cleanly, so i don't see how we could
>> possibly make any claims of atomicity in the event of hard process
>> (or OS) termination.
>>
>> i'd be happy to be proven wrong by someone who knows more about
>> IO, filesystems, and the JVM Specification.
>>
>>
>>
>> -Hoss
>>
>>


Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
: This is simply not true. See FileDescriptor.sync().
:
: There are several options, but normally it is used so that when close
: completes, all data must be on disk. This is a much slower way to write data.
: It is very common in database systems when committing the log file.

Ok. I'll certainly take your word for it ... i've been trusting the docs
for [File]OutputStream.flush()...

>> If the intended destination of this stream is an abstraction provided
>> by the underlying operating system, for example a file, then flushing
>> the stream guarantees only that bytes previously written to the stream
>> are passed to the operating system for writing; it does not guarantee
>> that they are actually written to a physical device such as a disk drive.

I haven't looked at the internals of FileOutputStream or FileDescriptor on
any particular platforms to see how exactly they work, but if dealing with
the FD directly and using FD.sync() is the magic bullet, then I'd love to
see a patch that uses it in FSDirectory.

I assume the SyncFailedException it throws is rare? If it is always
thrown when using things like NFS that may be a show stopper for using
sync() in Lucene ... many people have jumped through a lot of hoops this
past year to get Lucene working on NFS; I'd hate to see all that work go
out the window in an effort to make Lucene ACID. (I suspect there are
more users interested in using Lucene on NFS than on using it as a
transactional data store)

>> Throws:
>> SyncFailedException - Thrown when the buffers cannot be flushed, or
>> because the system cannot guarantee that all the buffers have been
>> synchronized with physical media.

-Hoss


Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
I think it would be great if it were an option. Perhaps even a sandbox
implementation that could wrap or replace a few classes. Maybe that
complicates things too much, but that way you could just not use the
transaction system if you were on NFS (if that ends up being a problem)
or if you didn't want to pay a performance cost.

Maybe a Derby guy would pitch in.
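One hypothetical shape for "make it an option": a small output wrapper with a per-deployment sync flag (the class and flag names are invented for illustration, not a proposed Lucene API):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical sketch of sync-on-close as an opt-in flag.
public class MaybeSyncingOutput {
    private final FileOutputStream out;
    private final boolean doSync;   // false e.g. on NFS, or for raw speed

    public MaybeSyncingOutput(File f, boolean doSync) throws IOException {
        this.out = new FileOutputStream(f);
        this.doSync = doSync;
    }

    public void write(byte[] b) throws IOException { out.write(b); }

    public void close() throws IOException {
        try {
            out.flush();
            if (doSync) {
                out.getFD().sync();  // pay the durability cost only if asked
            }
        } finally {
            out.close();
        }
    }
}
```

With something like this, an NFS deployment could simply construct with doSync=false and keep today's behavior.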

Chris Hostetter wrote:
> : This is simply not true. See FileDescriptor.sync().
> :
> : There are several options, but normally it is used so that when close
> : completes, all data must be on disk. This is a much slower way to write data.
> : It is very common in database systems when committing the log file.
>
> Ok. I'll certainly take your word for it ... i've been trusting the docs
> for [File]OutputStream.flush()...
>
>
>>> If the intended destination of this stream is an abstraction provided
>>> by the underlying operating system, for example a file, then flushing
>>> the stream guarantees only that bytes previously written to the stream
>>> are passed to the operating system for writing; it does not guarantee
>>> that they are actually written to a physical device such as a disk drive.
>>>
>
> I haven't looked at the internals of FileOutputStream or FileDescriptor on
> any particular platforms to see how exactly they work, but if dealing with
> the FD directly and using FD.sync() is the magic bullet then I'd love to see
> a patch that uses it in FSDirectory.
>
> I assume the SyncFailedException it throws is rare? If it is always
> thrown when using things like NFS that may be a show stopper for using
> sync() in Lucene ... many people have jumped through a lot of hoops this
> past year to get Lucene working on NFS; I'd hate to see all that work go
> out the window in an effort to make Lucene ACID. (I suspect there are
> more users interested in using Lucene on NFS than on using it as a
> transactional data store)
>
>
>>> Throws:
>>> SyncFailedException - Thrown when the buffers cannot be flushed, or
>>> because the system cannot guarantee that all the buffers have been
>>> synchronized with physical media.
>>>
>
> -Hoss
>
>

Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
I think using FD.sync() might have enabled the proper operation on
NFS in a much simpler way...

I argued quite a bit that the approach people were taking was probably
not correct, and/or was working around a bug in NFS (it seems there was at
least one, which was corrected in a later release).

IMHO the work to make Lucene "work" on NFS was wasted - anyone using
NFS (and that knows the design of it) would say that it was against
the design.

They only needed to move Lucene into a "server" installation. Running
Lucene over NFS just isn't a good use of bandwidth or the architecture.

You ended up making Lucene far more complex for an edge case that had
better solutions.

It is one of the reasons we have stuck with 1.9, and just merge
decently designed improvements.


On Nov 3, 2007, at 7:35 PM, Chris Hostetter wrote:

>
> : This is simply not true. See FileDescriptor.sync().
> :
> : There are several options, but normally it is used so that when
> close
> : completes, all data must be on disk. This is a much slower way to
> write data.
> : It is very common in database systems when committing the log file.
>
> Ok. I'll certainly take your word for it ... i've been trusting
> the docs
> for [File]OutputStream.flush()...
>
>>> If the intended destination of this stream is an abstraction
>>> provided
>>> by the underlying operating system, for example a file, then
>>> flushing
>>> the stream guarantees only that bytes previously written to the
>>> stream
>>> are passed to the operating system for writing; it does not
>>> guarantee
>>> that they are actually written to a physical device such as a
>>> disk drive.
>
> I haven't looked at the internals of FileOutputStream or
> FileDescriptor on
> any particular platforms to see how exactly they work, but if
> dealing with
> the FD directly and using FD.sync() is the magic bullet then I'd love
> to see
> a patch that uses it in FSDirectory.
>
> I assume the SyncFailedException it throws is rare? If it is always
> thrown when using things like NFS that may be a show stopper for using
> sync() in Lucene ... many people have jumped through a lot of hoops
> this
> past year to get Lucene working on NFS; I'd hate to see all that
> work go
> out the window in an effort to make Lucene ACID. (I suspect there are
> more users interested in using Lucene on NFS than on using it as a
> transactional data store)
>
>>> Throws:
>>> SyncFailedException - Thrown when the buffers cannot be
>>> flushed, or
>>> because the system cannot guarantee that all the buffers have been
>>> synchronized with physical media.
>
> -Hoss
>
>


Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
I'll pull together a patch to allow FSDirectory/FSIndexOutput to
optionally do flush() & sync() before every close().

Lucene is already "atomic" (the "A" in ACID): it's very careful to do
the IO operations in order such that the index is never in an
inconsistent state ASSUMING the IO system completes or
fails-to-complete the operations *in order*.

The problem is, on a hard shutdown (kill -9 or JVM/machine crashes),
apparently future operations may have completed while some past
operations have not. For example, the new segments_N file was
successfully written while say the _X.fdx file of the just-flushed
segment was not successfully written, even though Lucene had written &
closed _X.fdx before segments_N.

It's this out-of-order completion of the IO operations on a hard
shutdown that leads to an inconsistent index.

Using autoCommit=false just means your commits are less frequent, and
the "atomic" transaction is all changes done during the lifetime of
that one writer, versus all changes done since the last flush() when
autoCommit=true. However, autoCommit=false cannot fully eliminate the
chance of corruption due to out-of-order completion of IO operations.

It sounds like inserting a flush() then sync() call before every
close() *might* in fact force the IO system to attain in-order
completion of the operations Lucene sends it, at the cost of some
performance loss. I say *might* because there are so many layers to
an IO system that it's not clear that the fsync() that the JVM is in
fact calling (and relying on) will really always do the right thing.

The performance hit could in practice be low, especially if you are
using a large RAM buffer in your writer.
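The ordering constraint above can be sketched as follows (the file names mirror the ones discussed in this thread; the helper class itself is hypothetical, not the actual FSDirectory patch):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical sketch of the ordering constraint: every segment file must
// be durable on disk BEFORE the new segments_N file is written, so a crash
// can never expose a segments_N that references missing segment files.
public class OrderedCommit {
    static void writeSynced(File f, byte[] data) throws IOException {
        FileOutputStream out = new FileOutputStream(f);
        try {
            out.write(data);
            out.flush();
            out.getFD().sync();   // barrier: don't proceed until durable
        } finally {
            out.close();
        }
    }

    public static void commit(File dir, byte[] fdx, byte[] segments, long gen)
            throws IOException {
        writeSynced(new File(dir, "_X.fdx"), fdx);               // 1) segment data first
        writeSynced(new File(dir, "segments_" + gen), segments); // 2) then the commit point
    }
}
```

The sync() between the two writes is what turns Lucene's logical write order into a physical one the disk must respect.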

Mike

"Mark Miller" <markrmiller@gmail.com> wrote:
> I think it would be great if it were an option. Perhaps even a sandbox
> implementation that could wrap or replace a few classes. Maybe that
> complicates things too much, but that way you could just not use the
> transaction system if you were on NFS (if that ends up being a problem)
> or you didn't want to pay a performance cost.
>
> Maybe a Derby guy would pitch in.
>
> Chris Hostetter wrote:
> > : This is simply not true. See FileDescriptor.sync().
> > :
> > : There are several options, but normally it is used so that when close
> > : completes, all data must be on disk. This is a much slower way to write data.
> > : It is very common in database systems when committing the log file.
> >
> > Ok. I'll certainly take your word for it ... i've been trusting the docs
> > for [File]OutputStream.flush()...
> >
> >
> >>> If the intended destination of this stream is an abstraction provided
> >>> by the underlying operating system, for example a file, then flushing
> >>> the stream guarantees only that bytes previously written to the stream
> >>> are passed to the operating system for writing; it does not guarantee
> >>> that they are actually written to a physical device such as a disk drive.
> >>>
> >
> > I haven't looked at the internals of FileOutputStream or FileDescriptor on
> > any particular platforms to see how exactly they work, but if dealing with
> > the FD directly and using FD.sync() is the magic bullet then I'd love to see
> > a patch that uses it in FSDirectory.
> >
> > I assume the SyncFailedException it throws is rare? If it is always
> > thrown when using things like NFS that may be a show stopper for using
> > sync() in Lucene ... many people have jumped through a lot of hoops this
> > past year to get Lucene working on NFS; I'd hate to see all that work go
> > out the window in an effort to make Lucene ACID. (I suspect there are
> > more users interested in using Lucene on NFS than on using it as a
> > transactional data store)
> >
> >
> >>> Throws:
> >>> SyncFailedException - Thrown when the buffers cannot be flushed, or
> >>> because the system cannot guarantee that all the buffers have been
> >>> synchronized with physical media.
> >>>
> >
> > -Hoss
> >
> >

Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
On 11/4/07, Michael McCandless <lucene@mikemccandless.com> wrote:
> The problem is, on a hard shutdown (kill -9 or JVM/machine crashes),
> apparently future operations may have completed while some past
> operations have not. For example, the new segments_N file was
> successfully written while say the _X.fdx file of the just-flushed
> segment was not successfully written, even though Lucene had written &
> closed _X.fdx before segments_N.

That should be impossible except for a machine crash. Kill -9 or a
JVM crash should have no effect on data already written.

But a sync option would be both simple and useful for people trying to
take live snapshots of an index, or to protect against machine
crashes. This isn't an absolute 100% guarantee either (so don't test
for it) - the drives often lie to the OS about data being flushed.
It's the best we can do at our level though.
http://www.google.com/search?q=fsync+drive+lies

-Yonik

Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
Usually you can configure the drives so that sync() ALWAYS syncs -
drive jumpers, driver setup, or other methods. Some drives that are
battery backed and such do not need it.

Without sync() truly being a sync you could never write a database
that was resilient.

It will exact a heavier toll on performance than you might think. In
order to do it properly, all filesystem metadata must be sync'd as well.
The biggest difference is that you lose the degree of multi-processing
that is inherent when sync'ing is disabled - as the drive (or OS) does
the physical write asynchronously while the system does other work -
with sync() this is lost.

This is why in a db system, the only file that is sync'd is the log
file - all other files can be made "in sync" from the log file - and
this file is normally striped for optimum write performance. Some
systems have special "log file drives" (some even solid state, or
battery backed ram) to aid the performance.


On Nov 4, 2007, at 8:30 AM, Yonik Seeley wrote:

> On 11/4/07, Michael McCandless <lucene@mikemccandless.com> wrote:
>> The problem is, on a hard shutdown (kill -9 or JVM/machine crashes),
>> apparently future operations may have completed while some past
>> operations have not. For example, the new segments_N file was
>> successfully written while say the _X.fdx file of the just-flushed
>> segment was not successfully written, even though Lucene had
>> written &
>> closed _X.fdx before segments_N.
>
> That should be impossible except for a machine crash. Kill -9 or a
> JVM crash should have no effect on data already written.
>
> But a sync option would be both simple and useful for people trying to
> take live snapshots of an index, or to protect against machine
> crashes. This isn't an absolute 100% guarantee either (so don't test
> for it) - the drives often lie to the OS about data being flushed.
> It's the best we can do at our level though.
> http://www.google.com/search?q=fsync+drive+lies
>
> -Yonik
>


Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
"Yonik Seeley" <yonik@apache.org> wrote:

> On 11/4/07, Michael McCandless <lucene@mikemccandless.com> wrote:
> > The problem is, on a hard shutdown (kill -9 or JVM/machine crashes),
> > apparently future operations may have completed while some past
> > operations have not. For example, the new segments_N file was
> > successfully written while say the _X.fdx file of the just-flushed
> > segment was not successfully written, even though Lucene had written &
> > closed _X.fdx before segments_N.
>
> That should be impossible except for a machine crash. Kill -9 or a
> JVM crash should have no effect on data already written.

OK, right. JVM crashing or getting killed should preserve
order-of-completion on the IO operations: those IO operations that
were handed off to the OS will eventually complete successfully.

But OS crashing, machine crashing or power-cord gets pulled can result
in out-of-order completion of IO operations, which is what can corrupt
the index.

> But a sync option would be both simple and useful for people trying to
> take live snapshots of an index, or to protect against machine
> crashes. This isn't an absolute 100% guarantee either (so don't test
> for it) - the drives often lie to the OS about data being flushed.
> It's the best we can do at our level though.
> http://www.google.com/search?q=fsync+drive+lies

Right, the best the OS can do is get all writes out to the drives, but
if the drives then cache the writes (in non-stable storage) then we
are still at risk.

Mike

Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
Even if we cannot guarantee durability, it would be nice if we could
guarantee a consistent index. It sounds like the only problem in a
machine with a lying drive is that you could lose a number of committed
transactions. I would much prefer that to a corrupted index. I can
always re-add what was lost much quicker than rebuilding a 5 million doc
archive. In either case, I have my choice between the two as long as the
index is guaranteed to be corruption free.

robert engels wrote:
> Usually you can configure the drives so that sync() ALWAYS syncs -
> drive jumpers, driver setup, or other methods. Some drives that are
> battery backed and such do not need it.
>
> Without sync() truly being a sync you could never write a database
> that was resilient.
>
> It will exact a heavier toll on performance than you might think. In
> order to do it properly, all filesystem metadata must be sync'd as
> well. The biggest difference is that you lose the degree of
> multi-processing that is inherent when sync'ing is disabled - as the
> drive (or OS) does the physical write asynchronously while the system
> does other work - with sync() this is lost.
>
> This is why in a db system, the only file that is sync'd is the log
> file - all other files can be made "in sync" from the log file - and
> this file is normally striped for optimum write performance. Some
> systems have special "log file drives" (some even solid state, or
> battery backed ram) to aid the performance.
>
>
> On Nov 4, 2007, at 8:30 AM, Yonik Seeley wrote:
>
>> On 11/4/07, Michael McCandless <lucene@mikemccandless.com> wrote:
>>> The problem is, on a hard shutdown (kill -9 or JVM/machine crashes),
>>> apparently future operations may have completed while some past
>>> operations have not. For example, the new segments_N file was
>>> successfully written while say the _X.fdx file of the just-flushed
>>> segment was not successfully written, even though Lucene had written &
>>> closed _X.fdx before segments_N.
>>
>> That should be impossible except for a machine crash. Kill -9 or a
>> JVM crash should have no effect on data already written.
>>
>> But a sync option would be both simple and useful for people trying to
>> take live snapshots of an index, or to protect against machine
>> crashes. This isn't an absolute 100% guarantee either (so don't test
>> for it) - the drives often lie to the OS about data being flushed.
>> It's the best we can do at our level though.
>> http://www.google.com/search?q=fsync+drive+lies
>>
>> -Yonik
>>

Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
Well, by calling sync() on every file before closing it (the patch in
LUCENE-1044), we should achieve this, albeit with a possibly sizable
loss of indexing performance (I'm testing that now...).

Though, I still can't figure out how to sync a directory from Java.

In the meantime ... one simple way to be robust to machine/OS crashes
is to keep more than just the last commit point alive in the index.
You just have to create a deletion policy that keeps all commit points
younger than X amount of time (there is an example of this in Lucene's
TestDeletionPolicy unit test).

This way if the machine crashes and segments_N is not usable you'd
still have segments_N-1 (and maybe segments_N-2, ..., if they are new
enough) to fall back to.

However, I'm not sure how large X would need to be, in practice, for
all write caches to be properly flushed. And, this will necessarily
use more disk space in your index.
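As a rough illustration of that policy (a toy model, not Lucene's actual IndexDeletionPolicy API; names are invented for the sketch), the selection rule is: keep every commit younger than the cutoff, plus the newest commit unconditionally.

```python
def commits_to_keep(commits, now, max_age_secs):
    """commits: list of (name, timestamp) pairs, oldest first."""
    keep = [name for name, ts in commits if now - ts <= max_age_secs]
    # always keep the most recent commit, even if it is older than the cutoff
    if commits and commits[-1][0] not in keep:
        keep.append(commits[-1][0])
    return keep

commits = [("segments_1", 0), ("segments_2", 50), ("segments_3", 90)]
print(commits_to_keep(commits, now=100, max_age_secs=60))  # ['segments_2', 'segments_3']
```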

Mike


"Mark Miller" <markrmiller@gmail.com> wrote:
> Even if we cannot guarantee durability, it would be nice if we could
> guarantee a consistent index. It sounds like the only problem in a
> machine with a lying drive is that you could lose a number of committed
> transactions. I would much prefer that to a corrupted index. I can
> always re-add what was lost much quicker than rebuilding a 5 million doc
> archive. In either case, I have my choice between the two as long as the
> index is guaranteed to be corruption free.
>
> robert engels wrote:
> > Usually you can configure the drives so that sync() ALWAYS syncs -
> > drive jumpers, driver setup, or other methods. Some drives that are
> > battery backed and such do not need it.
> >
> > Without sync() truly being a sync you could never write a database
> > that was resilient.
> >
> > It will exact a heavier toll on performance than you might think. In
> > order to do it properly, all filesystem metadata must be sync'd as
> > well. The biggest difference is that you lose the degree of
> > multi-processing that is inherent when sync'ing is disabled - as the
> > drive (or OS) does the physical write asynchronously while the system
> > does other work - with sync() this is lost.
> >
> > This is why in a db system, the only file that is sync'd is the log
> > file - all other files can be made "in sync" from the log file - and
> > this file is normally striped for optimum write performance. Some
> > systems have special "log file drives" (some even solid state, or
> > battery backed ram) to aid the performance.
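The sync-only-the-log pattern described above can be sketched in a few lines (a toy write-ahead log, nothing from Lucene; the class and file names are invented for illustration):

```python
import os, tempfile

class TinyWAL:
    """Toy write-ahead log: the one file in the system that gets fsync'd."""
    def __init__(self, path):
        self.path = path
        self.f = open(path, "ab")

    def append(self, record: bytes) -> None:
        self.f.write(record + b"\n")
        self.f.flush()                 # push Python's buffer to the OS
        os.fsync(self.f.fileno())      # push the OS cache to the device

    def replay(self):
        # after a crash, all other files are rebuilt by replaying records
        with open(self.path, "rb") as f:
            return [line.rstrip(b"\n") for line in f]

log = TinyWAL(os.path.join(tempfile.mkdtemp(), "wal.log"))
log.append(b"add doc1")
log.append(b"add doc2")
print(log.replay())  # [b'add doc1', b'add doc2']
```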
> >
> >
> > On Nov 4, 2007, at 8:30 AM, Yonik Seeley wrote:
> >
> >> On 11/4/07, Michael McCandless <lucene@mikemccandless.com> wrote:
> >>> The problem is, on a hard shutdown (kill -9 or JVM/machine crashes),
> >>> apparently future operations may have completed while some past
> >>> operations have not. For example, the new segments_N file was
> >>> successfully written while say the _X.fdx file of the just-flushed
> >>> segment was not successfully written, even though Lucene had written &
> >>> closed _X.fdx before segments_N.
> >>
> >> That should be impossible except for a machine crash. Kill -9 or a
> >> JVM crash should have no effect on data already written.
> >>
> >> But a sync option would be both simple and useful for people trying to
> >> take live snapshots of an index, or to protect against machine
> >> crashes. This isn't an absolute 100% guarantee either (so don't test
> >> for it) - the drives often lie to the OS about data being flushed.
> >> It's the best we can do at our level though.
> >> http://www.google.com/search?q=fsync+drive+lies
> >>
> >> -Yonik
Re: [jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown [ In reply to ]
Much of this depends on the file system. There are journaling
filesystems that will never be corrupted. They use similar techniques
as we are discussing for Lucene.

There are other ways of opening the file that control whether or not
the metadata (directory blocks, file length, etc.) is sync'd as well
as the file data.
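Concretely (a hedged sketch, assuming a POSIX system, with Python standing in for the open(2) call): flags such as O_SYNC make every write also flush to the device before returning, while O_DSYNC, where the platform defines it, flushes the data plus only the metadata needed to read it back.

```python
import os, tempfile

path = os.path.join(tempfile.mkdtemp(), "synced.dat")
# O_SYNC: each write returns only after data and metadata reach the device.
# O_DSYNC (where available) flushes data plus only retrieval-critical
# metadata, skipping e.g. timestamp updates.
flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_SYNC", 0)
fd = os.open(path, flags, 0o644)
try:
    os.write(fd, b"metadata too")
finally:
    os.close(fd)
```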

On Nov 4, 2007, at 10:23 AM, Michael McCandless wrote:

>
> Well, by calling sync() on every file before closing it (the patch in
> LUCENE-1044), we should achieve this, albeit with a possibly sizable
> loss of indexing performance (I'm testing that now...).
>
> Though, I still can't figure out how to sync a directory from Java.
>
> In the meantime ... one simple way to be robust to machine/OS crashes
> is to keep more than just the last commit point alive in the index.
> You just have to create a deletion policy that keeps all commit points
> younger than X amount of time (there is an example of this in Lucene's
> TestDeletionPolicy unit test).
>
> This way if the machine crashes and segments_N is not usable you'd
> still have segments_N-1 (and maybe segments_N-2, ..., if they are new
> enough) to fall back to.
>
> However, I'm not sure how large X would need to be, in practice, for
> all write caches to be properly flushed. And, this will necessarily
> use more disk space in your index.
>
> Mike
>


[jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown [ In reply to ]
[ https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1044.
----------------------------------------

Resolution: Fixed
Fix Version/s: 2.3

I just committed this. Thanks Venkat!

> Behavior on hard power shutdown
> -------------------------------
>
> Key: LUCENE-1044
> URL: https://issues.apache.org/jira/browse/LUCENE-1044
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Environment: Windows Server 2003, Standard Edition, Sun Hotspot Java 1.5
> Reporter: venkat rangan
> Assignee: Michael McCandless
> Fix For: 2.3
>
> Attachments: LUCENE-1044.patch, LUCENE-1044.take2.patch, LUCENE-1044.take3.patch
>
>
> When indexing a large number of documents, upon a hard power failure (e.g. pulling the power cord), the index seems to get corrupted. We start a Java application as a Windows Service and feed it documents. In some cases (after an index size of 1.7GB, with 30-40 index segment .cfs files), the following is observed.
> The 'segments' file contains only zeros. Its size is 265 bytes - all bytes are zeros.
> The 'deleted' file also contains only zeros. Its size is 85 bytes - all bytes are zeros.
> Before corruption, the segments file and deleted file appear to be correct. After this corruption, the index is corrupted and lost.
> This is a problem observed in Lucene 1.4.3. We are not able to upgrade our customer deployments to 1.9 or later version, but would be happy to back-port a patch, if the patch is small enough and if this problem is already solved.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

