NFS concurrency improvements

Greets,

Now that a couple stability issues have been addressed in trunk, it's
time to mention some changes that went in a few weeks back.

Three new public APIs have been exposed:

KinoSearch::Store::Lock
KinoSearch::Store::LockFactory
KinoSearch::Store::SharedLock

Additionally, KinoSearch::Index::IndexReader has recently been made
public, a move that had long been planned for various reasons,
including this one. Documentation for all four classes is linked from:

http://www.rectangular.com/kinosearch/docs/devel/

To address the problem of files being deleted out from underneath
active searchers/readers accessing indexes located on NFS volumes,
voluntary read-locking can now be enabled. Readers establish a
"lock" on a particular segments_XXX.yaml file, which has the effect
of "locking" all its dependent files by proxy. If InvIndexer looks
for and detects such a lock, it will refrain from unlinking any of
those files.
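
The "lock by proxy" idea can be sketched in a few lines of plain Perl.
This is only an illustration, not KinoSearch's implementation: the
purge_snapshot function, the ".readlock" lockfile suffix and the file
names are all hypothetical, and a single lockfile stands in for what
is really a shared lock held by any number of readers.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: delete a snapshot file and its dependent files,
# unless a reader has locked the snapshot.  The "<snapshot>.readlock"
# naming is an assumption made for this sketch.
sub purge_snapshot {
    my ( $dir, $snapshot, @dependent_files ) = @_;
    my $lockfile = File::Spec->catfile( $dir, "$snapshot.readlock" );

    # A lock on the snapshot protects every file it depends on.
    return 0 if -e $lockfile;

    my $removed = 0;
    for my $file ( $snapshot, @dependent_files ) {
        $removed += unlink( File::Spec->catfile( $dir, $file ) );
    }
    return $removed;    # number of files actually deleted
}
```

Note that the check and the unlink are not atomic here, so a real
implementation has more work to do than this sketch suggests.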

The other problem the system addresses is that of write applications
from multiple hosts causing index corruption when they attempt to
modify the same index at the same time. This can be prevented by
having each write application identify itself with an 'agent_id'.
Locks will clobber existing locks with the same agent_id if the pid
associated with that lock is not active; however, they will not
clobber an existing lock with a different agent_id.
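
The clobbering rule can be written out as a small decision function.
This is a sketch under stated assumptions rather than the actual
KinoSearch code: may_clobber and pid_is_active are hypothetical names,
and the lock is modelled as a plain hash holding an agent_id and a
pid. It also shows why the agent_id matters: a pid can only be checked
on the host that created it, so a lock from another agent has to be
left alone.

```perl
use strict;
use warnings;

# Hypothetical liveness check for Unix-like systems.  Signal 0 tests
# whether a process exists without delivering a signal.  EPERM means
# the process exists but belongs to another user.
sub pid_is_active {
    my $pid = shift;
    return 1 if kill 0, $pid;
    return $!{EPERM} ? 1 : 0;
}

# Hypothetical helper: may a new lock for $args{agent_id} remove the
# existing lock described by $args{existing}?
sub may_clobber {
    my (%args) = @_;
    my $existing  = $args{existing};    # { agent_id => ..., pid => ... }
    my $is_active = $args{pid_is_active} || \&pid_is_active;

    # Never touch a lock that belongs to a different agent.
    return 0 if $existing->{agent_id} ne $args{agent_id};

    # Same agent: clobber only if the owning process is gone.
    return $is_active->( $existing->{pid} ) ? 0 : 1;
}
```

For example, may_clobber( existing => { agent_id => 'hostA', pid =>
$$ }, agent_id => 'hostA' ) returns 0, because the current process is
still running.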

The locking system is advisory: it protects the index only if every
application which accesses the shared directory turns on enhanced
locking. (The default locking system only prevents multiple writers on
the same computer from modifying the same index.) Enhanced locking is
turned on by providing a lock_factory argument to either
InvIndexer->new or IndexReader->open.

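    use Sys::Hostname;
    use KinoSearch::Store::LockFactory;
    use KinoSearch::Index::IndexReader;
    use KinoSearch::Searcher;

    # The hostname serves as the agent_id, so it must be unique per host.
    my $hostname = hostname();
    die "Can't get unique hostname" unless $hostname;

    # MySchema is your own schema class and must already be loaded.
    my $invindex     = MySchema->open('/path/to/invindex/on/nfs/volume');
    my $lock_factory = KinoSearch::Store::LockFactory->new(
        folder   => $invindex->get_folder,
        agent_id => $hostname,
    );

    # Supplying a lock_factory is what turns on read-locking.
    my $reader = KinoSearch::Index::IndexReader->open(
        invindex     => $invindex,
        lock_factory => $lock_factory,
    );
    my $searcher = KinoSearch::Searcher->new( reader => $reader );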

Lock and SharedLock are implemented using lockfiles within the index
directory, because lockfiles are the only mechanism which is portable.
However, if lockfiles are inappropriate for you, it is possible to
subclass LockFactory, Lock and SharedLock.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/